What are independent and dependent variables?
Bonus: What is a confounding variable?
The independent variable is the one we manipulate and is expected to be independent from all other variables in the experiment.
The dependent variable is measured and is expected to vary due to the manipulation of the independent variable.
A third variable is one that is related to the dependent variable and can also explain the observed effects. A confound is an alternative explanation in which something else varies systematically alongside the independent variable and is not accounted for.
What is reliability?
Reliability is how likely it is that repeated measurements will yield observations similar to previous ones.
What is a bar graph?
A bar graph can provide a visual representation of the differences in data.
What is binning?
(Think about visualizing data)
Binning is a method of grouping scores into a smaller number of wider intervals (bins). This is useful when your variable has a wide range of values.
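A minimal sketch of binning, assuming made-up scores and a bin width of 10 (the helper name `bin_scores` is hypothetical, chosen only for illustration):

```python
from collections import Counter

def bin_scores(scores, width):
    """Count how many scores fall in each bin [start, start + width)."""
    # Floor-divide each score by the width to find the start of its bin.
    counts = Counter((s // width) * width for s in scores)
    return dict(sorted(counts.items()))

scores = [3, 7, 12, 18, 21, 25, 29, 34, 41, 47]  # illustrative data only
binned = bin_scores(scores, 10)
print(binned)  # {0: 2, 10: 2, 20: 3, 30: 1, 40: 2}
```

Ten distinct scores collapse into five bins, which is what makes a wide-ranging variable easier to plot.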
Describe the mean, median and mode.
(Think about measures of central tendency)
The mean is the value that is as close as possible to all scores. It treats all scores equally.
The median is the score that is midway through all the data. It splits the lower and upper half. Data must be rank ordered from smallest to largest.
The mode is the most common score.
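As a quick illustration, the three measures can be computed with Python's standard `statistics` module. The scores are made up:

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # illustrative data only

mean = statistics.mean(scores)      # sum of scores / number of scores = 30 / 6
median = statistics.median(scores)  # midpoint of the two middle scores, 3 and 5
mode = statistics.mode(scores)      # most common score

print(mean, median, mode)  # 5 4.0 3
```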
What is external validity?
External validity is how well the data represents the population of interest
What is internal validity?
Internal validity is how well the data measures the construct of interest
What is meant by truncating? (Think about bar graphs)
We can truncate the figure, where the Y-axis does not start at zero, which magnifies hard-to-see differences.
What is a frequency table and how does it help with histograms?
Bonus: What is a histogram?
Data can be put into a frequency table, which summarizes scores by listing how often each one occurs. From this, it is easy to visualize the data with a histogram, which is a bar plot where the X-axis contains continuous data.
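A small sketch of building a frequency table and turning it into a rough text histogram, assuming made-up scores:

```python
from collections import Counter

scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # illustrative data only

# Frequency table: each score mapped to how many times it occurs.
freq = dict(sorted(Counter(scores).items()))

# Text histogram: one '#' per occurrence of the score.
for score, count in freq.items():
    print(f"{score:>3} | {'#' * count}")
```

The bar heights in the histogram are read directly from the frequency table.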
Describe what is meant by central tendency and measurement of spread.
Central tendency provides a single value that is representative of the overall data.
Measurement of spread describes how scores typically vary from the central tendency.
What are the 4 types of data a variable can yield?
categorical: nominal and ordinal, measurement: interval and ratio
Explain the terms sufficiency and efficiency.
Efficiency is how much data we need for a statistic to be a good estimate.
Sufficiency is how much of the data is used to create an estimate.
Explain the terms bias and resistance.
Bias is whether the statistic is likely to overestimate or underestimate the true value it is estimating.
Resistance is how much influence deviant scores, like outliers, have on the estimate.
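To illustrate resistance, this sketch compares how the mean and median react when one score is replaced by an outlier. The data are made up:

```python
import statistics

scores = [2, 3, 4, 5, 6]         # illustrative data only
with_outlier = [2, 3, 4, 5, 60]  # same data, largest score replaced by an outlier

mean_before = statistics.mean(scores)            # 4
mean_after = statistics.mean(with_outlier)       # 14.8, pulled toward the outlier
median_before = statistics.median(scores)        # 4
median_after = statistics.median(with_outlier)   # 4, unchanged

print(mean_before, mean_after, median_before, median_after)
```

Here the median is resistant to the outlier, while the mean is not.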
What is a relative frequency histogram and how is it calculated?
A relative frequency histogram shows each score's frequency as a proportion or percentage of the total rather than as a raw count. This can make it easier to understand the quantity of scores relative to the entire dataset. It is calculated by dividing each frequency by N.
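A minimal sketch of the calculation, assuming made-up scores with N = 10:

```python
from collections import Counter

scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]  # illustrative data only
n = len(scores)

# Relative frequency = frequency of each score divided by N.
rel_freq = {score: count / n for score, count in sorted(Counter(scores).items())}

print(rel_freq)  # {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.2, 5: 0.2}
```

The relative frequencies always sum to 1 (or 100%).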
What is a cumulative frequency histogram and what is it good for?
In a cumulative frequency histogram, each bar shows its own frequency plus the sum of all previous bars, so the bars build up to a running total. It is good for data where few changes occur at each interval, or when you want to display an aggregate.
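A short sketch of turning a frequency table into cumulative frequencies, assuming a made-up table:

```python
from itertools import accumulate

# Illustrative frequency table: score -> count.
freq = {1: 1, 2: 2, 3: 3, 4: 2, 5: 2}

# Running total of the counts, paired back with the scores.
cum_freq = dict(zip(freq, accumulate(freq.values())))

print(cum_freq)  # {1: 1, 2: 3, 3: 6, 4: 8, 5: 10}
```

The final bar always equals N, the total number of scores.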
What is range? Also, describe the interquartile range. What is it useful for? What kind of plot is used for it?
The range is simply the difference between the smallest and largest score
The interquartile range (IQR) is the range of the middle 50% of the data. This involves calculating three medians: the median of the whole dataset, then the medians of the lower and upper halves. Once again, the data needs to be rank ordered.
The IQR can be more useful than the range when there are highly deviant scores.
It is displayed with a box-and-whisker plot.
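A sketch of the range and IQR using the median-of-halves approach. Quartile conventions differ between textbooks and software, so this is one assumed method, and the helper name `iqr` and the data are made up:

```python
import statistics

def iqr(scores):
    """IQR via medians of the lower and upper halves (needs at least 2 scores)."""
    ordered = sorted(scores)   # data must be rank ordered
    half = len(ordered) // 2
    lower = ordered[:half]     # lower half
    upper = ordered[-half:]    # upper half (middle score excluded when n is odd)
    return statistics.median(upper) - statistics.median(lower)

scores = [1, 3, 4, 6, 7, 8, 9, 40]  # illustrative data with one outlier

data_range = max(scores) - min(scores)  # 39, dominated by the outlier
data_iqr = iqr(scores)                  # 8.5 - 3.5 = 5.0

print(data_range, data_iqr)
```

The outlier of 40 inflates the range but leaves the IQR untouched.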
Two part question:
1. What is variance? What parts does it consist of?
2. Ask Vaasvi for question
The variance is the average squared distance of scores from the mean. It is made of two parts: the sum of squared differences (SS) and the denominator used to average it (N, or df).
Differences are squared because otherwise the deviations from the mean would sum to zero.
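The two parts of the variance can be worked through by hand. This sketch uses made-up scores and shows both denominators:

```python
scores = [2, 4, 4, 6, 9]  # illustrative data only
n = len(scores)
mean = sum(scores) / n    # 5.0

# Raw deviations from the mean cancel out, which is why they are squared.
deviations = [x - mean for x in scores]  # [-3.0, -1.0, -1.0, 1.0, 4.0]
sum_of_deviations = sum(deviations)      # 0.0

ss = sum(d ** 2 for d in deviations)     # sum of squares: 9 + 1 + 1 + 1 + 16 = 28.0

variance_n = ss / n         # denominator N:  5.6
variance_df = ss / (n - 1)  # denominator df: 7.0

print(sum_of_deviations, ss, variance_n, variance_df)
```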
What is standard deviation? Why do we not just use variance? How do you calculate it?
Variance works well with the math, but conceptualizing a squared value can be difficult. We can convert it to a unit that is easier to conceptualize. The standard deviation is roughly the average distance of scores from the mean, expressed in the original units.
It is calculated as the square root of the variance.
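A brief check, on made-up scores, that the standard deviation is the square root of the variance. It uses the population versions from the standard `statistics` module:

```python
import math
import statistics

scores = [2, 4, 4, 6, 9]  # illustrative data only

pop_variance = statistics.pvariance(scores)  # in squared units
pop_sd = statistics.pstdev(scores)           # back in the original units

print(pop_variance, pop_sd, math.sqrt(pop_variance))
```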
What is the difference between population and sample? What is the difference between population and sample parameters?
POPULATION = The group of people about whom a researcher wishes to make conclusions. SAMPLE = The group of people from whom data is actually collected.
Population parameters are denoted by Greek or upper-case Roman characters.
Sample statistics are denoted by lower-case Roman characters.
Why do we use n-1 in the sample statistic formulae?
(Think about df)
Dividing by n - 1 rather than n uses the degrees of freedom (df). This modification to the denominator accounts for the fact that the uncorrected sample variance (and sample SD) is a biased estimator of the population variance or population SD: it underestimates the variance in the population. The sample mean is an unbiased estimator of the population mean, so it does not need a correction.
(You will cover this in more detail in the coming lectures)
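The bias can be seen exactly on a tiny example. This sketch assumes a made-up population (the faces of a die) and enumerates every possible sample of size 2 drawn with replacement, then averages each estimator over all samples:

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [1, 2, 3, 4, 5, 6]  # illustrative population only
n = 2
samples = list(product(population, repeat=n))  # all 36 samples, with replacement

true_mean = mean(population)      # 3.5
true_var = pvariance(population)  # 35/12, about 2.917

# Average each estimator over every possible sample.
avg_sample_mean = mean(mean(s) for s in samples)        # matches true_mean
avg_var_div_n = mean(pvariance(s) for s in samples)     # SS / n: too small
avg_var_div_df = mean(variance(s) for s in samples)     # SS / (n - 1): matches true_var

print(true_var, avg_var_div_n, avg_var_div_df)
```

On average, dividing by n recovers only (n - 1)/n of the population variance, while dividing by n - 1 recovers it exactly. The sample mean needs no correction.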