PSYB07 FSG: Terminology and Descriptives

Terms

100

What are independent and dependent variables?
Bonus: What is a confounding variable?

The independent variable is the one we manipulate and is expected to be independent from all other variables in the experiment.
The dependent variable is measured and is expected to vary due to the manipulation of the independent variable.

A third variable is one that is related to the dependent variable and can also explain the observed effects. A confound is an alternative explanation in which something other than the intended manipulation varies alongside the independent variable and is not accounted for.

100

What is reliability?

Reliability is how likely it is that repeated measurements will give observations similar to previous ones.

100

What is a bar graph?

A bar graph represents each category as a bar whose height shows a value such as a frequency or a mean. This provides a visual representation of the differences in the data.

100

What is binning?

(Think about visualizing data)

Binning is a method to combine scores into a smaller number of wider intervals (bins). This is useful when your variable has a wide range.
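
As a rough illustration, the sketch below bins a handful of made-up ages into intervals. The data, the bin width of 10, and the helper name bin_label are all assumptions chosen for the example, not part of the course material.

```python
from collections import Counter

# Made-up ages for illustration; the bin width of 10 is an assumption.
ages = [18, 19, 21, 22, 25, 31, 34, 38, 47]
bin_width = 10

def bin_label(score, width):
    """Return the interval (bin) a score falls into, e.g. 21 -> '20-29'."""
    lower = (score // width) * width
    return f"{lower}-{lower + width - 1}"

# Count how many scores fall into each bin.
binned = Counter(bin_label(age, bin_width) for age in ages)

for label in sorted(binned):
    print(label, binned[label])
```

Nine distinct ages collapse into four bins, which is much easier to plot than one bar per age.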

100

Describe the mean, median and mode.

(Think about measures of central tendency)

The mean is the sum of all scores divided by the number of scores. It is the value that is as close as possible to all scores, and it treats all scores equally.

The median is the score that is midway through all the data. It splits the data into a lower and an upper half. The data must be rank ordered from smallest to largest.

The mode is the most common score.
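
A minimal sketch of the three measures, using Python's built-in statistics module on a small made-up set of scores (the data are an assumption for illustration only):

```python
from statistics import mean, median, mode

scores = [2, 3, 3, 5, 7]  # made-up scores for illustration

print(mean(scores))    # sum of scores / number of scores = 20 / 5
print(median(scores))  # middle score once the data are rank ordered
print(mode(scores))    # most common score
```

Here the mean is 4, while the median and mode are both 3.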

200

What is external validity?

External validity is how well the data represents the population of interest

200

What is internal validity?

Internal validity is how well the data measures the construct of interest

200

What is meant by truncating? (Think about bar graphs)

We can truncate the figure, where the Y-axis does not start at zero, which magnifies hard-to-see differences.

200

What is a frequency table and how does it help with histograms?

Bonus: What is a histogram?

Data can be put into a frequency table, which summarizes scores by listing how often each score occurs. From this, it is easy to visualize the data with a histogram, which is a bar plot where the X-axis contains continuous data.
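
To make the link between the table and the plot concrete, here is a small sketch that builds a frequency table from made-up scores and prints a crude text histogram from it. The data are an assumption, and the text bars are only a stand-in for a real plot.

```python
from collections import Counter

scores = [1, 2, 2, 3, 3, 3, 4]  # made-up scores for illustration

# Frequency table: each score paired with how often it occurs.
freq_table = Counter(scores)

# Each row of the table becomes one bar of the histogram.
for score in sorted(freq_table):
    print(score, freq_table[score], "*" * freq_table[score])
```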

200

Describe what is meant by central tendency and measurement of spread.

Central tendency provides a single value that is representative of the overall data
Measurement of spread describes how scores typically vary from the central tendency

300

What are the 4 types of data a variable can yield?

Categorical data: nominal and ordinal.
Measurement data: interval and ratio.

300

Explain the terms sufficiency and efficiency.

Efficiency is how much data we need for a statistic to be a good estimate.
Sufficiency is how much of the data is used to create the estimate.

300

Explain the terms bias and resistance.

Bias is whether the statistic is likely to overestimate or underestimate the true value it is estimating.

Resistance is how much influence deviant scores, such as outliers, have on the estimate. A resistant statistic is influenced very little.
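
Resistance can be seen by adding a single deviant score and checking how far each estimate moves. The sketch below uses made-up scores and an arbitrary outlier of 100; both are assumptions for illustration.

```python
from statistics import mean, median

scores = [4, 5, 5, 6, 7]       # made-up scores
with_outlier = scores + [100]  # same scores plus one deviant score

# How far does each estimate move once the outlier is added?
mean_shift = mean(with_outlier) - mean(scores)
median_shift = median(with_outlier) - median(scores)

print(f"mean moved by {mean_shift:.2f}")
print(f"median moved by {median_shift:.2f}")
```

The median moves by only 0.5 while the mean moves by more than 15, so in this example the median is the more resistant statistic.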

300

What is a relative frequency histogram and how is it calculated?

A relative frequency histogram shows each score's share of the total (as a proportion or percent) rather than the raw counts. This can make it easier to understand the quantity of each score relative to the entire dataset. It is calculated by dividing each frequency by N.
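
A short sketch of the calculation, using made-up scores (an assumption for illustration): each frequency is divided by N, so the relative frequencies add up to 1.

```python
from collections import Counter

scores = [1, 2, 2, 3, 3, 3, 4, 4]  # made-up scores, N = 8
n = len(scores)

# Relative frequency = frequency / N for each score.
relative_freq = {score: count / n for score, count in sorted(Counter(scores).items())}

for score, proportion in relative_freq.items():
    print(score, proportion)
```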

300

What is a cumulative frequency histogram and what is it good for?

In a cumulative frequency histogram, each bar shows its own frequency plus the sum of all the previous frequencies, so the bars build up to a running total. It is good for data where not many changes occur at each interval, or when you want to display an aggregate.
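
The running total can be sketched with itertools.accumulate on a frequency table. The scores below are made up for illustration.

```python
from collections import Counter
from itertools import accumulate

scores = [1, 2, 2, 3, 3, 3, 4, 4]  # made-up scores, N = 8

freq = Counter(scores)
values = sorted(freq)

# Each entry is the frequency of that score plus all previous frequencies.
cumulative = dict(zip(values, accumulate(freq[v] for v in values)))

for value in values:
    print(value, freq[value], cumulative[value])
```

The last cumulative value always equals N, here 8.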

400

What is range? Also, describe the interquartile range. What is it useful for? What kind of plot is used for it?

The range is simply the difference between the smallest and largest score

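The interquartile range (IQR) is the range of the middle 50% of the data. It involves calculating three medians: the median of all the data (Q2), the median of the lower half (Q1), and the median of the upper half (Q3). The IQR is Q3 - Q1. Once again, the data need to be rank ordered.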

The IQR can be more useful than the range when there are many highly deviant scores.

The IQR is displayed with a box and whisker plot.
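
A minimal sketch of the range and the IQR, assuming the "median of each half" method (other quartile conventions exist and can give slightly different values). The scores are made up for illustration.

```python
from statistics import median

scores = sorted([7, 1, 3, 9, 5, 11, 13, 15])  # rank order the data first

data_range = scores[-1] - scores[0]  # largest score minus smallest score

half = len(scores) // 2
lower_half = scores[:half]                 # scores below the overall median
upper_half = scores[len(scores) - half:]   # scores above the overall median

q1 = median(lower_half)
q2 = median(scores)
q3 = median(upper_half)
iqr = q3 - q1

print(data_range, q1, q2, q3, iqr)
```

With these scores the range is 14 and the IQR is 8 (Q1 = 4, Q3 = 12).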

400

Two part question:

1. What is variance? What parts does it consist of?
2. Ask Vaasvi for question

The variance is the average squared distance of scores from the mean. It is made of two parts: the sum of squared differences (SS) in the numerator, and the denominator used to average (N, or df).

Differences are squared because otherwise the deviations from the mean would always sum to zero.
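
The two parts of the variance, and the reason for squaring, can be checked on a small made-up data set (the scores are an assumption for illustration):

```python
from statistics import pvariance, variance

scores = [2, 4, 4, 6, 9]  # made-up scores
n = len(scores)
m = sum(scores) / n       # mean = 5.0

deviations = [x - m for x in scores]
sum_of_deviations = sum(deviations)   # unsquared deviations cancel out to 0
ss = sum(d ** 2 for d in deviations)  # sum of squared differences (SS) = 28

population_variance = ss / n      # SS / N
sample_variance = ss / (n - 1)    # SS / df

print(sum_of_deviations, ss, population_variance, sample_variance)
```

The results match statistics.pvariance and statistics.variance from the standard library, which divide by N and n-1 respectively.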

400

What is standard deviation? Why do we not just use variance? How do you calculate it?

Variance works well with the math, but conceptualizing a squared value can be difficult. We can convert it to a unit that is easier to conceptualize. The standard deviation is roughly the average distance scores are from the mean, expressed in the original units of the scores.

It is calculated as the square root of the variance.
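
As a quick sketch on made-up scores (an assumption for illustration), the sample SD is the square root of the sample variance:

```python
import math
from statistics import stdev

scores = [2, 4, 4, 6, 9]  # made-up scores
n = len(scores)
m = sum(scores) / n

ss = sum((x - m) ** 2 for x in scores)  # sum of squared differences
sample_variance = ss / (n - 1)          # in squared units
sample_sd = math.sqrt(sample_variance)  # back in the original units

print(round(sample_sd, 3))
print(round(stdev(scores), 3))  # standard library result for comparison
```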

400

What is the difference between population and sample? What is the difference between population parameters and sample statistics?

POPULATION = The group of people for which a researcher wishes to make conclusions. SAMPLE = The group of people from whom data is actually collected.

Population parameters are denoted by Greek or upper-case Roman characters (e.g. μ, σ, N).
Sample statistics are denoted by lower-case Roman characters (e.g. s, n).

400

Why do we use n-1 in the sample statistic formulae?

(Think about df)

Using n-1 in the denominator is referred to as using the degrees of freedom (df). It is a modification to the denominator that accounts for the fact that the sample variance (and sample SD) calculated with n in the denominator is a biased estimator of the population variance (and population SD): it underestimates the variability in the population. The sample mean is an unbiased estimator of the population mean, and so it does not need a correction.
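
One way to see the underestimate is to take every possible sample from a tiny population and average the sample variances. The sketch below assumes a made-up population of four scores and samples of size 2 drawn with replacement; it is an illustration, not a proof.

```python
from itertools import product
from statistics import mean

population = [1, 2, 3, 4]  # made-up population
mu = mean(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # population variance

def sum_of_squares(sample):
    """Sum of squared differences from the sample's own mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

n = 2
samples = list(product(population, repeat=n))  # all 16 samples, with replacement

# Average the variance estimates over every possible sample.
avg_var_n = mean(sum_of_squares(s) / n for s in samples)
avg_var_n_minus_1 = mean(sum_of_squares(s) / (n - 1) for s in samples)

print(sigma2, avg_var_n, avg_var_n_minus_1)
```

Dividing by n averages out to 0.625, which is below the population variance of 1.25, while dividing by n-1 averages out to 1.25.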

(You will cover this in more detail in the coming lectures)