Ch. 1-6 Vocabulary - AP Stats Jeopardy Template

Normal Distributions

Regressions

Research and Design

Probability

Everything

100

Tell us how to find descriptive statistics in the Calculator for Univariate Data.

Stat, Edit, Enter data in List 1

Stat, >, Calc, 1:1-Var Stats

Scroll through the list to find what you need
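
To check the calculator output by hand, here is a minimal Python sketch of the same summary that 1-Var Stats reports. The data list is made up for illustration, and the quartile helper assumes the median-of-halves method; other software may compute quartiles slightly differently.

```python
import statistics

# Made-up data standing in for the values typed into List 1.
data = [2, 4, 4, 5, 7, 9]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    # With an odd count, the middle value is left out of both halves.
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return (s[0], statistics.median(lower), statistics.median(s),
            statistics.median(upper), s[-1])

n = len(data)
mean = statistics.mean(data)             # x-bar
sample_sd = statistics.stdev(data)       # Sx
population_sd = statistics.pstdev(data)  # sigma_x
summary = five_number_summary(data)

print(n, round(mean, 3), round(sample_sd, 3), round(population_sd, 3))
print(summary)
```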

100

What is the difference between bivariate data and univariate data?

Univariate data. When we conduct a study that looks at only one variable, we say that we are working with univariate data. Suppose, for example, that we conducted a survey to estimate the average weight of high school students. Since we are only working with one variable (weight), we would be working with univariate data.
Bivariate data. When we conduct a study that examines the relationship between two variables, we are working with bivariate data. Suppose we conducted a study to see if there were a relationship between the height and weight of high school students. Since we are working with two variables (height and weight), we would be working with bivariate data.

100

What is Stratified random sampling?

Stratified sampling refers to a type of sampling method. With stratified sampling, the researcher divides the population into separate groups, called strata. Then, a probability sample (often a simple random sample) is drawn from each group.

100

Give an example of a two-way table that shows mutually exclusive events.

Draw it on the board.

Any two-way table in which the cell where both events occur at the same time has probability 0.

100

Define Bias and give at least one example.

Bias refers to the tendency of a measurement process to over- or under-estimate the value of a population parameter. In survey sampling, for example, bias would be the tendency of a sample statistic to systematically over- or under-estimate a population parameter.

200

A distribution curve where the mean is higher than the median is ______________.

Skewed Right

200

Describe how to Create a Regression equation in the calculator...

Stat, Edit, Enter Data

Stat, >, Calc, 8:LinReg(a+bx), Store RegEQ: Vars, Y-Vars, 1:Function, 1:Y1, then press Enter on Calculate

Record equation and use table and graph to answer questions.
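
As a way to check the calculator's equation, here is a minimal Python sketch of the least-squares line y-hat = a + bx. The function name fit_line and the five data points are made up for illustration; they are not from the review game.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (x-bar, y-bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up data standing in for List 1 (x) and List 2 (y).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")  # y-hat = 2.20 + 0.60x
```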

200

What Conditions must be met to determine Cause and effect?

It must be an experiment with treatments, and subjects must be randomly assigned to those treatments.

200

What is the rule for independence? Is this on the formula sheet?

Explain it in the context of an example.

P(A)= P(A|B)=P(A|B^c)

Not on Formula sheet

Explain...
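
One way to explain the rule in context is to check it on a two-way table. The sketch below uses made-up counts for 100 subjects and compares P(A), P(A|B), and P(A|B^c); the event labels are placeholders.

```python
import math

# Made-up counts from a two-way table of 100 subjects.
counts = {
    ("A", "B"): 20,
    ("A", "not B"): 30,
    ("not A", "B"): 20,
    ("not A", "not B"): 30,
}
total = sum(counts.values())

p_a = (counts[("A", "B")] + counts[("A", "not B")]) / total
p_a_given_b = counts[("A", "B")] / (counts[("A", "B")] + counts[("not A", "B")])
p_a_given_not_b = counts[("A", "not B")] / (
    counts[("A", "not B")] + counts[("not A", "not B")]
)

# A and B are independent when all three probabilities match.
independent = math.isclose(p_a, p_a_given_b) and math.isclose(p_a, p_a_given_not_b)
print(p_a, p_a_given_b, p_a_given_not_b, independent)
```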

200

Describe the pieces of a boxplot and a modified boxplot.

Draw an example to illustrate on the white board.

A boxplot splits the data set into quartiles. The body of the boxplot consists of a "box" (hence, the name), which goes from the first quartile (Q1) to the third quartile (Q3).

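Within the box, a vertical line is drawn at Q2, the median of the data set. Two horizontal lines, called whiskers, extend from the front and back of the box. The front whisker goes from Q1 to the smallest non-outlier in the data set, and the back whisker goes from Q3 to the largest non-outlier. This is the modified boxplot, which plots each outlier as a separate point beyond the whiskers; in a regular boxplot, the whiskers simply extend to the minimum and maximum.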

300

Explain the difference between a Qualitative and Quantitative data set. Give examples.

Categorical (qualitative). Categorical variables take on values that are names or labels. The color of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier) would be examples of categorical variables.
Quantitative. Quantitative variables are numerical. They represent a measurable quantity. For example, when we speak of the population of a city, we are talking about the number of people in the city - a measurable attribute of the city. Therefore, population would be a quantitative variable.

300

What is r? And what is r²? How do I interpret both?

Correlation coefficients measure the strength of association between two variables. The most common correlation coefficient, called the Pearson product-moment correlation coefficient, measures the strength of the linear association between variables.

The coefficient of determination (denoted by r²) is a key output of regression analysis. It is interpreted as the proportion of the variance in the dependent variable that is predictable from the independent variable.

300

What is a Matched Pairs Design?

A Matched Pairs Design is a special case of a randomized block design. It can be used when the experiment has only two treatment conditions; and subjects can be grouped into pairs, based on some blocking variable. Then, within each pair, subjects are randomly assigned to different treatments.

300

Describe the difference between Discrete and Continuous Random Variables.

Give an Example of both

(different from the notes)

If a variable can take on any value between its minimum value and its maximum value, it is called a continuous variable; otherwise, it is called a discrete variable.

Some examples will clarify the difference between discrete and continuous variables.

Suppose the fire department mandates that all fire fighters must weigh between 150 and 250 pounds. The weight of a fire fighter would be an example of a continuous variable; since a fire fighter's weight could take on any value between 150 and 250 pounds.
Suppose we flip a coin and count the number of heads. The number of heads could be any integer value between 0 and plus infinity. However, it could not be any number between 0 and plus infinity. We could not, for example, get 2.5 heads. Therefore, the number of heads must be a discrete variable.

300

The equation y-hat = 4.8 + 0.21x represents a linear regression of Happiness Score (1 to 10) based on Income in thousands of dollars.

Interpret the slope and the y- intercept

Slope 0.21: the predicted happiness score increases by 0.21 points for every additional 1,000 dollars of income.

Y-intercept 4.8: we predict that an adult in this study with an income of $0 would have a happiness score of 4.8.
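
The two interpretations can be checked numerically. This is a small Python sketch of the equation from the question; the function name predicted_happiness and the incomes of 40 and 41 (thousand dollars) are chosen only for illustration.

```python
def predicted_happiness(income_thousands):
    """Predicted happiness score from y-hat = 4.8 + 0.21x (x in thousands of dollars)."""
    return 4.8 + 0.21 * income_thousands

# The y-intercept is the prediction at an income of 0.
at_zero = predicted_happiness(0)

# The slope is the change in the prediction for one more thousand dollars.
change_per_thousand = predicted_happiness(41) - predicted_happiness(40)

print(at_zero, round(change_per_thousand, 2))
```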

400

Define how to determine if a point is an Outlier

An outlier is an extreme value that differs greatly from other values in a set of values. As a "rule of thumb", an extreme value is considered to be an outlier if it is at least 1.5 interquartile ranges below the first quartile (Q1), or at least 1.5 interquartile ranges above the third quartile (Q3).
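
The rule of thumb can be sketched in Python as follows. The function names and the quartiles Q1 = 4 and Q3 = 7 are made up for illustration; the comparison uses "at least 1.5 IQRs away" as worded above.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences that sit 1.5 IQRs outside the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(value, q1, q3):
    """True when the value is at least 1.5 IQRs below Q1 or above Q3."""
    low, high = outlier_fences(q1, q3)
    return value <= low or value >= high

print(outlier_fences(4, 7))  # (-0.5, 11.5)
print(is_outlier(12, 4, 7))  # True
print(is_outlier(9, 4, 7))   # False
```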

400

What is a residual? And what is a Residual Plot?

In regression analysis, the difference between the observed value of the dependent variable (y) and the predicted value (ŷ) is called the residual (e). Each data point has one residual.

Residual = Observed value - Predicted value
e = y - ŷ

A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis. If the points in a residual plot are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a nonlinear model is more appropriate.
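
Here is a minimal Python sketch of computing residuals with e = y - ŷ. The data points and the line y-hat = 2.2 + 0.6x are made up for illustration; plotting each residual against its x value would give the residual plot.

```python
# Made-up data and a made-up fitted line, y-hat = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def predict(x):
    return 2.2 + 0.6 * x

# Residual = observed value - predicted value, one per data point.
residuals = [y - predict(x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    print(x, round(e, 2))  # (x, residual) pairs are the points of a residual plot
```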

400

Tell why an Experimental design would need:

- a placebo

- to be single blind or double blind

- a control group

In an experiment, subjects respond differently after they receive a treatment, even if the treatment is neutral. A neutral treatment that has no "real" effect on the dependent variable is called a placebo, and a subject's positive response to a placebo is called the placebo effect.

Blinding is the practice of not telling subjects whether they are receiving a placebo. In this way, subjects in the control and treatment groups experience the placebo effect equally. Often, knowledge of which groups receive placebos is also kept from analysts who evaluate the experiment. This practice is called double blinding. It prevents the analysts from "spilling the beans" to subjects through subtle cues; and it assures that their evaluation is not tainted by awareness of actual treatment conditions.

In an experiment, a control group is a baseline group that receives no treatment or a neutral treatment. To assess treatment effects, the experimenter compares results in the treatment group to results in the control group.

400

Define the mean of a Discrete Random Variable

The mean of the discrete random variable X is also called the expected value of X. Notationally, the expected value of X is denoted by E(X). Use the following formula to compute the mean of a discrete random variable.

E(X) = Σ [ xi * P(xi) ]

where xi is the value of the random variable for outcome i, and P(xi) is the probability that the random variable will be equal to outcome i.
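
As a worked example of the formula, this Python sketch computes E(X) for X = the number of heads in two flips of a fair coin. The distribution is an illustrative choice, not part of the question.

```python
# X = number of heads in two flips of a fair coin (illustrative distribution).
values = [0, 1, 2]
probabilities = [0.25, 0.50, 0.25]

def expected_value(xs, ps):
    """E(X) = sum of x_i * P(x_i); the probabilities must add up to 1."""
    if abs(sum(ps) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(xs, ps))

print(expected_value(values, probabilities))  # 1.0
```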

400

The equation y-hat=4.8 +0.21x represents a linear regression of Happiness Score (1 to 10) based on Income in thousands of dollars.

r = 0.58 and r² = 0.336. Interpret these values.

r is the correlation coefficient. Given that the relationship is linear, 0.58 means that the relationship between income and happiness score is positive and moderate in strength.

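r² is the coefficient of determination. Given that the relationship is linear, 0.336 means that 33.6% of the variation in happiness score is explained by the linear relationship with income. Put another way, using the y-hat prediction instead of just the average of y removes 33.6% of the squared prediction error.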

500

Draw a picture of the Empirical Rule on the Board. And define how the horizontal axis relates to z-scores.

Empirical Rule: If a large data set is approximately normally distributed, about 68% of the elements have a z-score between -1 and 1; about 95% have a z-score between -2 and 2; and about 99.7% have a z-score between -3 and 3.

A z-score (aka, a standard score) indicates how many standard deviations an element is from the mean. A z-score can be calculated from the following formula.

z = (X - μ) / σ
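
The sketch below applies the z-score formula and uses Python's statistics.NormalDist to compute the share of a normal distribution within 1, 2, and 3 standard deviations of the mean. The example value, mean, and standard deviation (130, 100, 15) are made up for illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(z_score(130, 100, 15))  # 2.0
for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```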

500

Tell the difference between an Influential Point, a High Leverage Point and an Outlier.

An influential point is an outlier that greatly affects the slope of the regression line. One way to test the influence of an outlier is to compute the regression equation with and without the outlier.

High Leverage Point: A data point is considered to be a High Leverage Point if it has an extreme predictor (x) value. An extreme value simply means an extremely low or extremely high x-value compared to the other data points in the data set.

Outliers have an extra large residual value.

500

Consider this example. A drug manufacturer tests a new cold medicine with 200 volunteer subjects - 100 men and 100 women. The men receive the drug, and the women do not. At the end of the test period, the men report fewer colds.

Describe 3 problems with this experimental design

- no control group

- confounding variables

- no blocking for symptoms or gender or anything

- no random assignment

500

Define how to Enter a probability distribution in the Calculator and find the mean and standard deviation for a discrete Random Variable.

Stat, Edit, Enter each x_i in List 1 and each P(x_i) in List 2

Stat, >, Calc, 1: 1-Var stats, make frequency list L2

x bar = the mean, or Expected value

sigma_x = the Standard deviation
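
To check the calculator, the same two numbers can be computed directly from the probability distribution. This Python sketch uses one roll of a fair six-sided die as an illustrative distribution; the values play the role of List 1 and the probabilities the role of List 2.

```python
import math

# Illustrative distribution: one roll of a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]      # List 1: each x_i
probabilities = [1 / 6] * 6      # List 2: each P(x_i)

# Mean (expected value): sum of x_i * P(x_i).
mean = sum(x * p for x, p in zip(values, probabilities))

# Variance: sum of (x_i - mean)^2 * P(x_i); the standard deviation is its square root.
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probabilities))
sd = math.sqrt(variance)

print(round(mean, 3), round(sd, 3))
```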

500

Wile E. Coyote is pursuing the Road Runner across Great Britain toward Scotland. The Road Runner chooses his route randomly, such that there is a probability of 0.7 that he’ll take the high road and 0.3 that he’ll take the low road. If he takes the high road, the probability that Wile E. catches him is 0.02. If he takes the low road, the probability the Road Runner gets caught is 0.05.

(a) What is the probability that the Road Runner gets caught?

(b) Suppose that the Road Runner got caught. What is the probability that he took the high road?

Draw a tree diagram to explain

a) (0.7)(0.02) + (0.3)(0.05) = 0.014 + 0.015 = 0.029

b) 0.014 / 0.029 ≈ 0.483
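
The arithmetic behind the tree diagram can be checked with a short Python sketch. The variable names are made up; the probabilities are the ones given in the question.

```python
# Branch probabilities from the question.
p_high = 0.7               # Road Runner takes the high road
p_low = 0.3                # Road Runner takes the low road
p_caught_given_high = 0.02
p_caught_given_low = 0.05

# (a) Add the two "caught" branches of the tree.
p_caught = p_high * p_caught_given_high + p_low * p_caught_given_low

# (b) Conditional probability: the high-road "caught" branch over all "caught" branches.
p_high_given_caught = (p_high * p_caught_given_high) / p_caught

print(round(p_caught, 3), round(p_high_given_caught, 3))
```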