Re-expressing Data: Get It Straight!
Understanding Randomness
Sample Surveys
A Look at Bias
Identifying Sampling Methods
100

A means of altering the data to achieve the conditions/structure necessary to utilize particular summaries or models.

What is re-expression?

100

1. Nobody can guess the outcome in advance.

2. Outcomes are equally likely.

What is it about random selection that makes it seem fair?

100

The entire group of individuals or instances about whom we hope to learn, but examining all of them is usually impractical, if not impossible.

What is the population?

100

Any systematic failure of a sampling method to represent its population.  It is almost impossible to recover from.

What is Bias?

100

A sample in which each set of n elements in the population has an equal chance of selection.  

The standard method of utilizing randomization to make the sample representative of the population of interest.

What is a simple random sample (SRS)?
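
A minimal Python sketch of drawing an SRS, assuming the sampling frame is just a list of ID numbers. The frame, the sample size, and the seed are made up for illustration. `random.sample` draws without replacement, so every set of n individuals is equally likely to be chosen.

```python
import random

# Hypothetical sampling frame: ID numbers 1..500 (illustrative only).
frame = list(range(1, 501))
n = 20

rng = random.Random(42)        # fixed seed so the sketch is repeatable
srs = rng.sample(frame, n)     # draws n distinct individuals, no replacement

print(sorted(srs))
```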

200
1. Make the form of a scatterplot straighter.

2. Make the scatter in a scatterplot more consistent (not fan shaped).

3. Make the distribution of a variable (histogram) more symmetric.

4. Make the spread across different groups (boxplots) more similar.


What are several reasons to consider a re-expression?
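
The first reason (straightening a scatterplot) can be sketched in Python. The data below are invented to follow exponential growth, and `pearson_r` is a helper written for this sketch, not a library function. Correlation is used here only as a rough indicator of straightness; a residual plot is still the better check.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data following exponential growth: y = 2 * 1.5**x.
xs = list(range(1, 11))
ys = [2 * 1.5 ** x for x in xs]

r_raw = pearson_r(xs, ys)                           # curved relationship
r_log = pearson_r(xs, [math.log10(y) for y in ys])  # straightened by logs

print(f"r before re-expression: {r_raw:.3f}")
print(f"r after taking log(y):  {r_log:.3f}")
```

Because the invented data are exactly exponential, log(y) is exactly linear in x, so the re-expressed correlation comes out at 1 up to rounding error. Real data would be straightened only approximately.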

200

We know what outcomes could happen, but not which particular values will happen.  Outcomes that we cannot predict but that nonetheless have a regular distribution in very many repetitions.

What is a random event/phenomenon?

200

A (representative) subset of a population, examined in hope of learning about the population is a _______(1).  

A study that asks questions of a _(1)_ drawn from some population in the hope of learning something about the entire population (Polls) is a ___________.

What is a sample?


What is a sample survey?

200

______________ is often the best use of time and resources when sampling or surveying.

What is reducing biases?

200

The natural tendency of randomly drawn samples to differ from each other.

What is sampling variability?
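
A short Python sketch of sampling variability, assuming a made-up population of the values 1 to 1000. Five random samples are drawn the same way, yet their means differ.

```python
import random

rng = random.Random(1)
population = list(range(1, 1001))   # made-up population of values 1..1000

# Draw five separate random samples of 25 and record each sample mean.
sample_means = []
for _ in range(5):
    sample = rng.sample(population, 25)
    sample_means.append(sum(sample) / len(sample))

print(sample_means)   # the means differ from sample to sample
```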

300

Orders the effects that re-expressions have on the data.  A good starting point is  ______.  If all else fails try ____________.

What is the Ladder of Powers good for?

What is taking logs?

What is whacking the data with two logs (log x and log y)?

300

A sequence of random outcomes that model a situation, often difficult to collect data on and with a mathematical answer hard to calculate.

Models random events by using random numbers to specify event outcomes with relative frequencies that correspond to the true real-world relative frequencies we are trying to model.

An artificial representation of a random process used to study its long-term properties.

What is a simulation?

300

This is any summary calculated from the (sampled) data, while that is a key number in a mathematical model used to represent reality.

This is a statistic.  Statistics are written in Latin letters.

That is a parameter.  Parameters are written in Greek letters.

300

The best defense against bias is ______ (stirring to make sure that on average the sample looks like the rest of the population).

What is Randomization?

300

The precision of the statistics of a sample depends on _______ not ___________.

What is the sample size (soup spoon)?

What is its fraction of the larger population?
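
The soup-spoon idea can be checked with a rough Python simulation. Everything here is an assumption made for illustration: the two population sizes, the true proportion of 0.5, the sample size of 100, and the helper `sd_of_sample_proportions`. Both populations are sampled with the same n, so the spread of the sample proportions should be nearly the same even though the sampled fraction differs by a factor of 100.

```python
import random
import statistics

def sd_of_sample_proportions(pop_size, n, reps, rng):
    """SD of the sample proportion across repeated SRSs of size n.

    Individuals 0 .. pop_size//2 - 1 count as "yes", so the true
    population proportion is 0.5 (an assumption made for illustration).
    """
    half = pop_size // 2
    props = []
    for _ in range(reps):
        sample = rng.sample(range(pop_size), n)
        props.append(sum(1 for i in sample if i < half) / n)
    return statistics.stdev(props)

rng = random.Random(7)
sd_small_pop = sd_of_sample_proportions(10_000, 100, 2000, rng)
sd_large_pop = sd_of_sample_proportions(1_000_000, 100, 2000, rng)

print(f"population 10,000:    SD = {sd_small_pop:.4f}")
print(f"population 1,000,000: SD = {sd_large_pop:.4f}")
```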

400

1. Can't straighten scatterplots that turn around.

2. Can't re-express negative data values with a square root (add a constant to shift them above 0).

3. Minimal effect on data values far from the range 1-100 (subtract a constant to shift them).

4. Can't unify multiple modes.

What are limitations of re-expression?

400

The most basic situation in a simulation in which something happens at random [random happening] is a _______.

What is a component?

400

A numerically valued attribute of a model for a population, often unknowable and estimated from sampled data is a _______(1).

Statistics computed from a ______ sample accurately reflect the corresponding _(1)_.

What is a population parameter?


What is a representative sample?
400

This type of bias occurs when individuals can choose on their own whether to participate in the sample.  Always yields invalid samples.

This type of bias occurs when the sample is comprised of individuals readily available.  Always yields a non-representative sample.

What is Voluntary response bias?


What is Convenience bias?

400

A list of individuals, which clearly defines but may not be representative of the entire population, from which the sample is drawn.

as compared to ...

A sample that consists of the entire population.

What is a sampling frame?



What is a census?

500

When discussing the accuracy or confidence of the linear regression model be sure to comment on both the appropriateness of _____________ and success of _____________.

What is the appropriateness of the model, as indicated by the residual plot?

What is the success of the model, as indicated by R²?

500

An individual result of a component [result of random happening] is an ______ and the sequence of several components representing events that we are pretending will take place is a _____(2).

The result of each _(2)_ with respect to what we were interested in is the ____________.

What is an outcome?


(2) What is a trial?

What is the response variable?
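
The simulation vocabulary can be labeled in a small Python sketch. The situation (rolling a die four times and asking whether a six appears) is chosen only for illustration, as are the seed and the number of trials. Each roll is a component, the number it shows is an outcome, four rolls together make one trial, and the response variable records whether that trial contained a six.

```python
import random

rng = random.Random(2024)

def run_trial(rng):
    """One trial: four die rolls (four components)."""
    outcomes = [rng.randint(1, 6) for _ in range(4)]  # each roll's result is an outcome
    return 6 in outcomes                              # response variable: any six?

trials = 10_000
responses = [run_trial(rng) for _ in range(trials)]
estimate = sum(responses) / trials

print(f"Estimated P(at least one six in four rolls): {estimate:.3f}")
```

For comparison, the exact probability is 1 - (5/6)^4, about 0.518, so the estimate should land close to that.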


500

This corresponds to, and thus estimates, a population parameter.

What is a sample statistic?

500

This type of bias is when individuals from a subgroup of the population are selected less often than they should be.

This type of bias is when a large fraction of those sampled will not or cannot respond.

This type of bias is when respondents' answers might be affected by survey design, such as question wording or interviewer behavior.

What is Undercoverage bias?

What is Nonresponse bias?

What is Response bias?

500
S_________: These samples can reduce sampling variability by identifying homogeneous subgroups and then randomly sampling within each.

C_________: These samples randomly select among heterogeneous subgroups that each resemble the population at large, making our sampling tasks more manageable.

Sy________: These samples can work, when there is no relationship between the order of the sampling frame and the variables of interest, and are often the least expensive method of sampling.  But we still want to start them randomly.


What are stratified samples?

What are cluster samples?

What are systematic samples?
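
The three designs can be compared side by side in a Python sketch. The frame of 48 students, their grades, and their homerooms are all hypothetical. Grades play the role of strata; homerooms play the role of clusters and are built so that each one contains every grade.

```python
import random

rng = random.Random(3)

# Hypothetical frame: 48 students, 4 grades, 6 mixed-grade homerooms of 8.
frame = [
    {"id": i, "grade": 9 + (i // 2) % 4, "homeroom": i // 8}
    for i in range(48)
]

# Stratified: an SRS of 3 students within each grade (stratum).
stratified = []
for grade in (9, 10, 11, 12):
    stratum = [s for s in frame if s["grade"] == grade]
    stratified.extend(rng.sample(stratum, 3))

# Cluster: randomly choose 2 whole homerooms and take everyone in them.
chosen_rooms = rng.sample(range(6), 2)
cluster = [s for s in frame if s["homeroom"] in chosen_rooms]

# Systematic: random starting point, then every 6th student on the list.
k = 6
start = rng.randrange(k)
systematic = frame[start::k]

print(len(stratified), len(cluster), len(systematic))
```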
