One Column
Two Columns
Data Analysis Process
Data Sources
100

This type of chart is best used to find the unique list of values in a column of a dataset

Bar chart
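
As a rough illustration of why a bar chart surfaces the unique values in a column, the sketch below counts each distinct value and draws one text bar per value. The `colors` column is a hypothetical example and is not part of the quiz.

```python
from collections import Counter

# Hypothetical column of string values (an assumption for illustration).
colors = ["red", "blue", "red", "green", "blue", "red"]

# Count how many times each unique value appears.
counts = Counter(colors)

# One bar per unique value; the bar length is the number of occurrences.
for value, count in sorted(counts.items()):
    print(f"{value:<6} {'#' * count}")
```

Each unique value gets exactly one bar, which is also why the chart becomes hard to read when a column has many unique values.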

100

This type of chart is most useful for exploring two columns of a dataset when one or both are strings

Cross tab
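
A cross tab can be sketched as a count of every pair of values from two columns. The example below assumes a small, made-up survey with two string columns; the names `rows`, `pets`, and `grades` are hypothetical.

```python
from collections import Counter

# Hypothetical survey rows: (favorite pet, grade level).
rows = [
    ("cat", "9th"),
    ("dog", "9th"),
    ("cat", "10th"),
    ("dog", "9th"),
    ("cat", "9th"),
]

# Count how often each (pet, grade) combination occurs.
cross_tab = Counter(rows)

# Unique values of each column become the row and column labels.
pets = sorted({pet for pet, _ in rows})
grades = sorted({grade for _, grade in rows})

# Print a simple grid: one row per pet, one column per grade.
print("pet   " + " ".join(f"{g:>5}" for g in grades))
for pet in pets:
    print(f"{pet:<6}" + " ".join(f"{cross_tab[(pet, g)]:>5}" for g in grades))
```

The grid has one cell for every combination of unique values, so it grows quickly as either column gains more unique values.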

100

Correlation does not imply ______

Causation

100

Data that is accessible to the public is called ____

Open data

200

This type of chart is best used to determine what range of values is least common in a column of a dataset

Histogram

200

This type of chart is best used to see patterns and trends between two columns of a dataset

Scatter chart

200

This process allows us to look at just a subset of a larger dataset

Filtering data
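
Filtering can be sketched as keeping only the rows that satisfy a condition. The dataset and the age threshold below are hypothetical and chosen only for illustration.

```python
# Hypothetical dataset: one dict per row (an assumption for illustration).
rows = [
    {"name": "Ana", "age": 15},
    {"name": "Ben", "age": 17},
    {"name": "Cy", "age": 14},
    {"name": "Dee", "age": 18},
]

# Keep only the rows that meet the condition; the original list is unchanged.
older = [row for row in rows if row["age"] >= 17]

for row in older:
    print(row["name"], row["age"])
```

The filtered list is a subset of the larger dataset, so any chart built from it describes only the rows that passed the condition.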

200

This is data that is collected by the general public

Crowdsourced data

300

This is an example of which chart type?

Histogram

300

This chart shows a general _______ between a state's area and its order of admission into the Union

Correlation

300

The data analysis process can be described as a circle. How are the first and last steps related?

Generating new information helps us determine what questions to ask next

300

This type of data is essential in machine learning due to the required repetition in training

Big data

400

This type of chart is NOT useful when there are many unique values in a column of a dataset

Bar chart

400

Cross tab charts are NOT useful when this is true about the dataset

There are too many unique values (the chart would be too big)

400

These help us see patterns and trends that are not apparent in raw datasets

Data visualizations

400

Machine learning algorithms trained on datasets that are NOT representative can result in this

Data bias

500

This phrase is used to describe the ranges of values used when creating a histogram

Bucket size
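
To make the idea of bucket size concrete, the sketch below groups a numeric column into ranges of equal width and draws a text histogram. The `ages` values and the bucket size of 10 are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical numeric column (an assumption for illustration).
ages = [3, 7, 12, 15, 18, 21, 22, 29, 34, 41]
bucket_size = 10  # width of each range of values

# Map each value to the start of its bucket, e.g. 15 -> 10 and 29 -> 20.
buckets = Counter((age // bucket_size) * bucket_size for age in ages)

# One bar per bucket; the bar length is the number of values in that range.
for start in sorted(buckets):
    print(f"{start:>2}-{start + bucket_size - 1:<2} {'#' * buckets[start]}")
```

Changing `bucket_size` changes how many bars appear and how the values are grouped, which can make the least common range easier or harder to spot.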

500

Scatter charts are NOT useful when this is true about the dataset

There are lots of repeated values

500

Users from the United States and United Kingdom responded to the same survey. This process is likely needed before the results can be analyzed

Cleaning data

500

This is included in datasets to give us information about the dataset itself

Metadata
