Making Decisions with Data
Supervised and Unsupervised Learning
Categorical and Numerical Data
Vocabulary
100

A student notices that a dataset contains "NC," "N.C.," and "North Carolina" to represent the same state. The student changes all entries to "North Carolina" before creating visualizations.

data cleaning
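
As a minimal sketch of this kind of cleaning, assuming the state values are stored as plain strings, the different spellings could be mapped to one standard spelling. The names `STATE_ALIASES` and `clean_state` are hypothetical and used only for illustration.

```python
# Hypothetical lookup table of known variants.
# Keys are lowercase with periods removed.
STATE_ALIASES = {
    "nc": "North Carolina",
    "north carolina": "North Carolina",
}

def clean_state(value):
    """Return the standard spelling for a state entry, or the trimmed input if unknown."""
    key = value.strip().lower().replace(".", "")
    return STATE_ALIASES.get(key, value.strip())

entries = ["NC", "N.C.", "North Carolina"]
cleaned = [clean_state(e) for e in entries]
print(cleaned)
```

Entries that are not in the lookup table are left as they are (only trimmed), so nothing is changed by guessing.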

100

Finding patterns in data that doesn't have any labels

Unsupervised Learning

100

predicting a category based on other features

Classification

100

When a decision favors some things and de-prioritizes or excludes others

Bias

200

A student creates a table that compares students' favorite school subjects (Math, Science, ELA, Social Studies) with their grade level (9th, 10th, 11th, 12th) to look for patterns between the two categories.

a cross tab (cross-tabulation table)
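
A cross tab like this one could be sketched by counting how often each pair of categories appears together. The survey responses below are made up for illustration, and `cross_tab` is a hypothetical helper name.

```python
from collections import Counter

# Made-up survey responses for illustration: (grade level, favorite subject).
responses = [
    ("9th", "Math"),
    ("9th", "Science"),
    ("10th", "Math"),
    ("10th", "Math"),
    ("11th", "ELA"),
    ("12th", "Social Studies"),
]

# Count how many times each (grade, subject) pair appears.
counts = Counter(responses)

grades = ["9th", "10th", "11th", "12th"]
subjects = ["Math", "Science", "ELA", "Social Studies"]

def cross_tab(counts, rows, cols):
    """Build a table (one list per row) of counts for each row/column pair."""
    return [[counts[(r, c)] for c in cols] for r in rows]

table = cross_tab(counts, grades, subjects)
for grade, row in zip(grades, table):
    print(grade, row)
```

Each row is a grade level and each column is a subject, so a large number in one cell points to a possible pattern between the two categories.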

200

When a human trains a model to learn with examples

Supervised Learning

200

data that can be counted or measured

Numerical Data

200

data that can be separated into groups

Categorical Data

300

What type of graph works best for humans to quickly analyze data to determine the greatest or least number of votes?

Bar Graph

300

A company has thousands of emails already labeled as "Spam" or "Not Spam." A machine learning model is trained using these labeled examples to automatically sort new incoming emails.

Supervised Learning

300

Accuracy is presented as a percentage

Categorical Data

300

a computer program designed to make a decision

Model

400

A model card is being used to predict students' end-of-year activity for school. The model was trained with 100 rows of data and is 88% accurate. Would you use this data set to train your model? Why or why not?

Yes. It has a good amount of data (100 rows) and good accuracy (88%).

400

A grocery store collects data about customer purchases but does not know what groups of shoppers exist. A machine learning model analyzes buying habits and groups customers into categories such as "Healthy Eaters," "Snack Lovers," and "Bulk Buyers."

Unsupervised Learning

400

Uses cross tab tables and bar charts to represent data sets

Categorical Data

400

The inputs that a model uses to make decisions

Features

500

A model card is being used to predict pricing for a local business. The model is 90% accurate with 10 rows of data. Would you use this data set to train the model? Why or why not?

No. While 90% accuracy is good, the model was trained using only 10 rows of data, and there should be more.
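
One way to see why 10 rows is too few is a rough sketch of the arithmetic, assuming accuracy is measured as correct predictions out of the rows checked. The `accuracy` function is a hypothetical helper, not part of any particular tool.

```python
def accuracy(correct, total):
    """Accuracy as a percentage of correct predictions."""
    return 100 * correct / total

# With 10 rows, 9 correct gives 90%, and one more mistake drops it to 80%.
small = accuracy(9, 10)
small_one_more_miss = accuracy(8, 10)

# With 100 rows, 88 correct gives 88%, and one more mistake only drops it to 87%.
large = accuracy(88, 100)
large_one_more_miss = accuracy(87, 100)

print(small - small_one_more_miss)   # change caused by one row out of 10
print(large - large_one_more_miss)   # change caused by one row out of 100
```

With so few rows, a single prediction swings the accuracy by 10 percentage points, so the 90% figure is much less trustworthy than the 88% measured on 100 rows.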

500

A music streaming service analyzes listening habits of millions of users and groups songs with similar characteristics. It then recommends songs that are often listened to by users with similar preferences.

Unsupervised Learning

500

Uses scatter plots to represent data and determine whether a feature should be used to train a model.

Numerical Data

500

giving examples to a model so it can learn

Training
