A student notices that a dataset contains "NC," "N.C.," and "North Carolina" to represent the same state. The student changes all entries to "North Carolina" before creating visualizations.
data cleaning
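The cleaning step above replaces every spelling variant with one standard value. Here is a minimal Python sketch of that idea. The mapping STATE_ALIASES and the helper clean_states are hypothetical names used only for illustration. Values not in the mapping are left unchanged.

```python
# Hypothetical mapping from each spelling variant to one standard value.
STATE_ALIASES = {
    "NC": "North Carolina",
    "N.C.": "North Carolina",
    "North Carolina": "North Carolina",
}

def clean_states(values):
    """Return a new list where every known variant is replaced by its standard name."""
    cleaned = []
    for value in values:
        key = value.strip()  # ignore stray spaces around the entry
        cleaned.append(STATE_ALIASES.get(key, key))  # unknown values stay as they are
    return cleaned

raw = ["NC", "N.C.", "North Carolina", " NC "]
print(clean_states(raw))
# ['North Carolina', 'North Carolina', 'North Carolina', 'North Carolina']
```

Once every variant uses the same name, a chart counts all of these rows as one state rather than three.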
Finding patterns in data that doesn't have any labels
Unsupervised Learning
predicting a category based on other features
Classification
When a decision favors some things and de-prioritizes or excludes others
Bias
A student creates a table that compares students' favorite school subjects (Math, Science, ELA, Social Studies) with their grade level (9th, 10th, 11th, 12th) to look for patterns between the two categories.
a cross tab (cross-tabulation table)
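A cross tab counts how often each combination of two categories appears. The minimal Python sketch below builds one from hypothetical survey responses, with one row per grade level and one column per subject. The response data is made up for illustration.

```python
from collections import Counter

# Hypothetical survey responses: (grade level, favorite subject) for each student.
responses = [
    (9, "Math"), (9, "Math"), (9, "Science"), (10, "Math"),
    (10, "ELA"), (11, "Social Studies"), (12, "Math"),
]

def cross_tab(pairs):
    """Count how many times each (row, column) combination appears."""
    return Counter(pairs)

counts = cross_tab(responses)
grades = [9, 10, 11, 12]
subjects = ["Math", "Science", "ELA", "Social Studies"]

# Print one row per grade and one column per subject (missing combinations count as 0).
print("Grade".ljust(6) + "".join(s.ljust(16) for s in subjects))
for g in grades:
    print(str(g).ljust(6) + "".join(str(counts[(g, s)]).ljust(16) for s in subjects))
```

Reading across a row shows how one grade's favorites are spread. Reading down a column shows how one subject's fans are spread across grades.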
When a model learns from examples that a human has labeled with the correct answers
Supervised Learning
data that can be counted or measured
Numerical Data
data that can be separated into groups
Categorical Data
What type of graph works best for humans to quickly analyze data to determine the greatest or least number of votes?
Bar Graph
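In a bar graph, the length of each bar matches its count, so the longest and shortest bars stand out immediately. This small Python sketch draws a text-only bar chart from hypothetical poll results. The vote counts and the helper text_bar_chart are made up for illustration.

```python
votes = {"Pizza": 12, "Tacos": 7, "Burgers": 3, "Salad": 9}  # hypothetical poll results

def text_bar_chart(counts):
    """Return one line per category with a bar of '#' whose length equals the count."""
    width = max(len(name) for name in counts)  # line up the bars
    return [f"{name.ljust(width)} | {'#' * n} {n}" for name, n in counts.items()]

for line in text_bar_chart(votes):
    print(line)
```

Even without reading the numbers, the Pizza bar is clearly the longest and the Burgers bar is clearly the shortest.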
A company has thousands of emails already labeled as "Spam" or "Not Spam." A machine learning model is trained using these labeled examples to automatically sort new incoming emails.
Supervised Learning
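The spam example is supervised learning because each training email already has its correct label. The sketch below is a deliberately simple word-count classifier, not a production spam filter. It learns which words appear under each label and then predicts the label whose words best match a new email. The training emails and the names train and predict are hypothetical.

```python
from collections import Counter

# Hypothetical labeled training examples: (email text, label).
training_data = [
    ("win a free prize now", "Spam"),
    ("claim your free money", "Spam"),
    ("meeting moved to monday", "Not Spam"),
    ("see you at practice monday", "Not Spam"),
]

def train(examples):
    """Learn how often each word appears under each label."""
    word_counts = {}
    for text, label in examples:
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts

def predict(word_counts, text):
    """Pick the label whose training words best match the new email."""
    words = text.lower().split()
    scores = {label: sum(counts[w] for w in words) for label, counts in word_counts.items()}
    return max(scores, key=scores.get)

model = train(training_data)
print(predict(model, "free prize inside"))   # Spam
print(predict(model, "practice on monday"))  # Not Spam
```

This is also a classification model: the words in each email are its features, and the model predicts a category from them.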
Accuracy is presented as a percentage
Categorical Data
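For a model that predicts categories, accuracy is the share of predictions that match the true labels, expressed as a percentage. The minimal Python sketch below computes it. The helper accuracy_percent and the sample labels are hypothetical.

```python
def accuracy_percent(predictions, actual):
    """Percentage of predictions that match the true labels."""
    if len(predictions) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of the same length")
    correct = sum(1 for p, a in zip(predictions, actual) if p == a)
    return 100 * correct / len(actual)

predicted = ["Spam", "Not Spam", "Spam", "Spam"]
true_labels = ["Spam", "Not Spam", "Not Spam", "Spam"]
print(accuracy_percent(predicted, true_labels))  # 75.0 (3 of 4 correct)
```

Under the same formula, a model that gets 88 of 100 test rows right is 88% accurate.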
a computer program designed to make a decision
Model
A model card is being used to predict student end-of-year activity for school. The data set has 100 rows of data, and the model is 88% accurate. Are you using this data set to train your model? Why or why not?
Yes. There is a good amount of data (100 rows) and good accuracy (88%).
A grocery store collects data about customer purchases but does not know what groups of shoppers exist. A machine learning model analyzes buying habits and groups customers into categories such as "Healthy Eaters," "Snack Lovers," and "Bulk Buyers."
Unsupervised Learning
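The grocery store example is unsupervised learning because no shopper arrives with a group label. The model finds the groups itself. The minimal Python sketch below uses basic k-means, which is one common grouping technique and is not necessarily the one the store uses. Each shopper is described by two hypothetical numbers: produce items bought and snack items bought. The data and the choice of k = 2 are assumptions for illustration.

```python
def kmeans(points, k, iterations=10):
    """Group 2-D points into k clusters without any labels (basic k-means)."""
    centroids = [list(p) for p in points[:k]]  # simple start: use the first k points
    clusters = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared distance).
        clusters = [[] for _ in range(k)]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[distances.index(min(distances))].append((x, y))
        # Move each centroid to the average of the points in its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = [sum(p[0] for p in cluster) / len(cluster),
                                sum(p[1] for p in cluster) / len(cluster)]
    return clusters

# Hypothetical shoppers: (produce items bought, snack items bought).
shoppers = [(9, 1), (1, 8), (8, 2), (2, 9), (10, 0), (0, 10)]
groups = kmeans(shoppers, k=2)
print(groups)  # [[(9, 1), (8, 2), (10, 0)], [(1, 8), (2, 9), (0, 10)]]
```

The algorithm only reports which shoppers belong together. A person still has to look at each group and give it a name such as "Healthy Eaters" or "Snack Lovers."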
Uses cross tab tables and bar charts to represent data sets
Categorical Data
The inputs that a model uses to make decisions
Features
A model card is being used to predict pricing for a local business. The model is 90% accurate with 10 rows of data. Are you using this data set to train the model? Why or why not?
No. While the accuracy is good at 90%, the model was trained using only 10 rows of data, and there should be more.
A music streaming service analyzes listening habits of millions of users and groups songs with similar characteristics. It then recommends songs that are often listened to by users with similar preferences.
Unsupervised Learning
Uses scatter plots to represent data and determine whether a feature should be used to train a model.
Numerical Data
giving examples to a model so it can learn
Training