Chapter 2

Definition

Reverse Definition

True or False

Difference

Chapter 1 Refresher

100

Supervised Machine Learning

“Supervised learning is the use of an algorithm that uses labeled data to produce a training data set, from which the algorithm can learn”

100

“______ does not use labeled data or a training data set. ______ solutions identify data patterns by discovery, without training.”

Unsupervised Machine Learning

100

AI is one application of Machine Learning

False -> Machine learning is one of the applications of AI

100

“Compare and contrast supervised and unsupervised machine learning.”

Supervised learning is about prediction with labeled data, while unsupervised learning aims to uncover patterns in unlabeled data

100

Define Business Intelligence

“Business intelligence is the use of tools (data mining, machine learning, and visualization) to convert data into actionable insights and recommendations.”

200

Reinforcement Learning

The use of feedback loops to reward correct predictions and punish mistakes

200

Hierarchically structured process that can leverage layers of machine learning, for which the output of one layer becomes the input for the next.

Deep learning

200

In machine learning, most of the algorithms are unsupervised

True

200

“Compare and contrast machine learning and artificial intelligence.”

“Machine learning is an application of AI. In other words, there are many different applications of AI, of which machine learning is one.”

200

“_____ is the use of charts and graphs to represent data. Analysts use many different charts to represent data. The key is knowing which chart to use when.”

Data Visualization

300

Testing Data

“The testing data set must have values for the independent variables with which it will perform its analysis and the dependent variable, which the model will compare to its result.”

300

“A ________ contains data for the independent variables, which influence the dependent variable, as well as correct results for the dependent variable.”

Training Data

300

Developers normally use 20-30% of a data set as the training data set

False -> Developers normally use 70-80% of a data set as the training data set
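The 70-80% split in the answer above can be illustrated with a short sketch. The helper name `split_dataset`, the 80% default, and the fixed seed are assumptions made for this example; they are not taken from the chapter.

```python
import random


def split_dataset(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records, then split into training and testing sets.

    train_fraction=0.8 is an assumed default within the 70-80% range.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(100))                 # stand-in for 100 data records
train, test = split_dataset(records, train_fraction=0.8)
print(len(train), len(test))               # 80 20
```

Shuffling before the cut is a common precaution so that neither set inherits an ordering present in the original data.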

300

Compare and contrast data classification and data clustering

Data clustering groups data items based on their shared attributes, without predefined labels. Data classification assigns data items to predefined, matching categories.
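A minimal sketch may make the contrast concrete. It assumes one numeric attribute per item; the nearest-mean classifier and the two-group k-means loop are illustrative choices, not methods named in the chapter, and all function names and data values are hypothetical.

```python
def classify(value, labeled_examples):
    """Supervised: pick the label whose training examples have the closest mean."""
    groups = {}
    for x, label in labeled_examples:
        groups.setdefault(label, []).append(x)
    means = {label: sum(xs) / len(xs) for label, xs in groups.items()}
    return min(means, key=lambda label: abs(value - means[label]))


def cluster_two_groups(values, iterations=10):
    """Unsupervised: 1-D k-means with k=2, seeded at the smallest and largest value."""
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Recompute each center; keep the old one if a group ended up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Classification needs labels; clustering discovers the groups on its own.
labeled = [(1.0, "small"), (2.0, "small"), (9.0, "large"), (10.0, "large")]
print(classify(3.0, labeled))                                # small
print(cluster_two_groups([1.0, 2.0, 1.5, 9.0, 10.0, 9.5]))   # [[1.0, 2.0, 1.5], [9.0, 10.0, 9.5]]
```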

300

Data Association

It is the process of identifying key relationships between variables.

400

Classification

A supervised machine learning solution that assigns data items to specific categories.

400

“______ is the assignment of data items to a related group. ______ uses unsupervised learning.”

Clustering

400

“When choosing a data set, the key is to have a larger number of independent variables”

False -> “When choosing a data set, the key is not to have a larger number of independent variables (many may not correlate to the dependent variable), but rather, to have a larger number of data records for the key dependent variables.”

400

Compare and contrast training data and testing data.

Training data is used to teach the model, while testing data is used to evaluate its performance
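The two roles can be sketched in a few lines. The threshold "model", the function names, and the data values below are all hypothetical and chosen only to show that the model learns from one set and is scored on the other.

```python
def train_threshold(training_data):
    """Teach the model: learn a cutoff halfway between the two class means."""
    zeros = [x for x, y in training_data if y == 0]
    ones = [x for x, y in training_data if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def accuracy(threshold, testing_data):
    """Evaluate the model: fraction of test records whose prediction matches the known result."""
    correct = sum(1 for x, y in testing_data if (1 if x > threshold else 0) == y)
    return correct / len(testing_data)


# Each record is (independent variable, dependent variable).
training_data = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
testing_data = [(2.5, 0), (4, 0), (5.5, 0), (8.5, 1)]

threshold = train_threshold(training_data)     # learned from training data only
print(threshold, accuracy(threshold, testing_data))   # 5.0 0.75
```

The testing records never influence the threshold, which is what lets the accuracy figure stand in for performance on new data.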

400

What are the four factors that affect the quality of a set of data?

Accuracy, Consistency, Conformity, Completeness