Code
Data
Models
Data Processing
ML Concepts
100

This Python data type represents a number with a decimal point.

What is a float?

100

This broad type of data usually refers to labels for a data point, often represented as strings.

What is categorical data?

100

A simple model that finds thresholds to separate the data so as to minimize impurity, measured by entropy or the Gini index.

What is a decision tree?
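
As a rough illustration of the threshold search, here is a minimal pure-Python sketch of a single split on one numeric column. The helper names `gini` and `best_threshold` and the toy `ages`/`survived` values are assumptions made for this example, not part of any library.

```python
def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for a mixed one."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) for the purest split found."""
    best = None
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must leave data on both sides
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: ages and whether each passenger survived.
ages = [4, 8, 30, 45]
survived = [1, 1, 0, 0]
split = best_threshold(ages, survived)
print(split)
```

A full tree repeats this search on each resulting group until the groups are pure or a depth limit is reached.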

100

The process of turning labels into multiple binary columns so that the model can use them as inputs.

What is one-hot encoding?
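
One way to picture this is a small pure-Python sketch. The function name `one_hot_encode` and the `ports` values are made up for the example; in practice `pandas.get_dummies` or scikit-learn's `OneHotEncoder` is the usual tool.

```python
def one_hot_encode(labels):
    """Turn a list of string labels into one binary column per category."""
    categories = sorted(set(labels))  # sorted so the column order is stable
    return {
        category: [1 if label == category else 0 for label in labels]
        for category in categories
    }


# Hypothetical label column.
ports = ["S", "C", "S", "Q"]
encoded = one_hot_encode(ports)
print(encoded)
```

Each row has exactly one 1 across the new columns, which is where the name comes from.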

100

The type of data science task where we attempt to predict the label of a given data point.

What is classification?

200

A way to run the same code multiple times in Python; the loop keeps running as long as its condition is true.

What is a while loop?
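
A minimal sketch, with variable names chosen only for this example: keep adding the next whole number until the running total reaches 10.

```python
count = 0
total = 0

# The body repeats as long as the condition is true.
while total < 10:
    count += 1
    total += count

print(count, total)
```

The loop stops as soon as `total < 10` becomes false, so the condition is checked before every pass.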

200

This kind of numerical variable can take any value within a range, including decimals.

What is a continuous variable?

200

One of the simplest models for predicting the class/label of a data point (for example, whether someone lived or died on the Titanic).

What is logistic regression?

200

The concept of separating the data into two portions.  The model will be fitted to one and evaluated on the other.

What is the train/test split?
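
The idea can be sketched in pure Python. The function name `split_train_test`, the 25% test fraction, and the seed are assumptions for this example; scikit-learn provides `train_test_split` for real projects.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then cut it into train and test portions."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(20))  # stand-in for 20 data points
train, test = split_train_test(rows)
print(len(train), len(test))
```

No data point appears in both portions, which is what keeps the evaluation honest.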

200

The idea that a model can be so flexible that it performs really well on the data on hand but struggles to perform on real world data.

What is overfitting?

300

A core concept in Python that lets us define our own type of object, with its own attributes and functions (methods).

What is a class?
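
A small sketch of the idea, using a hypothetical `Passenger` class invented for this example: `name` and `age` are attributes, and `is_adult` is a method.

```python
class Passenger:
    def __init__(self, name, age):
        # Attributes: data stored on each individual object.
        self.name = name
        self.age = age

    def is_adult(self):
        # Method: a function that works with the object's own data.
        return self.age >= 18


p = Passenger("Alice", 29)
print(p.name, p.is_adult())
```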

300

A type of learning that does not require labels; datasets without a target column are well suited to this form of ML.

What is unsupervised learning?

300

A simple linear model with an extra penalty added to its loss: the sum of the squared coefficients.

What is ridge regression?
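
To make the penalty concrete, here is a hedged sketch of the quantity being minimized. The function name `ridge_loss` and the toy numbers are assumptions for this example, and the intercept is left out for brevity.

```python
def ridge_loss(X, y, weights, alpha=1.0):
    """Sum of squared errors plus alpha times the sum of squared coefficients."""
    predictions = [sum(w * x for w, x in zip(weights, row)) for row in X]
    squared_error = sum((p - t) ** 2 for p, t in zip(predictions, y))
    penalty = alpha * sum(w ** 2 for w in weights)
    return squared_error + penalty


# Toy data that the weights [1, 2] fit exactly, so only the penalty remains.
X = [[1, 2], [3, 4]]
y = [5, 11]
loss_with_penalty = ridge_loss(X, y, [1, 2], alpha=1.0)
loss_without_penalty = ridge_loss(X, y, [1, 2], alpha=0.0)
print(loss_with_penalty, loss_without_penalty)
```

A larger `alpha` makes big coefficients more expensive, which pushes the fitted model toward smaller ones.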

300

The process of rescaling the values of each column so that they are all on a comparable scale.

What is standardization/scaling?
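
Two common variants can be sketched with the standard library. The function names and the `ages` column are made up for this example; scikit-learn's `StandardScaler` and `MinMaxScaler` are the usual tools.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score: shift to mean 0 and scale to standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]  # assumes the column is not constant


def min_max_scale(values):
    """Squeeze values into the range 0 to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]  # assumes hi != lo


ages = [20, 30, 40]  # hypothetical column
z_scores = standardize(ages)
scaled = min_max_scale(ages)
print(z_scores, scaled)
```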

300

The term used to describe how poorly a model performed.  Usually we try to minimize this value in order to improve the model.

What is loss?
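
Mean squared error is one common choice of loss. This sketch uses made-up predictions and targets, and the function name is chosen for the example.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences; lower means a better fit."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


loss = mean_squared_error([3, 5], [2, 7])
perfect = mean_squared_error([2, 7], [2, 7])
print(loss, perfect)
```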

400

This can be thought of as a single column inside a pandas dataframe.

What is a pandas Series?

400

This is a special kind of data where the specific values don't always matter as you can rearrange them while maintaining the semantic meaning.

What is unstructured data?

400

An approach to modelling where a bunch of weaker estimators (not necessarily decision trees) are trained on the data and each casts a vote. The result is usually more robust than any one of the individual models.

What is the ensemble method?
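
The voting step can be sketched as follows. The three rule-based "estimators" are hypothetical stand-ins for trained models, and the function names are chosen for this example.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label that received the most votes."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(estimators, x):
    """Ask every estimator for a prediction, then take the majority."""
    return majority_vote([estimator(x) for estimator in estimators])


# Three weak rules that disagree near the boundary.
estimators = [
    lambda age: "child" if age < 18 else "adult",
    lambda age: "child" if age < 16 else "adult",
    lambda age: "child" if age < 21 else "adult",
]
print(ensemble_predict(estimators, 17), ensemble_predict(estimators, 19))
```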

400

The process of squaring a column, logging a column, or other mathematical operations to create new columns for the dataset.

What is feature engineering?
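
A minimal sketch of squaring and logging a column, using hypothetical `fares` values and column names invented for this example.

```python
import math

fares = [7.25, 71.28, 8.05]  # hypothetical original column

# Build new columns from the original one.
rows = [
    {"fare": fare, "fare_squared": fare ** 2, "log_fare": math.log(fare)}
    for fare in fares
]
print(rows[0])
```

The log only works for positive values, so columns containing zeros need a different transform or an offset.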

400

The mathematical property of a model's loss function that guarantees any minimum found is the global one, so training reliably arrives at the same coefficients.

What is convex optimization?

500

A special type of method whose name begins and ends with double underscores. Python calls these implicitly in specific situations (for example, __init__ when an object is created).

What is a dunder method?
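
A short sketch with a hypothetical `Playlist` class: none of the dunder methods is called by name, yet each one runs.

```python
class Playlist:
    def __init__(self, songs):
        # Runs when Playlist(...) is called.
        self.songs = list(songs)

    def __len__(self):
        # Runs when len(playlist) is called.
        return len(self.songs)

    def __repr__(self):
        # Runs when the object is shown with repr() or in the REPL.
        return f"Playlist({self.songs!r})"


playlist = Playlist(["a", "b"])
print(len(playlist), repr(playlist))
```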

500

A special type of data where each value represents a category with a meaningful order (position in a race).

What is ordinal/rank data?

500

The model argument that corrects for class imbalance by giving under-represented classes more influence during training.

What are class weights?
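
One common way to choose the weights is the "balanced" heuristic, n_samples / (n_classes * class_count), which scikit-learn applies when `class_weight="balanced"`. The function name and the toy labels below are assumptions for this sketch.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Rarer classes get larger weights: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


labels = [0] * 8 + [1] * 2  # hypothetical imbalanced target: 80% vs 20%
weights = balanced_class_weights(labels)
print(weights)
```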

500

This dataframe method is very helpful for handling string columns. It takes a function as an argument and lets you handle each value within a column as its own variable.

What is pd.DataFrame().apply()?
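
A minimal sketch, assuming pandas is installed. The `name` column and the `extract_title` helper are made up for this example. Applying a function to a single column goes through `Series.apply`, which calls it once per value; `DataFrame.apply` passes whole columns, or whole rows with `axis=1`.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Braund, Mr. Owen", "Heikkinen, Miss. Laina"]})


def extract_title(name):
    """Pull the title out of a 'Last, Title. First' string."""
    return name.split(",")[1].split(".")[0].strip()


# The function is called once for each value in the column.
df["title"] = df["name"].apply(extract_title)
print(df)
```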

500

The process of adding intentional handicaps to a model with the idea that it will generalize better to real world data.

What is regularization?