Missing Data & ML Models
Statistics & Modelling Concepts
Data & AI Infrastructure
Bias & Variance
Time Series & Model Interpretability
100

What are the three types of missing data?

MNAR, MCAR, MAR

100

What is it called when a model is so complex that it fits noise in the training data and generalizes poorly to new data?

Overfitting.

100

What type of software interface allows different applications to communicate and integrate services?

APIs.

100

This term refers to the difference between a model's predicted value and the actual observed value.

Error

100

This term refers to understanding and explaining how a machine learning model makes its predictions.

Interpretability.

200

This machine learning model splits data into branches based on feature values, forming a structure similar to a flowchart.

Decision Tree
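
To illustrate how a tree chooses a branch, the sketch below finds the single best threshold split on one numeric feature by minimizing weighted Gini impurity. This is the step a CART-style tree repeats recursively. The helper names `gini` and `best_split` are hypothetical. A real tree would also search across all features and recurse on each branch.

```python
# Minimal sketch: find the single best split (a depth-1 "decision stump")
# on one numeric feature by minimizing weighted Gini impurity.
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best split x <= threshold."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must send data down both branches
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best


xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_split(xs, ys))  # (3.0, 0.0)
```

A weighted impurity of 0.0 means both branches are pure, so no further splitting would be needed on this toy data.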

200

What technique helps prevent overfitting by penalizing model complexity?

Regularization. 
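
The following is a minimal sketch of L2 (ridge) regularization. It assumes a one-feature model y ≈ w·x with no intercept. Minimizing the squared error plus λw² gives the closed form w = Σxy / (Σx² + λ). The function name `ridge_weight` and the data are illustrative only.

```python
# Minimal sketch of L2 (ridge) regularization for y ≈ w * x (no intercept).
# Minimizing sum((y - w*x)**2) + lam * w**2 gives w = sum(x*y) / (sum(x*x) + lam).


def ridge_weight(xs, ys, lam):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
for lam in (0.0, 1.0, 10.0):
    # Larger penalties shrink the weight further toward zero.
    print(lam, ridge_weight(xs, ys, lam))
```

With λ = 0 this reduces to ordinary least squares (w = 2.0). Increasing λ shrinks w toward zero, which trades a little bias for lower variance.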

200

What Python library is widely used for data analysis?

Pandas.

200

This type of error arises when a model's simplifying assumptions make its predictions differ systematically from the true values.

Bias

200

These are the conditions, such as linearity and independent errors in linear regression, that must hold for a model to produce unbiased and reliable estimates.

Model Assumptions.

300

What is one common method for handling missing data?

Imputation
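
Mean imputation is one simple form of imputation. The sketch below assumes missing entries are marked with `None` and fills each gap with the mean of the observed values. It is mainly defensible when data are MCAR. Even then, it shrinks the variance of the filled column, so model-based or multiple imputation is often preferred. `mean_impute` is a hypothetical helper.

```python
# Minimal sketch of mean imputation: replace missing values (None) with the
# mean of the observed values in the same column.
from statistics import mean


def mean_impute(values):
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


print(mean_impute([4.0, None, 8.0, None, 6.0]))  # [4.0, 6.0, 8.0, 6.0, 6.0]
```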

300

This correlation measure is used to assess the strength and direction of the monotonic relationship between two variables, based on their ranks.

Spearman Rank Correlation
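
Spearman's correlation can be computed as the Pearson correlation of the ranks. The sketch below gives tied values the average of their ranks. The helper names `average_ranks`, `pearson` and `spearman` are hypothetical.

```python
# Minimal sketch: Spearman's rank correlation as the Pearson correlation of
# the ranks, with tied values given the average of their ranks.
from math import sqrt


def average_ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship still scores 1.0.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman(x, y))  # 1.0
```

Only the ordering of the values matters. A perfectly monotonic relationship such as y = x³ therefore gets a coefficient of 1.0, whereas Pearson's correlation on the raw values would be below 1.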

300

What process involves identifying and resolving errors in software to ensure correct functionality?

Debugging. 

300

This type of error refers to the model’s sensitivity to small fluctuations in the training data.

Variance.

300

This interpretability method is based on Shapley values from game theory. It explains individual predictions with consistent feature attributions, which can be aggregated into global feature importance.

SHAP
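
To show the game-theory idea behind SHAP, the sketch computes exact Shapley values for one prediction of a toy model by enumerating every coalition of features. It assumes that a feature absent from a coalition is replaced by a single fixed baseline value. The `shap` library itself uses background data and faster estimators, so this illustrates the concept rather than that library's algorithm. The names `shapley_values` and `model` are hypothetical.

```python
# Minimal sketch of exact Shapley values for a single prediction.
# Assumption: features outside a coalition are set to a fixed baseline value.
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their real value; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def model(z):
    return 2 * z[0] + z[1] * z[2]  # toy model with an interaction term


x = [1.0, 2.0, 3.0]
base = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, base)
print([round(p, 6) for p in phi])  # [2.0, 3.0, 3.0]
print(round(sum(phi), 6), model(x) - model(base))  # 8.0 8.0
```

The attributions add up to the difference between the prediction and the baseline prediction. The interaction term z[1]·z[2] is split evenly between features 1 and 2. Exact enumeration grows exponentially with the number of features, which is why practical tools approximate it.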

400

This regression technique extends linear regression by adding higher-degree terms to model nonlinear relationships.

Polynomial Regression
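
Polynomial regression is still linear in its coefficients. It can therefore be fit by ordinary least squares on the expanded features 1, x, x^2, ..., x^d. The sketch below solves the normal equations with Gaussian elimination using only the standard library. `poly_fit` is a hypothetical helper. In practice, a library routine such as `numpy.polyfit`, which is numerically more robust, would normally be used.

```python
# Minimal sketch: polynomial regression as least squares on [1, x, ..., x^d],
# solving the normal equations (X^T X) w = X^T y by Gaussian elimination.


def poly_fit(xs, ys, degree):
    # Design matrix rows: [1, x, x^2, ..., x^degree]
    X = [[x ** p for p in range(degree + 1)] for x in xs]
    m = degree + 1
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(m)]
         + [sum(row[i] * y for row, y in zip(X, ys))] for i in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    w = [0.0] * m
    for i in reversed(range(m)):
        w[i] = (A[i][m] - sum(A[i][j] * w[j] for j in range(i + 1, m))) / A[i][i]
    return w  # coefficients for x^0, x^1, ..., x^degree


xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1 + 2 * x - 0.5 * x ** 2 for x in xs]  # exact quadratic, no noise
print([round(c, 6) for c in poly_fit(xs, ys, 2)])  # [1.0, 2.0, -0.5]
```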

400

What process involves selecting a subset of individuals from a population to draw conclusions about the entire group? 

Sampling Methods.
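
Two common sampling methods can be sketched with the standard `random` module. Simple random sampling draws n items uniformly without replacement. Proportional stratified sampling draws the same fraction from each subgroup. The helper names and the `strata_key` parameter are hypothetical.

```python
# Minimal sketch of simple random sampling and proportional stratified sampling.
import random
from collections import defaultdict


def simple_random_sample(population, n, seed=0):
    # Uniform draw of n items without replacement.
    return random.Random(seed).sample(population, n)


def stratified_sample(population, strata_key, fraction, seed=0):
    rng = random.Random(seed)
    groups = defaultdict(list)
    for item in population:
        groups[strata_key(item)].append(item)
    sample = []
    for members in groups.values():
        k = round(len(members) * fraction)  # proportional allocation per stratum
        sample.extend(rng.sample(members, k))
    return sample


people = [("north", i) for i in range(80)] + [("south", i) for i in range(20)]
s = stratified_sample(people, strata_key=lambda p: p[0], fraction=0.1)
print(len(s), sum(1 for p in s if p[0] == "south"))  # 10 2
```

Stratifying guarantees that the smaller "south" group is represented in proportion to its size. A simple random sample of 10 could, by chance, contain none of them.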

400

What AI research lab, originating from China, focuses on open-source large language models?

DeepSeek.

400

This concept describes how reducing one of two sources of model error, typically by changing model complexity, tends to increase the other.

Bias-Variance Trade-off
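
For squared-error loss, the trade-off can be written as a decomposition of expected prediction error. This assumes y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and takes the expectation over training sets and noise at a fixed input x:

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

More flexible models typically lower the bias term while raising the variance term. No model can reduce σ².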

400

What term describes machine learning models whose decision-making processes are not easily interpretable?

Black-box models.

500

A non-parametric model that classifies new data points based on the majority class of their nearest neighbors.

K-Nearest Neighbors (KNN)
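
The following is a minimal sketch of KNN classification. It assumes numeric features and Euclidean distance via `math.dist` (Python 3.8+). The training points are sorted by distance to the query, the k nearest are kept, and a majority vote decides the class. `knn_predict` is a hypothetical helper.

```python
# Minimal sketch of k-nearest-neighbors classification with a majority vote.
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    # Sort training points by Euclidean distance to the query; keep the k nearest.
    nearest = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["red", "red", "red", "blue", "blue", "blue"]
print(knn_predict(train_X, train_y, (0.5, 0.5)))  # red
print(knn_predict(train_X, train_y, (5.5, 5.2)))  # blue
```

There is no fitting step beyond storing the data, so all the work happens at prediction time. This is part of why KNN is called non-parametric.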

500

What statistical technique determines the minimum sample size needed to detect an effect of a given size with a specified probability?

Power Analysis.
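
The following sketches a power analysis for a two-sided comparison of two group means, using the normal approximation. The per-group sample size is n = 2(z_{1-α/2} + z_{1-β})² / d². Here, d is the standardized effect size (difference in means divided by the standard deviation) and 1 - β is the desired power. `sample_size_per_group` is a hypothetical helper built on `statistics.NormalDist`.

```python
# Minimal sketch of a two-sample power analysis (normal approximation):
# n per group = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


print(sample_size_per_group(0.5))  # 63
```

For a medium effect (d = 0.5) at α = 0.05 and 80% power, this gives 63 per group. An exact t-test calculation gives a slightly larger figure (64). Halving the effect size roughly quadruples the required sample.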

500

Developed by Meta, this family of open-weight large language models is named after a South American animal. 

LLaMA.

500

What technique constrains model complexity, for example through L1 or L2 penalties, so that models generalize better to unseen data?

Regularization

500

In time series analysis, this term refers to using past values of a series, shifted back by one or more time steps, as predictors of future values.

Lagging.
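
The following is a minimal sketch of building lagged features from a single series. Each row of X holds the previous n_lags values, and the target is the current value. `make_lag_features` is a hypothetical helper.

```python
# Minimal sketch: build lagged features so that y[t] is predicted from
# y[t-1], ..., y[t-n_lags]. The first n_lags points have no complete history.


def make_lag_features(series, n_lags):
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append([series[t - k] for k in range(1, n_lags + 1)])  # [lag1, lag2, ...]
        y.append(series[t])
    return X, y


X, y = make_lag_features([10, 12, 13, 15, 18], n_lags=2)
print(X)  # [[12, 10], [13, 12], [15, 13]]
print(y)  # [13, 15, 18]
```

The first n_lags observations are lost as targets because their lags would reach before the start of the series.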