Data Science Basics
Programming & Tools
Ethics & Data Privacy
Machine Learning
Statistics
100

What is the process of extracting knowledge and insights from data called?

What is data science?

100

Name two programming languages commonly used in data science and analytics.

What are Python and R? (Also accepted: SQL, Java, MATLAB, C, C++.)

100

What is an incident where sensitive, protected, or confidential data is accessed or disclosed without authorization?

What is a data breach?

100

The category of machine learning that relies on labeled data to learn mappings from inputs to outputs, encompassing tasks such as regression and classification.

What is supervised learning?
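Here is a minimal regression sketch of the idea. It assumes NumPy and a handful of made-up labeled (input, output) pairs. A classifier would follow the same pattern, with categorical labels instead of numbers.

```python
import numpy as np

# Hypothetical labeled examples: each input x comes with a known target y.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# Learn a mapping y = slope * x + intercept from the labeled pairs (least squares).
slope, intercept = np.polyfit(x_train, y_train, 1)

def predict(x):
    # Apply the learned mapping to a new, unlabeled input.
    return slope * x + intercept

print(predict(5.0))
```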

100

This term describes the spread or variability of data points around the mean.

What is the variance?
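This small Python sketch shows how variance measures spread, using made-up numbers. Variance is the average squared deviation from the mean. You divide by n for a population and by n - 1 for the usual sample estimate. The hand-written functions are cross-checked against the standard library's `statistics` module.

```python
from statistics import pvariance, variance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values, not from the source

def population_variance(xs):
    """Mean of the squared deviations from the mean (divide by n)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Unbiased sample variance (divide by n - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

# Compare the hand-written versions with the standard library.
print(population_variance(data), pvariance(data))
print(sample_variance(data), variance(data))
```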

200

What is the first step in the data science process, where data is collected and prepared for analysis?

What is data collection or data preprocessing?

200

What type of software has source code that anyone can freely use, modify, and share, and is widely used for collaboration and data analysis?

What are open-source tools?

200

This technique adds random noise to data or queries to protect individual privacy while allowing aggregate analysis.

What is differential privacy?
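One standard way to add calibrated noise is the Laplace mechanism. This sketch uses hypothetical data and function names and relies only on the standard library. It answers a counting query. Adding or removing one person changes a count by at most 1, so Laplace noise with scale 1/epsilon gives epsilon-differential privacy. A smaller epsilon means more noise and stronger privacy.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential draws with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query (sensitivity 1, noise scale 1/epsilon)."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

ages = [23, 35, 41, 52, 29, 64, 38]  # hypothetical dataset
rng = random.Random(0)
# Noisy answer to "how many people are 40 or older?"
print(private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng))
```

Any single noisy answer can be far from the true count of 3. However, the noise averages out over many queries, which is why aggregate statistics remain useful.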

200

This technique adds a penalty proportional to the sum of the squared model weights to the loss function to reduce overfitting.

What is L2 regularization (Ridge)?
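This NumPy sketch illustrates L2 (ridge) regularization on synthetic data. The data, the `lam` values, and the function names are illustrative. The penalty lam * sum(w_j^2) is added to the squared-error loss. The closed-form solution (X^T X + lam * I)^-1 X^T y shows the weights shrinking toward zero as the penalty grows.

```python
import numpy as np

def ridge_loss(w, X, y, lam):
    # Sum of squared errors plus the L2 penalty lam * sum(w_j ** 2).
    residuals = X @ w - y
    return float(np.sum(residuals ** 2) + lam * np.sum(w ** 2))

def ridge_fit(X, y, lam):
    # Closed-form minimizer of ridge_loss: solve (X^T X + lam * I) w = X^T y.
    # No intercept term, to keep the sketch short.
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([3.0, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=50)

for lam in (0.0, 10.0, 100.0):
    w = ridge_fit(X, y, lam)
    print(f"lam={lam:>5}: weights={np.round(w, 3)}, L2 norm={np.linalg.norm(w):.3f}")
```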

200

This distribution is symmetric and bell-shaped, often called the "normal" distribution.

What is the Gaussian distribution?
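Python's standard library includes `statistics.NormalDist`. This sketch uses it to check the symmetry of the bell curve and the familiar 68-95-99.7 rule for the standard normal distribution.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Symmetry: the density is equal at equal distances on either side of the mean.
print(std_normal.pdf(-1.5), std_normal.pdf(1.5))

# Roughly 68%, 95%, and 99.7% of the probability lies within 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(k, round(std_normal.cdf(k) - std_normal.cdf(-k), 4))
```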

300

In data science, what do we call a predictive model that is trained on historical data to make future predictions?

What is a machine learning model?

300

What Python library is commonly used for data manipulation and analysis using DataFrames?

What is pandas?
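This minimal pandas sketch uses a small made-up sales table. It shows the filtering and group-by aggregation that DataFrames are typically used for.

```python
import pandas as pd

# Hypothetical sales records, one row per transaction.
df = pd.DataFrame({
    "city": ["Austin", "Boston", "Austin", "Denver", "Boston"],
    "sales": [120, 95, 80, 60, 130],
})

# Boolean-mask filtering: keep rows with sales above 90.
high_sales = df[df["sales"] > 90]

# Group-by aggregation: total sales per city, largest first.
totals = df.groupby("city")["sales"].sum().sort_values(ascending=False)

print(high_sales)
print(totals)
```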

300

What is data that can be used to identify an individual, such as a name, Social Security number, or email address?

What is personally identifiable information (PII)?

300

This is the activation function used in the hidden layers of most modern neural networks due to its simplicity and effectiveness.

What is ReLU (Rectified Linear Unit)?
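This NumPy sketch implements ReLU, max(0, x), and its derivative, then applies it in a hypothetical hidden layer. Treating the derivative at exactly 0 as 0 is a convention assumed here; the clue does not specify it.

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x), applied element-wise.
    return np.maximum(0, x)

def relu_grad(x):
    # Derivative: 1 where x > 0, 0 elsewhere (0 at x == 0 by convention).
    return (x > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(z))
print(relu_grad(z))

# Hypothetical hidden layer: affine transform followed by ReLU.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
b = np.zeros(4)
x = np.array([0.5, -1.0, 2.0])
hidden = relu(W @ x + b)
print(hidden)
```

Part of ReLU's appeal is visible here. The function and its gradient are a single comparison, and the gradient does not shrink toward zero for large positive inputs.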

300

This process estimates the distribution of a statistic by resampling data with replacement.

What is bootstrapping?
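This standard-library sketch computes a percentile bootstrap confidence interval for the mean of a hypothetical sample. The resample count, the 95% level, and the function name are illustrative choices.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the statistic `stat`."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n points from the data with replacement.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo = estimates[int(round(alpha / 2 * n_resamples))]
    hi = estimates[int(round((1 - alpha / 2) * n_resamples)) - 1]
    return lo, hi

# Hypothetical sample of measurements.
sample = [12.1, 9.8, 11.4, 10.2, 13.5, 8.9, 10.7, 12.8, 9.5, 11.1]
low, high = bootstrap_ci(sample)
print(f"sample mean {statistics.mean(sample):.2f}, 95% CI approx. ({low:.2f}, {high:.2f})")
```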

400

This systematic error skews results in a consistent direction; identifying and addressing it improves both the fairness and the accuracy of an analysis or model.

What is bias?

400

What does SQL stand for?

What is Structured Query Language?
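Python's built-in `sqlite3` module makes it easy to try SQL without a database server. This sketch creates a hypothetical `orders` table and runs a typical aggregate query against it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 20.0), ("ben", 12.5), ("ana", 15.0)],
)

# Total spend per customer, largest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total FROM orders "
    "GROUP BY customer ORDER BY total DESC"
).fetchall()
print(rows)
conn.close()
```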

400

What is the practice of collecting only the data that is strictly necessary for a specific purpose?

What is data minimization?

400

This happens when a model learns the training data too well, including noise, leading to poor generalization to new data.

What is overfitting?
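This NumPy sketch demonstrates overfitting on synthetic noisy samples of a sine wave; all data here is made up. A degree-9 polynomial through 10 training points can pass through every one of them, driving training error to roughly zero. Its error on fresh points is typically much larger than its training error, and usually larger than that of a simpler degree-3 fit.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_data(x):
    # Hypothetical ground truth: a sine wave plus Gaussian noise.
    return np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)

x_train = np.linspace(0, 1, 10)
y_train = make_data(x_train)
x_test = rng.uniform(0, 1, size=200)
y_test = make_data(x_test)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

results = {}
for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = mse(y_train, np.polyval(coeffs, x_train))
    test_err = mse(y_test, np.polyval(coeffs, x_test))
    results[degree] = (train_err, test_err)
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```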

400

In machine learning, what term describes the process of teaching a model to make predictions based on labeled data?

What is training?

500

When a model performs well on training data but poorly on unseen data, what is that called?

What is overfitting?

500

Which Git command initializes a new repository?

What is git init?

500

What is the process of removing personally identifiable information from datasets to protect individual privacy?

What is data anonymization?

500

This vector of partial derivatives points in the direction of the loss function's steepest increase, and its magnitude gives the rate of that increase.

What is a gradient?
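This NumPy sketch uses a hypothetical two-parameter quadratic loss. First, the analytic gradient is checked against central finite differences. Then repeated steps against the gradient (gradient descent, with an assumed learning rate of 0.1) walk the parameters to the minimum.

```python
import numpy as np

def loss(w):
    # Hypothetical convex loss with its minimum at w = (3, -1).
    return (w[0] - 3.0) ** 2 + 2.0 * (w[1] + 1.0) ** 2

def analytic_grad(w):
    # Partial derivatives of `loss` with respect to w[0] and w[1].
    return np.array([2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)])

def numerical_grad(f, w, h=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = h
        g[i] = (f(w + step) - f(w - step)) / (2 * h)
    return g

w = np.array([0.0, 0.0])
print(analytic_grad(w), numerical_grad(loss, w))

# Gradient descent: step against the gradient to decrease the loss.
lr = 0.1
for _ in range(100):
    w = w - lr * analytic_grad(w)
print(w)
```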

500

What is the difference between correlation and causation?

What is: correlation means two variables tend to move together, while causation means changing one actually produces a change in the other?
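This NumPy simulation shows correlation without causation. The data is made up and the causal structure is known by construction. A hypothetical confounder (temperature) drives both ice cream sales and sunburns. The two are strongly correlated, yet the association disappears once temperature is controlled for. With real observational data, a check like this can reveal confounding but cannot by itself prove causation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical confounder: daily temperature drives both variables below.
temperature = rng.normal(25, 5, n)
ice_cream_sales = 2.0 * temperature + rng.normal(0, 5, n)
sunburns = 0.5 * temperature + rng.normal(0, 2, n)

# Strong correlation, even though neither variable causes the other here.
r_raw = np.corrcoef(ice_cream_sales, sunburns)[0, 1]

def residuals(y, x):
    # Remove the part of y explained linearly by x.
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Controlling for temperature makes the association vanish.
r_controlled = np.corrcoef(residuals(ice_cream_sales, temperature),
                           residuals(sunburns, temperature))[0, 1]
print(f"raw r = {r_raw:.2f}, r after controlling for temperature = {r_controlled:.2f}")
```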
