What is the process of extracting knowledge and insights from data called?
What is data science?
Name two programming languages commonly used in data science and analytics.
What are Python and R? (SQL, Java, MATLAB, C, and C++ are also common.)
What is an incident where sensitive, protected, or confidential data is accessed or disclosed without authorization?
What is a data breach?
The category of machine learning that relies on labeled data to learn mappings from inputs to outputs, encompassing tasks such as regression and classification.
What is supervised learning?
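Supervised regression can be illustrated with a minimal sketch: fit a straight line to labeled (input, output) pairs by ordinary least squares, then predict an unseen input. The data and the function names `fit_line` and `predict` are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error on labeled data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The OLS slope is the x-y covariance divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the learned mapping to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical labeled training data: inputs with their known outputs.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]
model = fit_line(train_x, train_y)
print(predict(model, 5.0))  # → 11.0
```

The labels (`train_y`) are what make this supervised: the model learns the input-to-output mapping from examples where the correct answer is known.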
This term describes the spread of data points around the mean, measured as the average squared deviation from the mean.
What is the variance?
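As a small worked example, variance can be computed directly from its definition. The `sample` flag (Bessel's correction, dividing by n - 1 instead of n) is an addition for illustration; the data values are hypothetical.

```python
def variance(data, sample=False):
    """Average squared deviation from the mean.

    sample=False gives the population variance (divide by n);
    sample=True applies Bessel's correction (divide by n - 1).
    """
    n = len(data)
    mean = sum(data) / n
    squared_dev = sum((x - mean) ** 2 for x in data)
    return squared_dev / (n - 1 if sample else n)

data = [2, 4, 4, 4, 5, 5, 7, 9]      # mean is 5, squared deviations sum to 32
print(variance(data))                # → 4.0
print(variance(data, sample=True))   # 32 / 7
```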
What is the first step in the data science process, where data is collected and prepared for analysis?
What is data collection and preprocessing?
What type of software has publicly available source code that anyone may use, modify, and share, and is widely used for collaborative data analysis?
What are open-source tools?
This technique adds random noise to data or queries to protect individual privacy while allowing aggregate analysis.
What is differential privacy?
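One standard way to add calibrated noise is the Laplace mechanism, sketched here for a counting query. This is a teaching sketch under stated assumptions, not a hardened implementation: the data, the `epsilon` value, and the helper names are hypothetical, and Laplace noise is drawn as the difference of two exponential samples.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng):
    """Release a count with Laplace noise.

    A count has sensitivity 1 (adding or removing one person changes it
    by at most 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)

rng = random.Random(0)
ages = [23, 35, 41, 29, 52, 38, 61, 27]   # hypothetical records
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
print(round(noisy, 2))
```

Any single release is perturbed, but the noise has mean zero, so aggregate statistics over many queries or large groups stay useful. A smaller `epsilon` means more noise and stronger privacy.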
This technique adds a penalty proportional to the sum of the squared model weights to the loss function to reduce overfitting.
What is L2 regularization (Ridge)?
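A minimal sketch of one common ridge formulation: mean squared error plus `lam` times the sum of squared weights. Some texts use the summed rather than mean squared error; the names, data, and `lam` values here are hypothetical.

```python
def ridge_loss(weights, bias, xs, ys, lam):
    """Mean squared error plus an L2 penalty lam * sum(w^2).

    The bias term is conventionally left out of the penalty.
    """
    n = len(xs)
    mse = sum(
        (sum(w * xi for w, xi in zip(weights, x)) + bias - y) ** 2
        for x, y in zip(xs, ys)
    ) / n
    penalty = lam * sum(w ** 2 for w in weights)
    return mse + penalty

xs = [[1.0, 2.0], [2.0, 0.0]]   # hypothetical feature rows
ys = [5.0, 2.0]
weights = [1.0, 2.0]            # fits the data exactly, so MSE is 0
print(ridge_loss(weights, 0.0, xs, ys, lam=0.0))
print(ridge_loss(weights, 0.0, xs, ys, lam=0.1))  # penalty 0.1 * (1 + 4)
```

Because the penalty grows with the square of each weight, minimizing this loss favors smaller weights, which tends to make the model less sensitive to noise in the training data.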
This distribution is symmetric and bell-shaped, often called the "normal" distribution.
What is the Gaussian distribution?
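The density and cumulative distribution of the normal distribution can be written with the standard `math` module. This sketch checks the symmetry about the mean and the familiar share of probability within one standard deviation.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

print(normal_pdf(1.0) == normal_pdf(-1.0))           # symmetric about the mean → True
print(round(normal_cdf(1.0) - normal_cdf(-1.0), 4))  # within one sigma → 0.6827
```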
In data science, what do we call a predictive model that is trained on historical data to make future predictions?
What is a machine learning model?
What Python library is commonly used for data manipulation and analysis using DataFrames?
What is pandas?
What is data that can be used to identify an individual, such as a name, Social Security number, or email address?
What is personally identifiable information (PII)?
This is the activation function used in the hidden layers of most modern neural networks due to its simplicity and effectiveness.
What is ReLU (Rectified Linear Unit)?
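ReLU's simplicity is easy to see in code: it returns max(0, x), and its derivative is either 0 or 1. Treating the derivative at exactly 0 as 0 is a convention assumed here; the function names are illustrative.

```python
def relu(x):
    """Rectified Linear Unit: passes positive inputs, zeroes out negatives."""
    return max(0.0, x)

def relu_derivative(x):
    """Gradient used in backpropagation (taken as 0 at x = 0 by convention here)."""
    return 1.0 if x > 0 else 0.0

print([relu(v) for v in [-2.0, -0.5, 0.0, 1.5, 3.0]])  # → [0.0, 0.0, 0.0, 1.5, 3.0]
```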
This process estimates the distribution of a statistic by resampling data with replacement.
What is bootstrapping?
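A minimal bootstrap sketch: draw many resamples of the same size as the data, with replacement, compute the statistic on each, and use the spread of those estimates as the standard error. The sample values, the resample count, and the seed are hypothetical choices.

```python
import random
import statistics

def bootstrap_se(data, stat, n_resamples=2000, seed=0):
    """Estimate the standard error of `stat` by resampling with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Each resample has the same size as the data, drawn with replacement.
        resample = rng.choices(data, k=len(data))
        estimates.append(stat(resample))
    return statistics.stdev(estimates)

sample = [12.1, 9.8, 11.4, 10.2, 13.5, 8.9, 10.7, 12.8, 9.5, 11.0]
print(bootstrap_se(sample, statistics.mean))
```

The same function works for statistics without a simple closed-form standard error, for example `statistics.median`.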
This term refers to systematic error that skews results; addressing it improves both accuracy and fairness.
What is bias?
What does SQL stand for?
What is Structured Query Language?
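A short SQL example can be run with Python's built-in `sqlite3` module against an in-memory database. The table, columns, and rows are hypothetical, and the statements use SQLite's dialect of SQL.

```python
import sqlite3

# In-memory SQLite database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 60.0)],
)
# Aggregate with GROUP BY, a typical analytical query.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # → [('north', 180.0), ('south', 80.0)]
conn.close()
```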
What is the practice of collecting only the data that is strictly necessary for a specific purpose?
What is data minimization?
This happens when a model learns the training data too well, including noise, leading to poor generalization to new data.
What is overfitting?
In machine learning, what term describes the process of teaching a model to make predictions based on labeled data?
What is training?
When a model performs well on training data but poorly on unseen data, what is that called?
What is overfitting?
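Overfitting can be demonstrated by comparing a model that memorizes every training point (predicting the output of the nearest training input) with a simple least-squares line, on hypothetical noisy data from y = 2x + 1. The memorizer reaches zero training error by fitting the noise, and its error on fresh test data is expected to be worse than the line's. The data generator, seed, and sample sizes are assumptions for illustration.

```python
import random

def make_data(n, rng):
    """Noisy samples of y = 2x + 1; the true relationship is hypothetical."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    return xs, ys

def fit_memorizer(xs, ys):
    """Memorizes every training point and predicts the y of the nearest one."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def fit_line(xs, ys):
    """Ordinary least-squares line: two parameters, too few to memorize noise."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

def mse(model, xs, ys):
    """Mean squared error of the model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(42)
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(200, rng)

for name, fit in [("memorizer", fit_memorizer), ("line", fit_line)]:
    model = fit(train_x, train_y)
    train_err = mse(model, train_x, train_y)
    test_err = mse(model, test_x, test_y)
    print(f"{name}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

The gap between training and test error, rather than the training error alone, is what signals overfitting.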
Which Git command creates (initializes) a new repository?
What is git init?
What is the process of removing personally identifiable information from datasets to protect individual privacy?
What is data anonymization?
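A minimal anonymization sketch combines two common steps: dropping direct identifiers and generalizing a precise value (age) into a range. The field names, the identifier list, and the 10-year bucket are hypothetical. On its own this does not guarantee anonymity, since combinations of remaining fields (quasi-identifiers) can still allow re-identification.

```python
# Hypothetical set of direct identifiers to remove.
DIRECT_IDENTIFIERS = {"name", "ssn", "email"}

def anonymize(records, bucket=10):
    """Return copies of the records without direct identifiers and with age generalized."""
    out = []
    for r in records:
        clean = {k: v for k, v in r.items() if k not in DIRECT_IDENTIFIERS}
        if "age" in clean:
            # Replace the exact age with a range such as "30-39".
            lo = clean["age"] // bucket * bucket
            clean["age"] = f"{lo}-{lo + bucket - 1}"
        out.append(clean)
    return out

records = [{"name": "Ana", "email": "ana@example.com", "age": 34, "diagnosis": "flu"}]
print(anonymize(records))  # → [{'age': '30-39', 'diagnosis': 'flu'}]
```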
This vector of partial derivatives points in the direction of fastest increase of the loss function, and its magnitude gives the rate of that increase.
What is a gradient?
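A gradient can be estimated numerically with central differences and checked against the analytic partial derivatives. The loss function L(w0, w1) = w0^2 + 3*w1^2, the step `h`, and the learning rate are hypothetical; the last step shows gradient descent moving against the gradient to reduce the loss.

```python
def loss(w):
    """Hypothetical loss: L(w0, w1) = w0^2 + 3 * w1^2."""
    return w[0] ** 2 + 3 * w[1] ** 2

def numerical_gradient(f, w, h=1e-6):
    """Central-difference estimate of each partial derivative of f at w."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        up[i] += h
        down = list(w)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

w = [1.0, 2.0]
g = numerical_gradient(loss, w)  # analytic gradient: [2 * w0, 6 * w1] = [2, 12]

# Gradient descent: step opposite to the gradient to decrease the loss.
lr = 0.1
w_next = [wi - lr * gi for wi, gi in zip(w, g)]
print(g, loss(w), loss(w_next))
```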
What is the difference between correlation and causation?
Correlation means two variables move together; causation means a change in one directly produces a change in the other.
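The distinction can be illustrated by computing the Pearson correlation for two hypothetical series that both rise with temperature: ice cream sales and sunburn cases. They are perfectly correlated in this made-up data, yet neither causes the other; a third factor (temperature) drives both.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: both series are driven by temperature (the confounder).
ice_cream_sales = [40, 50, 60, 70]
sunburn_cases = [4, 5, 6, 7]
print(round(pearson_r(ice_cream_sales, sunburn_cases), 6))
```

A high `r` only quantifies how closely the two series move together; establishing causation requires something more, such as a controlled experiment.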