Easy
Moderate
Difficult
1

What is the primary purpose of data analysis?

A) To create data

B) To extract useful information from data

C) To delete unnecessary data

D) To store data

B) To extract useful information from data

1

Which of the following is a method to handle missing data in a dataset?

A) Deleting the entire dataset

B) Ignoring the missing data

C) Imputing missing values

D) Adding random values

C) Imputing missing values
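
To illustrate the answer, here is a minimal sketch of mean imputation, one of several possible imputation strategies. The function name `impute_mean` and the sample `ages` list are hypothetical, and missing values are assumed to be encoded as `None`.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values.

    Hypothetical helper for illustration; assumes missing data is None.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Illustrative data, not taken from any real dataset.
ages = [20, None, 30, 40, None]
imputed_ages = impute_mean(ages)
```

Mean imputation keeps every row but shrinks the variance of the column, so median or model-based imputation may suit skewed data better.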

1

What is the purpose of regularization in machine learning?

A) To increase the complexity of the model

B) To reduce overfitting by adding a penalty to the loss function

C) To improve accuracy on the training data

D) To remove irrelevant features from the dataset

B) To reduce overfitting by adding a penalty to the loss function
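
As a rough illustration of "adding a penalty to the loss function", the sketch below adds an L2 (ridge-style) penalty to mean squared error. The names `mse`, `ridge_loss`, and `lam` are assumptions chosen for this example, and L2 is only one common choice of penalty.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def ridge_loss(y_true, y_pred, weights, lam):
    """MSE plus an L2 penalty: lam * sum of squared weights.

    Hypothetical helper; lam is the regularization strength.
    """
    penalty = lam * sum(w * w for w in weights)
    return mse(y_true, y_pred) + penalty


# Illustrative values only.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
weights = [3.0, -4.0]

loss_plain = ridge_loss(y_true, y_pred, weights, lam=0.0)
loss_penalized = ridge_loss(y_true, y_pred, weights, lam=0.1)
```

With the same predictions, the penalized loss is larger whenever the weights are nonzero, which pushes training toward smaller weights and simpler models.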

3

Which of the following is a common tool used for data analysis?

A) Microsoft Word

B) Microsoft Excel

C) Adobe Photoshop

D) Google Chrome

B) Microsoft Excel

3

What is the difference between descriptive and inferential statistics in data analysis?

A) Descriptive statistics summarize data; inferential statistics draw conclusions about a population from a sample

B) Descriptive statistics draw conclusions about a population; inferential statistics summarize data

C) Both summarize data

D) Both draw conclusions about a population

A) Descriptive statistics summarize data; inferential statistics draw conclusions about a population from a sample

3

Which of the following is a method used to evaluate the performance of a classification model?

A) Mean Absolute Error (MAE)

B) Root Mean Squared Error (RMSE)

C) Confusion Matrix

D) R-squared

C) Confusion Matrix
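
For a binary classifier, the four cells of a confusion matrix can be tallied as in the sketch below. The function `confusion_counts` and the label lists are hypothetical, and the positive class is assumed to be labeled `1`.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    return counts


# Illustrative labels, not real model output.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

cm = confusion_counts(y_true, y_pred)
accuracy = (cm["tp"] + cm["tn"]) / len(y_true)
precision = cm["tp"] / (cm["tp"] + cm["fp"])
recall = cm["tp"] / (cm["tp"] + cm["fn"])
```

Metrics such as accuracy, precision, and recall are all derived from these four counts, whereas MAE, RMSE, and R-squared are regression metrics.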

5

What does the term "dataset" refer to in data analysis?

A) A collection of related data

B) A type of software

C) A programming language

D) A hardware device

A) A collection of related data

5

What is the purpose of a scatter plot in data analysis?

A) To show the relationship between two variables

B) To display categorical data

C) To create a bar chart

D) To edit numerical data

A) To show the relationship between two variables

5

Which statement best describes overfitting in machine learning models and how it can be prevented?

A) Overfitting occurs when a model performs well on training data but poorly on new data; it can be prevented by using cross-validation techniques

B) Overfitting occurs when a model performs poorly on training data; it can be prevented by increasing the number of features

C) Overfitting occurs when a model performs well on new data; it can be prevented by ignoring outliers

D) Overfitting occurs when a model performs poorly on new data; it can be prevented by reducing the training data

A) Overfitting occurs when a model performs well on training data but poorly on new data; it can be prevented by using cross-validation techniques
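
The cross-validation idea in the answer can be sketched as a k-fold split: each sample is held out exactly once, so the model is always scored on data it was not trained on. The generator `k_fold_indices` is a hypothetical helper, and contiguous, unshuffled folds are a simplifying assumption.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k contiguous folds.

    Hypothetical helper; real workflows usually shuffle before splitting.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


folds = list(k_fold_indices(n_samples=7, k=3))
```

A large gap between the training score and the average held-out score across folds is the usual sign of overfitting.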

7

Which of the following is a measure of central tendency?

A) Variance

B) Standard deviation

C) Mean

D) Range

C) Mean

7

Which of the following is an example of a supervised learning algorithm?

A) K-means clustering

B) Principal Component Analysis (PCA)

C) Linear regression

D) Apriori algorithm

C) Linear regression
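
Linear regression is supervised because it learns from labeled pairs (x, y). A minimal sketch of a one-variable least-squares fit follows; `fit_line` and the sample points are hypothetical and chosen so the data lie exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Hypothetical helper for a single input feature.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative labeled data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
```

The labels `ys` are what distinguish this from K-means, PCA, and Apriori, which work without target values.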

7

What is the purpose of Principal Component Analysis (PCA) in data analysis?

A) To reduce the dimensionality of the dataset

B) To increase the number of features in the dataset

C) To create a decision tree

D) To perform clustering

A) To reduce the dimensionality of the dataset

9

What is a histogram used for in data analysis?

A) To display text data

B) To show the distribution of numerical data

C) To create a pie chart

D) To edit images

B) To show the distribution of numerical data

9

What does the term "outlier" refer to in data analysis?

A) A data point that is significantly different from other data points

B) A common data point

C) A missing data point

D) A type of data visualization

A) A data point that is significantly different from other data points
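
One common way to make "significantly different" concrete is the 1.5 × IQR rule, sketched below. The function `iqr_outliers`, the factor of 1.5, and the sample readings are assumptions for illustration; other definitions, such as z-score thresholds, are equally valid.

```python
from statistics import quantiles


def iqr_outliers(values, factor=1.5):
    """Return values outside [Q1 - factor*IQR, Q3 + factor*IQR].

    Hypothetical helper; uses the default (exclusive) quartile method
    of statistics.quantiles.
    """
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]


# Illustrative readings with one extreme value.
readings = [10, 12, 11, 13, 12, 11, 50]
outliers = iqr_outliers(readings)
```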

9

In time series analysis, what does the term "seasonality" refer to?

A) A long-term trend in the data

B) Random fluctuations in the data

C) Regular patterns or cycles in the data that repeat over a specific period

D) The absence of any patterns in the data

C) Regular patterns or cycles in the data that repeat over a specific period
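
A simple way to expose a repeating pattern is to average the observations that share the same position within each cycle. The sketch below assumes the period is already known; `seasonal_means` and the sample series are hypothetical.

```python
def seasonal_means(series, period):
    """Average the values at each position within the repeating cycle.

    Hypothetical helper; assumes len(series) >= period and a known period.
    """
    if period < 1 or len(series) < period:
        raise ValueError("series must cover at least one full period")
    buckets = [[] for _ in range(period)]
    for i, value in enumerate(series):
        buckets[i % period].append(value)
    return [sum(bucket) / len(bucket) for bucket in buckets]


# Illustrative quarterly data covering two years (period = 4).
sales = [10, 20, 30, 40, 12, 22, 32, 42]
profile = seasonal_means(sales, period=4)
```

A profile that differs clearly across positions suggests seasonality, while a long-term trend shows up instead as a steady rise or fall from one cycle to the next.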