Which machine learning model is specifically designed to generate human-like text or computer code based on input prompts?
A) BERT (Bidirectional Encoder Representations from Transformers)
B) GPT (Generative Pre-trained Transformer)
C) ResNet (Residual Network)
D) WaveNet (model for generating raw audio waveforms)
B) GPT (Generative Pre-trained Transformer)
GPT is designed to generate human-like text or computer code based on input prompts.
When is machine learning not appropriate for solving a problem?
A) When the problem is well-framed and can be solved deterministically, such as calculating the probability of drawing a blue card from a deck containing five red cards, three blue cards, and two yellow cards.
B) When the problem requires approximation and error measurement, such as predicting future trends based on historical data.
C) When the problem involves complex patterns and relationships that are difficult to model with traditional programming.
D) When the problem requires reasoning capabilities that are beyond the scope of current machine learning models.
A) When the problem is well-framed and can be solved deterministically, such as calculating the probability of drawing a blue card from a deck containing five red cards, three blue cards, and two yellow cards.
For deterministic problems, where the solution can be computed exactly, it is better to write conventional computer code adapted to the problem.
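For instance, the card probability in option A needs no ML at all; a couple of lines of ordinary code compute it exactly:

```python
# Deterministic computation: no machine learning needed.
red, blue, yellow = 5, 3, 2
p_blue = blue / (red + blue + yellow)
print(p_blue)  # 0.3
```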
An ML engineer is tasked with forecasting the monthly revenue for a subscription-based service. Which evaluation metrics should be used to assess the model’s performance? (Select TWO.)
A) InferenceLatency
B) Accuracy
C) F1 score
D) Mean absolute error (MAE)
E) Mean absolute percentage error (MAPE)
D – Mean absolute error (MAE), E – Mean absolute percentage error (MAPE)
Explanation: MAPE (Mean Absolute Percentage Error) takes the average of the absolute differences between actual and predicted values divided by the actual values, and expresses the result as a percentage. A lower MAPE indicates better model performance because the predictions are closer to the actual values.
MAE (Mean Absolute Error) is the average absolute difference between predicted and actual values across all observations. It is a widely used metric for evaluating a model’s prediction error in numerical prediction tasks, and it is simple to interpret: MAE is calculated by summing the absolute errors and dividing by the total number of observations. MAE values range from 0 to infinity, with lower values indicating a better fit of the model to the dataset.
When evaluating a forecasting model for continuous numerical values like monthly revenue, it is crucial to use metrics that measure the difference between predicted and actual values. Metrics such as Mean Absolute Percentage Error (MAPE) and Mean Absolute Error (MAE) are suitable for regression and forecasting tasks.
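As a minimal sketch of how these two metrics are computed (using NumPy; the revenue figures are illustrative):

```python
import numpy as np

# Hypothetical monthly revenues: actual values vs. model predictions.
actual = np.array([120_000.0, 135_000.0, 150_000.0, 160_000.0])
predicted = np.array([118_000.0, 140_000.0, 149_000.0, 171_000.0])

# MAE: mean of the absolute errors, expressed in the target's own units.
mae = np.mean(np.abs(actual - predicted))

# MAPE: mean of the absolute errors relative to the actual values,
# expressed as a percentage (assumes no actual value is zero).
mape = np.mean(np.abs((actual - predicted) / actual)) * 100

print(f"MAE:  {mae:,.0f}")   # 4,750
print(f"MAPE: {mape:.2f}%")  # 3.23%
```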
Of the options below, which will impact the types of algorithms we can use to train our models? (choose 2)
A) Labeled vs. Unlabeled Data
B) Supervised vs Unsupervised Data
C) Rainy vs Sunny Data
D) Structured vs Unstructured Data
A) Labeled vs. Unlabeled Data and D) Structured vs. Unstructured Data
What is the main characteristic of a model that is overfitting?
A) The model performs poorly on both training and evaluation data.
B) The model performs well on training data but poorly on evaluation data.
C) The model has low bias and low variance.
D) The model performs poorly on training data but well on evaluation data.
B) The model performs well on training data but poorly on evaluation data.
What is the primary use of WaveNet in machine learning applications?
A) Image recognition
B) Speech synthesis
C) Data augmentation
D) Time series prediction
B) Speech synthesis
WaveNet is primarily used for speech synthesis in machine learning applications. It generates raw audio waveforms, allowing for highly realistic and natural-sounding speech.
A financial expert is building a model to predict the future value of a portfolio based on historical performance, asset allocation, and market trends. The prediction model will help in making investment decisions and optimizing the portfolio allocation strategy. Which machine-learning technique should be considered to meet this objective?
A) Probability density
B) Anomaly detection
C) Dimensionality reduction
D) Linear regression
D – Linear regression
Explanation: Regression is a supervised learning technique used for predicting continuous values. It involves determining the relationship between a dependent variable and one or more independent variables. By analyzing the patterns in historical data, regression models can predict future outcomes, making it ideal for tasks like forecasting stock prices, real estate values, or portfolio performance.
Linear regression refers to supervised learning models that use one or more inputs to predict a value on a continuous scale. A common example is predicting housing prices: after training a model on historical sales data that includes each property’s location, age, and number of rooms, you can forecast the price of a new property from those same characteristics.
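A minimal sketch with scikit-learn (the features and prices are hypothetical) illustrates the pattern: fit on historical observations, then predict a continuous value for a new one:

```python
from sklearn.linear_model import LinearRegression

# Hypothetical training data: [age_years, rooms, location_score] -> sale price.
X_train = [[10, 3, 8], [25, 2, 5], [5, 4, 9], [40, 3, 4]]
y_train = [450_000, 230_000, 560_000, 190_000]

model = LinearRegression().fit(X_train, y_train)

# Predict the price of a previously unseen property on a continuous scale.
print(model.predict([[15, 3, 7]]))
```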
A publishing company uses a foundation model (FM) to generate text summaries from lengthy documents. The company needs to evaluate the effectiveness of the summaries. Which metric can be used to evaluate the performance of the foundation model for text summarization?
A) Recall
B) Balanced classification accuracy
C) Precision
D) Recall-Oriented Understudy for Gisting Evaluation-N (ROUGE-N)
D – Recall-Oriented Understudy for Gisting Evaluation-N (ROUGE-N)
Explanation: ROUGE-N is a widely recognized metric for evaluating text summarization models, including those generated by foundation models. ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation, and the “N” in ROUGE-N represents the length of the n-grams (sequences of N words) being compared. The primary purpose of ROUGE-N is to measure the similarity between the generated text and the human-written reference, with a higher overlap indicating better quality. This metric is handy for assessing the performance of models in tasks where the preservation of key information and linguistic patterns from the original text is crucial, such as in text summarization.
ROUGE-N can be used in Amazon SageMaker to evaluate text summarization models, especially those utilizing foundation models (FMs). This metric helps data scientists and developers understand how closely their model-generated summaries match the expected outputs, providing insights into the model’s performance. By using ROUGE-N, organizations can refine their models to generate more accurate and relevant summaries, ensuring that the outputs benefit end-users. This metric is widely recognized and part of the evaluation tools available in AWS for natural language processing tasks.
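To make the n-gram overlap idea concrete, here is a simplified sketch of ROUGE-N recall (real evaluations typically use a library implementation such as rouge-score rather than hand-rolled code; the sentences are invented):

```python
from collections import Counter

def rouge_n_recall(reference: str, candidate: str, n: int = 1) -> float:
    """Fraction of the reference's n-grams that also appear in the candidate."""
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    return overlap / max(sum(ref.values()), 1)

reference = "the quarterly report shows strong subscriber growth"
candidate = "the report shows strong growth in subscribers"
print(rouge_n_recall(reference, candidate, n=1))  # 5 of 7 unigrams match: ~0.71
```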
A Data Science team is developing an ML model to predict customer churn. As part of the initial data analysis, the team has visualized feature distributions, calculated summary statistics, and analyzed feature correlation matrices. What stage of the machine learning pipeline is the team working on?
A) Exploratory Data Analysis (EDA)
B) Model Training
C) Feature Engineering
D) Model Evaluation
A – Exploratory Data Analysis (EDA)
Explanation: Exploratory Data Analysis (EDA) is the process of analyzing and understanding the characteristics of the data before building an ML model. It involves tasks such as visualizing data distributions, calculating summary statistics, identifying missing values, and detecting outliers. EDA aims to gain insights into the data and identify potential issues or patterns that may impact the model’s performance.
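A minimal EDA sketch with pandas covering the activities described above (the columns and values are a hypothetical churn-dataset slice):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical slice of a customer-churn dataset.
df = pd.DataFrame({
    "tenure_months": [1, 34, 2, 45, 8, 22],
    "monthly_charges": [29.85, 56.95, 53.85, 42.30, 99.65, 89.10],
    "churned": [1, 0, 1, 0, 1, 0],
})

print(df.describe())    # summary statistics per feature
print(df.isna().sum())  # missing-value counts per column
print(df.corr())        # feature correlation matrix

df.hist(figsize=(8, 6))  # visualize feature distributions
plt.show()
```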
A data scientist is working on a machine learning model using Amazon SageMaker. The model performs poorly on the training and validation datasets, showing high bias. What is the most likely cause of this issue?
A) Underfitting
B) Insufficient data preprocessing
C) Overfitting
D) Poor data quality
A – Underfitting
Explanation: Grasping the concept of model fit is crucial for identifying the underlying reasons behind a model’s poor accuracy. Underfitting is a type of error that occurs when the model cannot determine a meaningful relationship between the input and output data. It typically arises when a model is too simple to capture the underlying patterns in the data, or when the model has not trained long enough on a sufficient number of data points; the result is high bias and poor performance on both the training and validation datasets.
Which model is known for processing sequential data, such as time series or text, and is useful for speech recognition?
A) GAN (Generative Adversarial Network)
B) RNN (Recurrent Neural Network)
C) XGBoost (Extreme Gradient Boosting)
D) ResNet (Residual Network)
B) RNN (Recurrent Neural Network)
RNNs (Recurrent Neural Networks) are designed for sequential data such as time series or text, which makes them useful for speech recognition and time-series prediction.
What term refers to a branch of AI that enables systems to learn and make predictions based on data without being explicitly programmed?
A) Natural Language Processing (NLP)
B) Predictive analytics
C) Object-oriented programming
D) Machine Learning
D – Machine Learning
Explanation: Machine learning (ML) is a branch of artificial intelligence that focuses on developing algorithms that use mathematical and statistical models to perform data analysis tasks without explicit instructions. Machine learning algorithms can analyze enormous amounts of historical data, detect patterns, and apply those patterns to make predictions about new, previously unseen data points. Data scientists, for example, may develop a machine learning model to detect cancer from X-ray scans using millions of scanned images and diagnoses. Machine learning algorithms can perform classification and prediction tasks using text, numerical, and image data.
A Machine Learning Engineer is training a multi-classification model for predicting musical genres. The Specialist wants to evaluate model performance through a visual representation of different metrics. Which visualization technique should the Engineer use?
A) Confusion matrix
B) Precision-Recall curve
C) Box plot
D) Correlation matrix
A – Confusion matrix
Explanation: A confusion matrix is a tool for visualizing the performance of a multiclass model. It has entries for all possible combinations of correct and incorrect predictions and shows how often the model made each one.
Typical metrics used in multiclass classification are the same as the metrics used in the binary classification case. The metric is calculated for each class by treating it as a binary classification problem after grouping all the other classes into a second class. The binary metric is then averaged over all the classes to get either a macro average (treating each class equally) or a weighted average (weighted by class frequency). In Amazon ML, the macro average F1-measure is used to evaluate the predictive success of a multiclass classifier.
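A minimal sketch with scikit-learn (the genre labels and predictions are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

genres = ["rock", "jazz", "classical"]
y_true = ["rock", "jazz", "classical", "rock", "jazz", "classical"]
y_pred = ["rock", "rock", "classical", "rock", "jazz", "jazz"]

cm = confusion_matrix(y_true, y_pred, labels=genres)
print(cm)  # rows: true genre, columns: predicted genre

ConfusionMatrixDisplay(cm, display_labels=genres).plot()  # heatmap view
plt.show()
```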
Which hyperparameter defines the number of times the model will go through the entire training dataset?
A) learningRateWarmupSteps
B) batchSize
C) epochCount
D) learningRate
C – epochCount
Explanation: Hyperparameters are configuration settings that are set before the training process begins and control the behavior of the machine learning algorithm. These settings are not learned from the data but are tuned by the developer or data scientist to optimize the model’s performance.
An epoch is a single pass through the entire training dataset. During each epoch, the model sees the entire dataset once and updates its internal parameters (weights and biases) based on the errors it makes on the training data.
The epochCount hyperparameter defines the number of times the model will go through the entire training dataset during the training process.
In machine learning, the training process involves iteratively updating the model’s parameters to minimize a loss function (a measure of the model’s error) on the training data. The number of epochs determines how often the model will see the entire training dataset during the training process.
A higher number of epochs generally leads to better model performance, as the model has more opportunities to learn from the data. However, training for too many epochs can also lead to overfitting, where the model memorizes the training data too well and fails to generalize to new, unseen data.
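A minimal sketch of an epoch-driven training loop (plain gradient descent on a toy one-parameter model; epoch_count stands in for the epochCount hyperparameter):

```python
import numpy as np

# Toy dataset: y is roughly 2 * x.
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

w, lr, epoch_count = 0.0, 0.01, 50

for epoch in range(epoch_count):
    # One epoch = one full pass through the training dataset.
    pred = w * X
    grad = np.mean(2 * (pred - y) * X)  # gradient of mean squared error w.r.t. w
    w -= lr * grad                      # update the model parameter

print(w)  # approaches ~2.0 as the epochs accumulate
```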
A company is developing a model for generating text-based recommendations. The company noticed that the training data contains skewed examples, affecting the quality of the recommendations. What approach will enhance the training data to address this issue?
A) Ensemble learning
B) Transfer learning with a pre-trained model
C) Data augmentation
D) Retrieval Augmented Generation (RAG)
C – Data augmentation
Explanation: Data augmentation is a technique for expanding a dataset by generating new samples from existing data through various transformations. This technique addresses challenges in acquiring diverse real-world data by artificially increasing the dataset’s size and variety, with recent advances in generative AI enhancing its efficiency and quality.
Addressing skewed training data typically involves improving the diversity of the training data or enriching the dataset. For skewed data, data augmentation is the most effective strategy, as it introduces more variability into the training dataset, helping to mitigate the effects of data imbalance. This technique involves creating new training examples through various transformations or generating synthetic examples to balance the dataset.
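As one illustrative flavor of text augmentation, the sketch below creates variants of an under-represented example by randomly dropping a word (real pipelines often use synonym replacement, back-translation, or generative models instead; the example sentence is invented):

```python
import random

random.seed(0)

def augment(text: str, n_variants: int = 2) -> list[str]:
    """Create simple variants of a sentence by dropping one random word each."""
    words = text.split()
    variants = []
    for _ in range(n_variants):
        drop = random.randrange(len(words))
        variants.append(" ".join(w for i, w in enumerate(words) if i != drop))
    return variants

# Hypothetical under-represented example from the skewed training set.
for variant in augment("great value for the monthly subscription price"):
    print(variant)
```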
What is the main purpose of using GANs (Generative Adversarial Networks) in machine learning?
A) Classification and regression
B) Image recognition
C) Data augmentation
D) Speech synthesis
C) Data augmentation
GANs (Generative Adversarial Networks) are models used to generate synthetic data, such as images, videos, or sounds, that resembles the training data. This makes them helpful for data augmentation.
A logistics company wants to forecast delivery times based on traffic conditions, weather data, and route information. They need an ML algorithm that produces interpretable results with a clear breakdown of how each factor influences the predicted delivery time through a hierarchical decision-making process. Which machine learning (ML) algorithm satisfies the company’s needs?
A) Support Vector Machine (SVM)
B) K-Nearest Neighbors (KNN)
C) Linear Regression
D) Decision Trees
D – Decision Trees
Decision trees are a popular machine learning algorithm known for their interpretability and simplicity. They operate by recursively splitting the data based on features like traffic conditions, weather data, and route information. This creates a tree-like structure where each node represents a decision based on a specific feature. This hierarchical structure allows decision trees to clearly illustrate how different factors influence the outcome, making it easy to understand and interpret the predictions. The transparency of decision trees makes them an ideal choice for applications where it is essential to see the breakdown of decisions.
In addition to being interpretable, decision trees can handle both categorical and numerical data, making them flexible for various predictive tasks. They can also capture non-linear relationships between features, which can be crucial when multiple factors interact in complex ways like traffic and weather impacting delivery times. However, decision trees can be prone to overfitting, especially with complex datasets, but techniques like pruning or using ensemble methods (e.g., Random Forests) can mitigate this issue. Overall, decision trees balance simplicity, flexibility, and interpretability, making them a strong choice for forecasting problems that require clear explanations of model behavior.
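A minimal sketch with scikit-learn (features and targets are hypothetical) that also prints the learned decision hierarchy:

```python
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical features: [traffic_level, rain_mm, route_km] -> delivery hours.
X = [[3, 0.0, 12], [8, 5.2, 12], [2, 0.0, 30], [9, 12.1, 30]]
y = [1.0, 2.5, 2.0, 4.5]

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)

# The hierarchical decision-making process is directly inspectable.
print(export_text(tree, feature_names=["traffic_level", "rain_mm", "route_km"]))
```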
A data scientist is evaluating a machine learning model and notices that it performs well on the training data but poorly on the test data. Which combination of bias and variance is likely causing this issue?
A) Increased bias and increased variability
B) Increased bias and less variance
C) Low bias but higher variability
D) Low bias and low variance
C – Low bias but higher variability
Explanation: Overfitting is a machine learning phenomenon in which a model performs well on training data but struggles to predict new, previously unseen data accurately. During training, the model becomes extremely sensitive to the patterns in the known dataset. However, when the model overfits, it is unable to generalize its predictions to other types of incoming data, resulting in poor performance and erroneous findings.
In this case, the data scientist notices that the model does well on training data but poorly on test data. This combination of low bias and high variance suggests that the model is too sensitive to the unique properties of the training data. As a result, it fails to generalize successfully to new data—a classic indicator of overfitting.
A financial company is using an AI model to identify potential loan defaults. To ensure the model works well in production, they must set up processes for capturing real-time data, comparing it with the training set, detecting performance issues, and generating alerts. Which stage of the model development pipeline should the company focus on?
A) Model Monitoring
B) Data Collection
C) Model Evaluation
D) Model Training
A – Model Monitoring
Explanation: The model monitoring system must capture data, compare that data to the training set, define rules to detect issues, and send alerts. This process repeats on a defined schedule, when initiated by an event, or when initiated by human intervention. The issues detected in the monitoring phase include data quality, model quality, bias drift, and feature attribution drift.
Key components of the monitoring system include:
– Model Explainability: Verifies that the model’s predictions are understandable and reliable.
– Detect Drift: Identifies significant changes in data (data drift) and target variable properties (concept drift), alerting the system to potential performance issues.
– Model Update Pipeline: Re-trains the model if issues are detected, ensuring continuous improvement.
Model monitoring is a crucial stage in the machine learning operations lifecycle, where the deployed model’s performance is continuously monitored to ensure its operational readiness and effectiveness.
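A minimal sketch of one such drift check, a two-sample Kolmogorov-Smirnov test on a single feature (services such as Amazon SageMaker Model Monitor automate this kind of comparison across features and schedules; the data here is synthetic):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Hypothetical applicant-income distributions: training baseline vs. live data.
training_income = rng.normal(50_000, 10_000, size=1_000)
production_income = rng.normal(56_000, 10_000, size=1_000)

# A small p-value suggests the live data no longer matches the training
# distribution -- a signal of data drift worth alerting on.
stat, p_value = ks_2samp(training_income, production_income)
if p_value < 0.01:
    print(f"ALERT: possible data drift (KS={stat:.3f}, p={p_value:.1e})")
```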
An AI practitioner fine-tunes an FM (Foundation Model) to achieve higher accuracy and meet a certain acceptance level. The practitioner is considering several adjustments to the training process. Which approach will be MOST effective in achieving this goal?
A) Increase the learningRateWarmupSteps
B) Reduce the model size
C) Decrease the epochCount
D) Increase the epochCount
D – Increase the epochCount
Explanation: Hyperparameters are configuration settings that are set before the training process begins and control the behavior of the machine learning algorithm. These settings are not learned from the data but are tuned by the developer or data scientist to optimize the model’s performance.
An epoch is a single pass through the entire training dataset. During each epoch, the model sees the entire dataset once and updates its internal parameters (weights and biases) based on the errors it makes on the training data.
The epochCount hyperparameter defines the number of times the model will go through the entire training dataset during the training process.
Fine-tuning Foundation Models (FMs) often requires adjusting hyperparameters to optimize model performance.
Increasing the number of epochs allows the model to train for more iterations, which provides more opportunities for the model to learn from the data and improve its accuracy. This method aims to enhance the model’s performance to meet a specified accuracy level.
Which machine learning algorithm is particularly effective for classification tasks, especially in high-dimensional spaces, and is known for using hyperplanes to separate different classes?
A) Decision Tree
B) K-Means Clustering
C) SVM (Support Vector Machine)
D) Naive Bayes
C) SVM (Support Vector Machine)
SVM (Support Vector Machine) is an ML algorithm for classification and regression that separates classes by finding the hyperplane with the largest margin between them, which makes it especially effective in high-dimensional spaces.
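A minimal sketch with scikit-learn showing the separating hyperplane (the points are illustrative):

```python
from sklearn.svm import SVC

# Two hypothetical, linearly separable clusters.
X = [[0, 0], [1, 1], [1, 0], [8, 8], [9, 9], [8, 9]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear").fit(X, y)

# The learned hyperplane is w . x + b = 0.
print(clf.coef_, clf.intercept_)
print(clf.predict([[2, 2], [7, 8]]))  # expected: [0 1]
```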
Which machine learning model is known for producing artificial data by learning from existing examples?
A) Recurrent neural network (RNN)
B) Convolutional neural network (CNN)
C) Reinforcement learning
D) Generative adversarial network (GAN)
D – Generative adversarial network (GAN)
Explanation: Generative Adversarial Networks (GANs) are a type of machine learning model designed to generate new data by learning from an existing dataset. GANs consist of two neural networks, the generator, and the discriminator, that work together in a competitive process. The generator creates synthetic data samples resembling the original training data, while the discriminator tries to distinguish between real and fake samples. As the two networks compete, the generator improves its ability to create realistic data, and the discriminator becomes better at identifying fake data. This adversarial training allows GANs to generate highly realistic data, such as images, audio, or text.
Generative Adversarial Networks (GANs) have found applications in various fields, including image generation, video synthesis, data augmentation, and more. For example, GANs have been used in fields like computer vision to generate new images for training data, enhance image quality, and even create art. Their ability to learn from existing data and produce new, realistic samples makes GANs a powerful tool for data generation tasks. However, GANs can be computationally expensive to train and may be difficult to stabilize during the training process.
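A minimal sketch of the adversarial setup in PyTorch, learning a toy one-dimensional distribution (the network sizes and hyperparameters are illustrative, not a production recipe):

```python
import torch
import torch.nn as nn

def real_data(n: int) -> torch.Tensor:
    """Samples from the 'real' distribution the GAN should imitate: N(4, 1)."""
    return torch.randn(n, 1) + 4.0

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))  # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # discriminator

loss_fn = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(2000):
    fake = G(torch.randn(64, 8))

    # Discriminator: label real samples 1 and generated samples 0.
    d_loss = (loss_fn(D(real_data(64)), torch.ones(64, 1))
              + loss_fn(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: try to make the discriminator output 1 for its fakes.
    g_loss = loss_fn(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

print(G(torch.randn(256, 8)).mean().item())  # drifts toward ~4.0 as training runs
```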
A tech company has trained a model to classify products in a manufacturing line as defective or non-defective. To assess the model’s performance on unseen data, the team requires a solution that provides insights into its accuracy and ability to differentiate between the two categories. Which tools or metrics should be used?
A) F1 Score
B) Confusion Matrix
C) Precision
D) MSE (Mean Squared Error)
B – Confusion Matrix
Explanation: A Confusion Matrix is a table that summarizes the performance of a classification model by comparing the predicted labels with the true labels. It provides a detailed breakdown of the model’s predictions, including true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The confusion matrix is particularly useful for understanding the types of errors the model makes and can help identify potential biases or imbalances in the data.
An e-commerce company wants to implement a machine learning application to analyze product reviews and determine whether each review is favorable or unfavorable. Which type of machine learning model is most appropriate for the application?
A) Clustering model
B) Text embedding model
C) Multiclass classification model
D) Binary classification model
D – Binary classification model
Explanation: Binary classification is a supervised machine learning model specifically designed to distinguish between two distinct categories or classes. This model is widely used in various applications, such as sentiment analysis, fraud detection, and medical diagnosis, where the objective is to classify data points into one of two predefined categories. In e-commerce, binary classification can effectively analyze customer reviews by categorizing them as favorable or unfavorable, allowing businesses to gain insights into customer satisfaction and product performance.
The binary classification process typically involves training the model on a labeled dataset, where each data point is associated with one of the two classes. The model learns to identify patterns and relationships in the data that differentiate one class from another. Once trained, the model can make predictions on new, unlabeled data, determining which of the two categories a given data point belongs to. This approach is essential for tasks like sentiment analysis, where understanding customer feedback plays a crucial role in improving products and services.
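A minimal sketch of a binary review classifier with scikit-learn (the reviews and labels are invented for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled reviews: 1 = favorable, 0 = unfavorable.
reviews = ["love this product", "terrible quality", "works great",
           "waste of money", "highly recommend", "broke after a day"]
labels = [1, 0, 1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(reviews, labels)

print(clf.predict(["great product, would buy again"]))  # likely [1]
```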
Which of the following scenarios best illustrates a model with high bias and low variance, and what are the implications for its performance on training and test datasets?
A) A model that fits the training data perfectly but performs poorly on test data, indicating overfitting and high variance.
B) A model that performs poorly on both training and test data, indicating underfitting with high bias and low variance.
C) A model that performs well on training data but poorly on test data, indicating overfitting with low bias and high variance.
D) A model that performs moderately well on both training and test data, indicating balanced fitting with low bias and low variance.
B) A model that performs poorly on both training and test data, indicating underfitting with high bias and low variance.