Hot Tub Time Machine
Snakes On A Plane
Mona Lisa Smile
Double Jeopardy
The Kentucky Fried Movie
100

A company manually reviews all submitted resumes in PDF format. As the company grows, it expects the volume of resumes to exceed its review capacity. The company needs an automated system to convert the PDF resumes into plain text format for additional processing. 

Which AWS service meets this requirement? 

  1. Amazon Textract 

  1. Amazon Personalize 

  1. Amazon Lex 

  1. Amazon Transcribe 

 

1

Amazon Textract is a service that automatically extracts text and data from scanned documents, including PDFs. It is the best choice for converting resumes from PDF format to plain text for further processing.
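As a rough illustration, here is a minimal boto3 sketch of extracting text from a resume already uploaded to S3; the bucket and key names are placeholders, and multi-page PDFs would require the asynchronous StartDocumentTextDetection API instead:

```python
import boto3

textract = boto3.client("textract")

# Detect text in a single-page resume stored in S3 (bucket/key are hypothetical).
response = textract.detect_document_text(
    Document={"S3Object": {"Bucket": "resume-uploads", "Name": "resumes/jane_doe.pdf"}}
)

# Concatenate the detected LINE blocks into plain text for downstream processing.
plain_text = "\n".join(
    block["Text"] for block in response["Blocks"] if block["BlockType"] == "LINE"
)
print(plain_text)
```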

100

You are developing a machine learning-based application that responds to user behavior with dynamic prompts in under 10 milliseconds. The backend requires globally distributed, low-latency access to structured session data. Which AWS database architecture is the most appropriate? 

  1. Amazon Aurora Global Database 

  1. Amazon DynamoDB with global tables 

  1. Amazon DocumentDB with custom indexing 

  1. Amazon Redshift with materialized joins 

2

Amazon DynamoDB is a fully managed NoSQL database service that delivers single-digit millisecond performance at any scale. With global tables, DynamoDB allows you to replicate your data across multiple AWS Regions, enabling low-latency, high-speed access to structured session data anywhere in the world. This setup is ideal for real-time applications, such as machine learning systems that need to generate dynamic responses in under 10 milliseconds.
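A minimal sketch of reading session data with boto3; the table, Region, and key are placeholders, and the global table with its Regional replicas is assumed to already exist:

```python
import boto3

# Each application instance reads from the replica in its own Region,
# keeping lookups at single-digit-millisecond latency.
dynamodb = boto3.resource("dynamodb", region_name="eu-west-1")
sessions = dynamodb.Table("UserSessions")  # hypothetical global table

response = sessions.get_item(Key={"session_id": "abc-123"})
print(response.get("Item"))
```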

100

Which prompt engineering technique involves providing the model with some input-output pairs to guide its response? 

  1. Chain-of-thought prompting 

  1. Prompt templating 

  1. Few-shot learning 

  1. Zero-shot learning 

3

Few-shot learning involves providing the model with a small number of examples (input-output pairs) within the prompt to demonstrate the desired behavior or task. These examples act as a guide for the model to understand the expected format and content of the response.
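For illustration, a few-shot prompt simply embeds a handful of input-output pairs ahead of the new input; the examples below are invented:

```python
# A hand-built few-shot prompt: two worked examples guide the model's format,
# then the new input is appended for the model to complete.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The checkout process was quick and painless."
Sentiment: Positive

Review: "My order arrived two weeks late and damaged."
Sentiment: Negative

Review: "Customer support resolved my issue in minutes."
Sentiment:"""

# This string would then be sent to a foundation model (for example via
# Amazon Bedrock's InvokeModel or Converse APIs) to obtain the completion.
print(few_shot_prompt)
```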

100

Your organization is setting up a secure data engineering workflow and must ensure that only authorized users can access sensitive data. Which of the following are data access control best practices that you should implement? (Select TWO.) 

  1. Segmentation 

  1. RBAC 

  1. Data Lineage tracking 

  1. Flat permission 

  1. DAC 

1, 2

Segmentation (network segmentation) divides a network into smaller, isolated subnetworks. This limits the impact of a security breach by containing it within a specific segment.

Role-based access control (RBAC) assigns permissions to roles (e.g., "data scientist", "database administrator") and then assigns users to those roles. This simplifies access management by grouping permissions and makes it easier to manage access for a large number of users.
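A minimal boto3 sketch of the RBAC idea on AWS: create a group that represents a role, attach a policy once at the group level, and add users to the group. The group name, user name, and policy choice are illustrative:

```python
import boto3

iam = boto3.client("iam")

# Create a group that represents the "data scientist" role.
iam.create_group(GroupName="data-scientists")

# Grant the role's permissions once, at the group level.
iam.attach_group_policy(
    GroupName="data-scientists",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)

# Assign an individual user to the role by adding them to the group.
iam.add_user_to_group(GroupName="data-scientists", UserName="alice")
```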

100

A company has a foundation model (FM) that was customized by using Amazon Bedrock to answer customer queries about products. The company wants to validate the model's responses to new types of queries. The company needs to upload a new dataset that Amazon Bedrock can use for validation. 

Which AWS service meets these requirements? 

  1. Amazon S3 

  1. Amazon Elastic Block Store (Amazon EBS) 

  1. Amazon Elastic File System (Amazon EFS) 

  1. AWS Snowcone 

1

Amazon S3 is the correct answer because it is widely used for storing large datasets that are accessed by machine learning models, including those available through Amazon Bedrock.
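A minimal sketch of staging a validation dataset in S3 with boto3 so a Bedrock job can reference it; the bucket, key, and file names are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Upload the validation dataset; the Bedrock job references the object by its S3 URI.
s3.upload_file(
    Filename="validation_queries.jsonl",       # local file (hypothetical)
    Bucket="bedrock-validation-data",          # hypothetical bucket
    Key="datasets/validation_queries.jsonl",
)

s3_uri = "s3://bedrock-validation-data/datasets/validation_queries.jsonl"
print(f"Point the Bedrock job at: {s3_uri}")
```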

200

A financial AI system processes confidential transaction records on AWS and stores results in Amazon S3. The security team wants to ensure all encryption keys are centrally managed, and access to those keys is restricted by role. What is the most appropriate solution? 

  1. Use client-side encryption and store the keys in a local KMS server 

  1. Encrypt data using SSE-KMS with AWS-managed keys (aws/s3) 

  1. Use AWS KMS with customer-managed keys and IAM policies for fine-grained control 

  1. Configure Amazon S3 bucket policies to handle encryption at rest 

3


AWS Key Management Service (KMS) allows users to create customer-managed keys (CMKs), giving them full control over key policies, rotation, deletion, and access. When sensitive financial data is involved—like transaction records—centralized key management with strict IAM-based access control is essential. Using CMKs lets the security team define fine-grained permissions for which roles or services can use the key for encryption or decryption. This approach ensures that only authorized roles can access encrypted data, and all key usage is logged in AWS CloudTrail for auditing. It aligns with security best practices for regulated industries like finance.
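A minimal boto3 sketch of the pattern: create a customer-managed key and write encrypted results with SSE-KMS. The bucket, key, and payload are placeholders, and in practice the key policy and IAM policies would restrict which roles may call kms:Encrypt / kms:Decrypt:

```python
import boto3

kms = boto3.client("kms")
s3 = boto3.client("s3")

# Create a customer-managed key (key policy and IAM policies control who can use it).
key = kms.create_key(Description="Key for financial transaction results")
key_id = key["KeyMetadata"]["KeyId"]

# Write results to S3 encrypted with the customer-managed key (SSE-KMS).
s3.put_object(
    Bucket="transaction-results",              # hypothetical bucket
    Key="reports/2024-06-01.parquet",
    Body=b"...",                               # placeholder payload
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId=key_id,
)
```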

200

A company is evaluating AWS’s generative AI offerings to improve its operations and wants to understand the key benefits these services provide. Which of the following are advantages of using AWS generative AI services? (Select THREE.) 

  1. Low barrier to entry 

  1. Zero compliance needs 

  1. Speed to market 

  1. Enhanced accessibility 

  1. Removal of human effort 

  1. Perfect predictions by models 

1, 3, 4

A low barrier to entry is an advantage of using AWS generative AI services. Services such as Amazon Bedrock and SageMaker JumpStart provide easy access to pretrained models and tools, reducing the need for deep machine learning expertise to get started.

Speed to market is an advantage. AWS services offer prebuilt solutions, managed infrastructure, and streamlined workflows, which can significantly accelerate the development and deployment of generative AI applications.

Enhanced accessibility is an advantage. AWS provides easy access to powerful foundation models and other resources, making generative AI technology more accessible to a wider range of developers and organizations.

200

A company has deployed a machine learning model to predict customer churn. The company wants to detect any data drift over time. Which AWS service is MOST suitable for this task? 

  1. Amazon SageMaker Neo 

  1. Amazon SageMaker Model Monitor 

  1. Amazon SageMaker Debugger 

  1. Amazon SageMaker Ground Truth 

2

Amazon SageMaker Model Monitor is the most suitable service for this task. Model Monitor is designed specifically for monitoring deployed models in production. It detects data drift (changes in the input data distribution), concept drift (changes in the relationship between input features and the target variable), and other performance issues that may arise over time.
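A heavily abridged sketch using the SageMaker Python SDK's DefaultModelMonitor; the S3 paths and IAM role are placeholders, and a full setup also requires data capture to be enabled on the endpoint and a monitoring schedule to be created:

```python
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical role
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Build a baseline (statistics + constraints) from the training data; later
# monitoring jobs compare live endpoint traffic against it to flag drift.
monitor.suggest_baseline(
    baseline_dataset="s3://churn-ml/train/train.csv",      # hypothetical path
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://churn-ml/monitoring/baseline",
)
```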

200

A media production company is considering the cloud to manage rendering workloads, video editing pipelines, and content distribution. The CTO is focused on how cloud computing might enhance their agility and scalability for dynamic workloads. Which of the following are the advantages of cloud computing? (Select TWO.) 

  1. Ability to provision large-scale computing capacity in minutes without upfront hardware investment. 

  1. Guaranteed isolation from other tenants at the hardware level in all cloud configurations. 

  1. Rapid deployment of high-performance computing resources with global reach. 

  1. Permanent data residency guarantees in any geographic region, regardless of regulations. 

  1. Automatic conversion of legacy software into cloud-native applications without code refactoring.

1, 3


Ability to provision large-scale computing capacity in minutes without upfront hardware investment: 

One of the key advantages of cloud computing is on-demand scalability. Media companies with dynamic workloads—like rendering and video processing—can instantly scale up computing power without needing to buy or maintain physical hardware. This flexibility allows teams to focus on creativity and delivery while minimizing infrastructure costs, especially for bursty or high-performance tasks like rendering. 

Rapid deployment of high-performance computing resources with global reach: 

Cloud providers like AWS offer high-performance compute (HPC) resources across multiple regions worldwide. This allows media production teams to process and distribute content closer to end-users, improving performance and reducing latency. It also means they can quickly spin up compute environments in different geographies, supporting remote teams or regional content delivery without delay. 

200

 A generative model in a customer service application occasionally produces inconsistent and imaginative responses, even when the input intent is clear. You want to constrain the model to produce more focused and repeatable outputs without reducing the vocabulary scope significantly. Which hyperparameter modification best supports this objective? 

  1. Reduce model depth to minimize creative reasoning paths 

  1. Expand the token limit to ensure more deterministic output 

  1. Lower the temperature value to reduce sampling probability dispersion 

  1. Increase the number of attention heads to refine contextual representation 

3

The temperature hyperparameter controls the randomness of a generative model’s output. A lower temperature value reduces the range of possible token choices, making the output more deterministic, focused, and repeatable. This is ideal for customer service applications where consistency and clarity are more important than creativity. Lowering the temperature doesn't shrink the model's vocabulary but makes the model favor higher-probability tokens over more creative or unexpected ones. This helps the model stay on-topic and avoid imaginative or off-brand responses while still using natural language effectively.
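A small self-contained sketch of why lowering the temperature sharpens the sampling distribution; the token scores below are invented purely for illustration:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Convert raw logits into a sampling distribution at a given temperature."""
    scaled = np.array(logits) / temperature
    exp = np.exp(scaled - scaled.max())   # subtract max for numerical stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical next-token scores

# High temperature: probability mass is spread out, so sampling is more varied.
print(softmax_with_temperature(logits, temperature=1.5))

# Low temperature: mass concentrates on the top token, so output is more repeatable.
print(softmax_with_temperature(logits, temperature=0.2))
```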

300

A media company is exploring the use of generative AI to improve productivity and creativity. The company wants to use a model that can perform tasks like summarizing long-form articles, generating promotional content, and translating blogs into multiple languages. The company does not have in-house ML expertise and prefers a fully managed solution with minimal setup effort. The company is exploring Amazon Bedrock to implement this solution and is learning about the types of models it can utilize. Which of the following accurately describes a Foundation Model (FM) as defined in Amazon Bedrock? 

  1. A low-latency model designed for near real-time personalization based on user activity 

  1. A model specialized for a single task and trained only on labeled data 

  1. A general-purpose model trained on a large diverse set of unstructured data and usable across multiple tasks without retraining 

  1. A model designed exclusively for vector search and document indexing 

3

Foundation Models (FMs) in Amazon Bedrock are large, general-purpose models trained on vast amounts of unstructured data such as text, images, code, or audio. These models are designed to understand and generate human-like language, which makes them adaptable to a wide range of tasks without needing to be retrained. For example, the same FM can be used to summarize long articles, generate creative marketing content, and translate text into multiple languages. Amazon Bedrock provides access to FMs from leading providers like AI21 Labs, Anthropic, Cohere, Meta, and Amazon’s own Titan models. Since it is a fully managed service, customers can integrate these models into their applications using simple APIs—without needing machine learning expertise or infrastructure setup.
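A minimal sketch of calling a Bedrock foundation model through boto3's Converse API for the summarization use case; the model ID is just an example, the article text is a placeholder, and access to the chosen model must already be enabled in the account:

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

article = "..."  # long-form article text (placeholder)

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    messages=[
        {
            "role": "user",
            "content": [{"text": f"Summarize this article in 3 sentences:\n{article}"}],
        }
    ],
    inferenceConfig={"maxTokens": 300, "temperature": 0.2},
)

# The assistant's reply comes back as structured content blocks.
print(response["output"]["message"]["content"][0]["text"])
```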

300

 A software development team is building a system that needs to detect fraudulent transactions in real time. Initially, the team considered using hard-coded rules based on predefined thresholds. However, the business requirements are expected to change frequently, and the team is now evaluating the use of machine learning (ML) to improve adaptability and long-term performance. What is the key benefit of training a machine learning (ML) model instead of using hard-coded rules in a software system? 

  1. ML models eliminate the need for algorithmic logic or input features during training. 

  1. ML models depend entirely on fixed logic, offering consistency over adaptability. 

  1. ML models generalize from data to recognize patterns and improve over time without explicit programming. 

  1. ML models can be manually reprogrammed to adapt to future business needs. 

3

Machine Learning (ML) allows systems to learn from historical data and make decisions or predictions without being explicitly programmed with fixed rules. This is especially useful in fraud detection, where patterns are often complex and constantly evolving. Instead of relying on hard-coded rules (which can quickly become outdated), ML models analyze data to detect subtle patterns and anomalies that might indicate fraudulent activity. As more data becomes available, the model can be retrained to adapt and improve. This makes ML much more flexible and scalable in dynamic environments compared to static rule-based systems.
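A toy scikit-learn sketch contrasting the two approaches; the transaction data is synthetic and evaluated on the training set purely for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic transactions: [amount, hour_of_day]; label 1 = fraud.
X = rng.uniform([0, 0], [5000, 24], size=(1000, 2))
y = ((X[:, 0] > 3000) & (X[:, 1] < 6)).astype(int)  # hidden pattern in the data

# Hard-coded rule: flag anything over a fixed amount threshold.
rule_predictions = (X[:, 0] > 4000).astype(int)

# ML model: learns the joint pattern (amount AND time of day) from labeled history,
# and can be retrained as fraud behaviour shifts.
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
ml_predictions = model.predict(X)

print("Rule accuracy: ", (rule_predictions == y).mean())
print("Model accuracy:", (ml_predictions == y).mean())
```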

300

A media company deploys an AI-powered tool to generate brief news article summaries from extensive reports. The team must evaluate the quality and relevance of the summaries before publishing. Which of the following evaluation methods best assesses the summarization quality? 

  1. F1-score calculation to measure accuracy in identifying article sentiment polarity. 

  1. Precision-recall analysis to determine model effectiveness in categorizing news articles. 

  1. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric comparing summaries to reference articles. 

  1. Area Under the Receiver Operating Characteristic Curve (ROC AUC) to assess binary classification accuracy. 

3

ROUGE is a set of metrics designed to evaluate automatic summarization and machine translation by comparing the overlap between machine-generated content and human-written reference texts. It focuses on recall, measuring how much of the reference content is captured in the generated summary. ROUGE-N evaluates n-gram overlaps, while ROUGE-L assesses the longest common subsequence between texts. In the context of summarization, ROUGE is widely adopted because it effectively quantifies the similarity between AI-generated summaries and human-authored ones, ensuring the summaries retain essential information from the original reports.
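A small sketch using the open-source `rouge-score` package (`pip install rouge-score`); the reference and generated summaries are made up:

```python
from rouge_score import rouge_scorer

reference = "The council approved the new transit budget after a lengthy debate."
generated = "After a long debate, the council approved the transit budget."

# ROUGE-1 counts unigram overlap; ROUGE-L uses the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, generated)

for name, result in scores.items():
    print(f"{name}: precision={result.precision:.2f}, "
          f"recall={result.recall:.2f}, f1={result.fmeasure:.2f}")
```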

300

A travel company wants to use machine learning to forecast customer demand for various holiday packages. The data analysts prefer a platform with ready-made models and quick setup options to avoid complex coding from scratch. They are assessing Amazon SageMaker JumpStart for their requirements. Which TWO statements correctly describe the key features of Amazon SageMaker JumpStart? (Select TWO.) 

  1. Automatically identifies fraudulent travel transactions without additional configuration. 

  1. Includes built-in workflows that allow teams to deploy ML solutions quickly. 

  1. Automatically translates booking data into multiple languages for analysis. 

  1. Directly provides real-time travel advisory updates without model customization. 

  1. Offers pre-built machine learning models for rapid forecasting and predictions. 

2, 5


Offers pre-built machine learning models for rapid forecasting and predictions: 

Amazon SageMaker JumpStart provides a collection of pre-trained models and ready-to-use solutions for common machine learning tasks, such as demand forecasting, image classification, and text analysis. This makes it easier for teams—especially those without deep ML expertise—to quickly start projects. For the travel company, JumpStart’s pre-built models can help forecast customer demand for holiday packages without writing code from scratch, making it ideal for fast and effective ML adoption. 

Includes built-in workflows that allow teams to deploy ML solutions quickly: 

One of the strengths of SageMaker JumpStart is its built-in end-to-end workflows. These templates guide users through the full machine learning lifecycle, from data processing to model training, evaluation, and deployment. This simplifies complex ML tasks and lets teams focus on results rather than the technical details. It’s perfect for data analysts who want to move quickly and avoid spending time on setup and configuration. 
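A compressed sketch with the SageMaker Python SDK's JumpStart interface; the model ID is only an example (the JumpStart catalog in SageMaker Studio lists the available IDs), and the IAM role and instance settings are assumed to come from the environment:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Pick a pre-built JumpStart model by its catalog ID (example ID; check the
# JumpStart catalog for forecasting/regression options that fit the use case).
model = JumpStartModel(model_id="lightgbm-regression-model")

# Deploy it to a managed endpoint with the bundled default configuration.
predictor = model.deploy()

# Send a prediction request (the payload shape depends on the chosen model).
# result = predictor.predict(payload)
```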

300

 A retail company is using generative AI to create personalized marketing content. The security team wants to stay ahead of cyberattacks and also reduce any weaknesses in the AI system. What is the best way to meet both goals? 

  1. Use threat detection to monitor for live cyberattacks and vulnerability management to find and fix weak spots before they are exploited. 

  1. Use threat detection to check for old software libraries, and have vulnerability management block external API calls. 

  1. Let vulnerability management track traffic spikes and use threat detection to deal with outdated APIs. 

  1. Assign fraud alerts to vulnerability management, and let threat detection focus on finding code injections. 

1

Threat detection involves continuously monitoring systems, networks, and applications to detect signs of cyberattacks in real time. It helps identify malicious behaviors such as unauthorized access, data breaches, or suspicious user activity. 

Vulnerability management is the process of identifying, evaluating, and addressing security flaws or misconfigurations before attackers can exploit them. This includes scanning for outdated software, weak permissions, or exposed endpoints. 

Together, these two practices form a proactive and reactive security approach: threat detection catches live attacks, while vulnerability management reduces the chances of those attacks being successful by patching known weaknesses. 

400

A global consulting firm is building a platform to automatically summarize meeting recordings and generate action items. The platform processes audio data and converts it into structured summaries. Which approach is most suitable for transforming human speech into meaningful text summaries? 

  1. Apply graph neural networks to learn the spatial structure of voice patterns in raw audio waveforms 

  1. Use robotic process automation (RPA) to analyze the audio pattern of the meeting and infer summaries visually 

  1. Use Natural Language Processing (NLP) techniques to transcribe, analyze, and summarize human conversations effectively 

  1. Apply computer vision techniques to detect facial expressions and generate summary reports from visual cues 

3

 Natural Language Processing (NLP) is a specialized field within artificial intelligence that focuses on the interaction between computers and human language. It enables machines to understand, interpret, generate, and respond to human language in a meaningful and useful way. NLP combines computational linguistics with machine learning and deep learning techniques to process text and speech data. This allows systems to perform tasks such as language translation, sentiment analysis, speech recognition, and text summarization.
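A minimal boto3 sketch of the speech-to-text step with Amazon Transcribe; the job name, bucket, and audio file are placeholders, and the resulting transcript would then be passed to an NLP summarization step:

```python
import boto3

transcribe = boto3.client("transcribe")

# Kick off an asynchronous transcription job for a recorded meeting.
transcribe.start_transcription_job(
    TranscriptionJobName="weekly-sync-2024-06-01",           # hypothetical name
    Media={"MediaFileUri": "s3://meeting-audio/weekly-sync.mp3"},
    MediaFormat="mp3",
    LanguageCode="en-US",
)

# Poll for completion, then fetch the transcript for downstream NLP
# (e.g., summarization and action-item extraction with an LLM).
job = transcribe.get_transcription_job(TranscriptionJobName="weekly-sync-2024-06-01")
print(job["TranscriptionJob"]["TranscriptionJobStatus"])
```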

400

A development team has deployed a generative AI model with guardrails in place. However, during testing, they discovered that cleverly crafted inputs can still manipulate the model into producing outputs that violate the intended safety guidelines, despite restrictions. What security risk does this scenario most accurately represent? 

  1. Data leakage from model memory during inference 

  1. Unauthorized parameter access via fine-tuning backdoors 

  1. Jailbreaking of the model using adversarial prompts 

  1. Indirect prompt injection through prompt chaining 

 

3

 Jailbreaking is a technique where users craft special, tricky inputs (called adversarial prompts) to bypass the safety or ethical guidelines of an AI system. Even if guardrails are implemented, some users may find ways to "trick" the model into generating unsafe, biased, or restricted content. In our case, the team discovered that the model still produces problematic outputs when tested with cleverly crafted prompts. This directly points to a jailbreaking scenario, where input manipulation is used to override intended safety mechanisms. This is a common concern in generative AI models, making it crucial to continuously test and improve safeguards.

400

A fashion retailer is developing a model to predict the style category of each clothing item. Each item must belong to only one category, such as formal, casual, or sportswear. Which type of classification is best suited? 

  1. Multi-label classification assigns multiple labels for a single instance, allowing overlaps. 

  1. Regression assigns a continuous value to predict the style. 

  1. Clustering groups items without predefined labels. 

  1. Multi-class classification assigns a single label from multiple possible categories to each instance. 

4

Multi-class classification is a machine learning approach where each instance is assigned exactly one label from a set of more than two possible categories. In this case, the clothing item must be classified as either formal, casual, or sportswear—only one category per item. This fits perfectly with multi-class classification because there are multiple possible classes, but each input must belong to just one. It’s the best method when there is no overlap between categories and each prediction needs to pick only a single class.
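A toy scikit-learn sketch of multi-class classification; the features and labels are invented stand-ins for clothing attributes:

```python
from sklearn.linear_model import LogisticRegression

# Toy features per item: [has_collar, is_stretch_fabric, has_logo_print]
X = [
    [1, 0, 0], [1, 0, 1], [0, 1, 0],
    [0, 1, 1], [0, 0, 1], [1, 1, 0],
]
# Exactly one label per item: formal, casual, or sportswear.
y = ["formal", "casual", "sportswear", "sportswear", "casual", "formal"]

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Each new item receives a single predicted category.
print(clf.predict([[1, 0, 0], [0, 1, 1]]))
```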

400

A financial company is expanding its AI capabilities by integrating Large Language Models (LLMs) into its customer service and fraud detection systems. The company wants to use the AWS Cloud to ensure scalability, security, and efficient management of LLMs. The development team is searching for AWS services that provide end-to-end support for training, deploying, and managing these models while maintaining seamless cloud integration. Which AWS services should the company use to build and manage its LLM-powered AI solutions? (Select TWO.) 

  1. Amazon Kinesis 

  1. Amazon Rekognition 

  1. Amazon SageMaker 

  1. AWS Glue 

  1. Amazon Bedrock 

3, 5

Amazon Bedrock is a fully managed service that allows you to build and scale generative AI applications using pre-trained foundation models from leading AI companies via an API. It offers serverless deployment, so there’s no need to manage infrastructure. Bedrock integrates well with AWS services and provides built-in security and compliance, making it ideal for companies that want to implement Large Language Models (LLMs) quickly and securely. 

Amazon SageMaker is a comprehensive machine learning service that provides tools to build, train, fine-tune, deploy, and manage machine learning models, including LLMs. It supports custom model development, automatic model tuning, MLOps, and model hosting. For a financial company with specific needs (e.g., fraud detection or sensitive data handling), SageMaker is excellent for building custom LLM solutions with high control and enterprise-grade security.

400

 A company is developing an AI solution on AWS that involves multiple departments including Data Science, DevOps, and Business Analytics. Each department requires specific permissions to AWS services but must not access resources outside their scope. What is the best way to enforce least privilege access across all departments? 

  1. Enable cross-account access between all teams and limit access with security groups. 

  1. Use AWS Organizations to assign service control policies (SCPs) to each department. 

  1. Assign users to IAM groups and manually add inline policies based on user tasks. 

  1. Use IAM roles with permission boundaries and assign them based on department function. 

4

IAM roles allow assigning temporary, role-based access to AWS resources. When paired with permission boundaries, they provide fine-grained control over the maximum permissions a role (or user) can have—even if a more permissive policy is attached later. This approach helps enforce least privilege by ensuring each department (e.g., Data Science, DevOps, Business Analytics) only gets the access it needs for its function. It’s scalable, flexible, and aligns with AWS best practices for access management. By using roles instead of permanent user credentials, organizations also benefit from increased security and better policy control.
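A minimal boto3 sketch of creating a department role with a permission boundary attached; the account ID, ARNs, and policy choices are placeholders:

```python
import json
import boto3

iam = boto3.client("iam")

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# The permission boundary caps the maximum permissions this role can ever have,
# regardless of which policies are attached to it later.
iam.create_role(
    RoleName="data-science-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    PermissionsBoundary="arn:aws:iam::123456789012:policy/DataScienceBoundary",  # hypothetical
)

# Grant the department's day-to-day permissions within that boundary
# (example managed policy; pick one that matches the department's function).
iam.attach_role_policy(
    RoleName="data-science-role",
    PolicyArn="arn:aws:iam::aws:policy/AmazonSageMakerReadOnly",
)
```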

500

 An MLOps engineer needs to deploy a computer vision model using Amazon SageMaker for an autonomous vehicle application. The model must provide ultra-low latency inference and scale instantaneously to handle varying sensor data input rates. Which SageMaker deployment strategy is MOST suitable, considering cost-efficiency for prolonged operation? 

  1. Amazon SageMaker Real-Time Inference with Provisioned Concurrency 

  1. Amazon SageMaker Batch Transform with distributed data processing 

  1. Amazon SageMaker Multi-Model Endpoints with infrequent model updates 

  1. Amazon SageMaker Asynchronous Inference with GPU instances 

1

Amazon SageMaker Real-Time Inference involves deploying a model to a persistent endpoint that can respond to inference requests in real time. Provisioned Concurrency allows you to keep a specified number of inference containers initialized and ready to respond immediately. This is crucial for ultra-low latency requirements in an autonomous vehicle application. By pre-allocating compute resources, you minimize cold start times and ensure consistent performance, even with sudden spikes in sensor data. For prolonged operation, while it requires paying for provisioned instances, it is cost effective in the long term for latency-sensitive applications that need constant availability.
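A compressed sketch of deploying a model to a SageMaker real-time endpoint with the SageMaker Python SDK; the container image, model artifact, role, and endpoint name are placeholders, and concurrency/auto-scaling settings are configured separately on the endpoint:

```python
from sagemaker.model import Model

model = Model(
    image_uri="<inference-container-image-uri>",              # placeholder
    model_data="s3://cv-models/vehicle-vision/model.tar.gz",  # placeholder artifact
    role="arn:aws:iam::123456789012:role/SageMakerRole",      # placeholder role
)

# Deploy to a persistent, low-latency real-time endpoint.
predictor = model.deploy(
    initial_instance_count=2,
    instance_type="ml.g5.xlarge",
    endpoint_name="vehicle-vision-realtime",
)

# result = predictor.predict(image_bytes)  # invoke the endpoint
```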

500

A marketing team has implemented a new campaign using generative AI to create targeted advertisements. What is the most effective way to quantify the overall profitability of a marketing campaign powered by generative AI? 

  1. Cross-Domain performance 

  1. Average revenue per user (ARPU) 

  1. Customer feedback 

  1. Conversion rate 

2

Average revenue per user (ARPU) is the metric that best measures financial return. ARPU directly quantifies the average revenue generated per user, making it a clear indicator of the campaign's impact on revenue. If ARPU increases after implementing the generative AI campaign, it indicates that the campaign is driving financial results.
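For illustration, ARPU is simply total revenue divided by the number of users over the same period; the figures below are made up:

```python
# Hypothetical figures for the periods before and after the campaign.
revenue_before, users_before = 480_000.00, 60_000
revenue_after, users_after = 690_000.00, 75_000

arpu_before = revenue_before / users_before   # 8.00
arpu_after = revenue_after / users_after      # 9.20

print(f"ARPU before campaign: ${arpu_before:.2f}")
print(f"ARPU after campaign:  ${arpu_after:.2f}")
```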

500

 A healthcare company is using machine learning to predict patient readmission rates. The data science team must select an appropriate learning type based on the availability of outcome data. Which of the following statements are true about supervised and unsupervised learning? (Select TWO) 

  1. Supervised learning relies on statistical distributions instead of actual labeled datasets. 

  1. Unsupervised learning detects relationships and structures within data where the labels or categories are unknown. 

  1. Supervised learning works with labeled data and learns from the outcome to make future predictions. 

  1. Supervised learning uses generative models to predict missing input data. 

  1. Unsupervised learning always achieves higher accuracy with structured datasets. 

2, 3

Unsupervised learning is used when the data does not have any labels or known outcomes. The goal is to uncover hidden patterns, relationships, or groupings in the data. Techniques like clustering and dimensionality reduction fall under this category. In healthcare, unsupervised learning can help find patterns in patient data, like grouping patients with similar symptoms, without prior knowledge of the outcome. It’s especially useful for exploratory data analysis when outcomes are not clearly defined. 

Supervised learning is a machine learning approach where the algorithm is trained on a labeled dataset. This means that the data used for training includes both the input features and the correct output (outcome or label). The goal is to learn a mapping from inputs to outputs so that the model can accurately predict outcomes for new, unseen data. In the context of healthcare, if a dataset contains patient information along with a label indicating whether they were readmitted or not, supervised learning can be used to build a model that predicts readmission risk. This learning type is widely used for classification and regression problems.
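A toy scikit-learn sketch contrasting the two learning types on made-up patient features (the values and the readmission rule are purely illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                 # e.g., age, prior visits, lab score
y = (X[:, 1] + X[:, 2] > 1).astype(int)       # readmitted (1) or not (0)

# Supervised: labels are available, so the model learns to predict readmission.
classifier = LogisticRegression().fit(X, y)
print("Predicted readmission:", classifier.predict(X[:5]))

# Unsupervised: no labels are used; the model only groups similar patients.
clusters = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)
print("Patient cluster labels:", clusters[:5])
```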

500

 A marketing agency wants to generate personalized product descriptions for different users using a generative AI model. The descriptions must be unique and tailored to user preferences. Which techniques and generative AI model components would be most effective in this scenario? (Select TWO.) 

  1. Few-shot prompt engineering to guide the model’s responses 

  1. Real-time inferencing with clustering algorithms 

  1. Neural networks with labeled time-series data 

  1. Supervised learning with classification techniques 

  1. Transformer-based language models with embeddings 

1, 5

Few-shot prompt engineering allows the model to generate more accurate and personalized content with minimal examples. By providing a few specific examples in the prompt, you can guide the generative AI model to create tailored product descriptions that align with individual user preferences. This technique enhances the model's ability to adapt to different contexts without needing extensive fine-tuning, making it both efficient and effective for generating unique marketing content. 

Transformer-based language models with embeddings are highly effective for generating personalized product descriptions. These models, like GPT, use embeddings to represent words and phrases in a continuous vector space, capturing semantic relationships. This allows the model to understand context and user preferences better, enabling it to generate unique and tailored product descriptions. Transformer models excel in generating coherent, human-like text that adapts to different user inputs, making them ideal for creating personalized marketing content.
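A minimal sketch of generating an embedding for a user-preference description with a Bedrock text-embedding model via boto3; the model ID is an example, and the request/response shape shown follows the Titan text embeddings format as an assumption:

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Embed a short user-preference description (Titan text embeddings shape assumed:
# request {"inputText": ...} -> response {"embedding": [...]}).
response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-embed-text-v1",  # example embedding model ID
    contentType="application/json",
    accept="application/json",
    body=json.dumps({"inputText": "Prefers minimalist running shoes in neutral colors"}),
)

embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding), embedding[:5])
```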

500

A multinational company collects customer data in both the European Union (EU) and the United States (US). The company needs to ensure that it complies with regulations such as GDPR. Which data governance strategy is MOST directly related to addressing this compliance requirement? 

  1. Data Logging 

  1. Data Residency 

  1. Data Monitoring 

  1. Data retention 

2

Data residency policies dictate where data is physically stored in order to comply with regulations such as GDPR, which restrict data transfers outside specific jurisdictions. Data residency is most directly related to compliance requirements like GDPR because it specifically addresses where data is stored and processed.
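A minimal boto3 sketch of keeping EU customer data in an EU Region: create the bucket in eu-central-1 so objects are stored there. The bucket name is a placeholder, and in practice this would be combined with additional controls (such as SCPs that block other Regions):

```python
import boto3

# Create the bucket in an EU Region so customer data is stored inside the EU.
s3_eu = boto3.client("s3", region_name="eu-central-1")
s3_eu.create_bucket(
    Bucket="customer-data-eu",  # hypothetical bucket name
    CreateBucketConfiguration={"LocationConstraint": "eu-central-1"},
)

# EU customer records are written only to the EU bucket.
s3_eu.put_object(
    Bucket="customer-data-eu",
    Key="customers/eu/12345.json",
    Body=b'{"customer_id": "12345"}',
)
```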