AWS AI Practitioner Section 5

Agents Like Tunes

Supermodels

Are They Super?

Running RAGged

S3 Bucket

100

A healthcare organization needs an AI-powered agent to assist patients by answering questions about their medical records and treatment options. The agent must comply with healthcare regulations like HIPAA. Can Amazon Bedrock Agents help in this scenario?

a) Yes, Amazon Bedrock Agents can be configured to comply with healthcare regulations, including HIPAA, while handling patient queries about medical records and treatments.

b) No, Amazon Bedrock Agents cannot be used in healthcare applications due to data privacy and regulatory concerns.

c) Yes, but Amazon Bedrock Agents require manual auditing and monitoring for compliance with HIPAA, as there isn't yet an option for regulatory compliance.

d) No, Amazon Bedrock Agents focus only on non-regulated industries and cannot be used in sensitive sectors like healthcare.

Correct Answer: a) Amazon Bedrock Agents can be configured to adhere to privacy and security regulations like HIPAA, making them suitable for use in healthcare scenarios.

Incorrect Answers:

b) Incorrect because Amazon Bedrock Agents are designed to be configurable and can comply with healthcare regulations, including HIPAA.

c) Incorrect because Amazon Bedrock Agents provide pre-built frameworks for regulatory compliance, reducing the need for manual auditing.

d) Incorrect because Amazon Bedrock Agents can be used in sensitive sectors like healthcare with proper configuration for regulatory compliance.

100

Which are foundation models today available in Amazon Bedrock?

a) Claude, Foundation.ai, DiffDat, Perplexity

b) Perplexity, Stable Diffusion, Amazon Titan, Llama 2

c) Foundation.ai, Claude, Llama 2, Amazon Titan

d) Perplexity, Claude, Llama 2, Amazon Titan

e) Llama 2, Stable Diffusion, Claude, Amazon Titan

e) is correct.

Explanation:Aaron made up Foundation.ai and DiffDat for this question. Perplexity is a metric to evaluate Foundation Models.

100

You are developing a model for an online retailer with a chatbot for customers to place and manage orders, among other capabilities. Which business metrics are most likely to be good candidates on which you should evaluate the model behind it? Choose the best 2.

a) Preferences of the users who used the chatbot.

b) The average revenue per user of the chat system as opposed to other ordering methods.

c) The number of dimensions the model had to use to accurately convert users to customers.

d) Ability to pass the Turing test and fool users into thinking they were chatting with a human.

e) User feedback and satisfaction with using the chatbot.

b) The average revenue per user of the chat system as opposed to other ordering methods.

e) User feedback and satisfaction with using the chatbot.

100

You are working with a customer who needs to generate accurate responses to queries using documents that are frequently updated. What feature of Amazon Bedrock Retrieval Augmented Generation would you suggest to ensure that the model has access to the latest documents?

a) Amazon Bedrock Fine-tuning

b) Amazon Bedrock Continuous Retrieval

c) Amazon Bedrock Vector Database integration

d) Amazon Bedrock Memory Augmented Networks

Correct Answer:

c) By using Vector Database integration, you ensure that your model retrieves the latest relevant documents by embedding them in a vector format, allowing quick retrieval and access to up-to-date content.

Incorrect Answers:

a) Fine-tuning updates the model itself but doesn’t handle the continuous integration of new documents into the retrieval process.

b) While continuous retrieval can help maintain relevance, it doesn’t directly imply the storage or use of updated documents in a database.

d) Memory Augmented Networks don’t specifically deal with vector database integration and are more focused on memory mechanisms for models.

100

Order these model improvement techniques from cheapest to most expensive.

I. Instruction-based Fine-tuning

II. Retrieval Augmented Generation (RAG)

III. Domain Adaptation Fine-tuning

IV. Prompt Engineering only

a) IV-III-II-I

b) IV-II-III-I

c) IV-I-II-III

d) IV-II-I-III

200

When using a single-turn messaging model in an AI-powered chatbot, how does the system typically respond to user input?

a) The model generates responses based on the context of a series of exchanges, remembering previous interactions.

b) The model requires a sequence of interactions to generate a response, which limits its usefulness for single-turn queries.

c) The model generates responses by predicting the next word in an ongoing conversation, considering previous turns.

d) The model generates a response based on the immediate query without considering prior interactions, focusing on the current input only.

Correct Answer: d) In single-turn messaging, the model responds only to the current input without any knowledge of previous exchanges. It is designed to handle isolated queries.

Incorrect Answers:

a) Incorrect because single-turn messaging models do not consider past interactions; they only handle isolated queries.

b) Incorrect because single-turn models are designed specifically for handling single queries without needing a sequence of interactions.

c) Incorrect because single-turn models do not consider previous conversation turns, they only respond to the immediate input.

200

What is the key difference between Generative AI and traditional Discriminative AI models?

a) Generative AI models predict a label or class for a given input, while Discriminative AI models generate new data based on learned patterns.

b) Generative AI models are designed for text-based tasks, while Discriminative AI models are limited to image-related tasks.

c) Generative AI models require no labeled data for training, whereas Discriminative AI models require labeled datasets for training.

d) Generative AI models generate new content or data based on learned patterns, while Discriminative AI models focus on classifying inputs into predefined categories.

Correct Answer: d) Generative AI models create new data (e.g., text, images, audio) based on learned patterns. In contrast, Discriminative AI models classify data into predefined categories or labels.

Incorrect Answers:

a) Incorrect because it's the other way around: Discriminative models classify inputs, while Generative AI models generate data.

b) Incorrect because Generative AI is not restricted to text-based tasks; it can be used for various types of data, including images, audio, etc.

c) Incorrect because both Generative and Discriminative AI models can be trained with labeled data, although Generative models are sometimes trained using unsupervised learning methods as well.

200

Which of the following metrics is commonly used to evaluate machine-generated text by comparing it to human-written reference text, measuring the overlap of n-grams?

a) ROUGE

b) BLEU

c) Perplexity

d) BERTScore

Correct Answer:

b) BLEU (Bilingual Evaluation Understudy) is an automatic evaluation metric that compares n-grams (typically unigrams, bigrams, trigrams, etc.) in the machine-generated text to reference texts, making it commonly used in machine translation and text generation tasks.

Incorrect Answers:

a) ROUGE is similar to BLEU but is focused on measuring recall-based metrics (how much of the reference is captured in the generated text) rather than precision.

c) Perplexity is a measure of how well a language model predicts a sample of text but does not compare n-grams or refer to human-written content.

d) BERTScore uses contextual embeddings from BERT to evaluate text quality, but it's not specifically focused on n-gram overlap like BLEU.

200

You are developing an application with Amazon Bedrock Retrieval Augmented Generation and need to store large-scale embeddings efficiently. What is the most cost-effective storage solution?

a) Store embeddings directly in Amazon S3.

b) Store embeddings in Amazon DynamoDB with in-memory caching.

c) Store embeddings in Amazon OpenSearch for fast querying.

d) Store embeddings in Amazon Elasticache.

Correct Answer:

c) Amazon OpenSearch is an optimized search service that can efficiently store and query large-scale embeddings, offering fast retrieval through its indexing and query capabilities.

Incorrect Answers:

a) Storing embeddings in Amazon S3 is not efficient for querying or fast retrieval.

b) DynamoDB is not specifically designed for large-scale vector embeddings and would not provide the same query efficiency as OpenSearch.

d) Amazon Elasticache is a caching service, which is not optimized for vector search or the efficient handling of embeddings.

200

Which of these typically impact cost of the AWS Bedrock Service?

a) Batch vs on-demand use, model size, temperature setting, number of output tokens

b) Number of output tokens, Top P setting, model size, batch vs on-demand use

c) Model size, number of input tokens, batch vs on-demand use, number of output tokens

d) Batch vs on-demand use, Top K setting, number of input tokens, provisioned throughput

c) Model size, number of input tokens, batch vs on-demand use, number of output tokens

300

What is true about fine-tuning a foundational model in the context of an AI application?

a) Fine-tuning is used to train a model from scratch using a small, domain-specific dataset to improve generalization.

b) Fine-tuning adjusts a pre-trained model by further training it on a smaller, task-specific dataset, enabling the model to specialize in a particular domain.

c) Fine-tuning involves making the model more successul by adding additional layers and increasing its size.

d) Fine-tuning is useful when you have an insufficient amount of data for training a new model from scratch.

Correct Answer: b Fine-tuning a foundational model involves leveraging a pre-trained model (which has been trained on a large, general dataset) and then refining it on a smaller, domain-specific dataset to specialize it for a specific task.

Incorrect Answers:

a) Incorrect because fine-tuning does not involve training from scratch; it's about refining an existing model.

c) Incorrect because fine-tuning does not typically involve changing the structure of the model (e.g., adding layers); it’s about adjusting weights based on a new dataset.

d) Incorrect because fine-tuning is often used to specialize a model, not just when there is insufficient data to train from scratch.

300

Which of the following best describes fine-tuning of a foundation model?

a) Fine-tuning involves training a foundation model from scratch on a new task, completely overwriting the previous training.

b) Fine-tuning involves using a smaller, task-specific dataset to adjust a pre-trained model, allowing it to specialize in a particular domain or task.

c) Fine-tuning involves randomly generating new data to improve the model's performance without using any labeled data.

d) Fine-tuning involves expanding the size of the foundation model to include more parameters, which improves quality of responses.

Correct Answer:

b) Fine-tuning is the process of taking a pre-trained foundation model and adjusting it with a smaller, domain-specific dataset to make it better suited for a particular task.

Incorrect Answers:

a) Incorrect because fine-tuning doesn't involve training from scratch or overwriting the previous training. It adjusts the existing model with new, targeted data.

c) Incorrect because fine-tuning uses labeled data (not randomly generated data) to refine the model for specific tasks.

d) Incorrect because fine-tuning does not involve expanding the size of the model; rather, it adjusts the model’s parameters to optimize performance for specific tasks.

300

In automatic evaluation, which metric calculates the model's ability to predict the next word in a sequence, with lower values indicating better performance?

a) ROUGE

b) BLEU

c) Perplexity

d) BERTScore

Correct Answer:

c) Perplexity is a measure of how well a probabilistic model, like a language model, predicts the next word in a sequence. Lower perplexity values indicate that the model is more confident in its predictions, thus performing better.

Incorrect Answers:

a) ROUGE is used for evaluating the overlap between the n-grams of the generated text and reference text, not for predicting word sequences.

b) BLEU measures the precision of n-grams in machine-generated text but does not directly measure prediction confidence.

d) BERTScore evaluates semantic similarity using contextual embeddings and does not specifically measure prediction uncertainty like perplexity.

300

Which RAG Data Sources does Amazon Bedrock Support? Select 3

a) Public and private Amazon S3 buckets.

b) Amazon DocumentDB

c) ServiceNow

d) Amazon Neptune Databases

e) Public websites.

f) Confluence

a) Public and private Amazon S3 buckets.

e) Public websites.

f) Confluence

300

Which statements are true about tokens and the context window in GenAI? Select 2.

a) Tokens may be a part of a word, or a word, or multiple words, but not punctuation.

b) Tokenization describes the conversion of raw text into a sequence of small chunks

c) The smaller the context window, the better the coherence within the response.

d) Large context windows require more memory and processing than small ones.

e) A large context window might be 2,500 tokens.

b) Tokenization describes the conversion of raw text into a sequence of small chunks

d) Large context windows require more memory and processing than small ones.

Incorrect Answers:

a) Punctuation can be a token too.

c) Bigger context window = better coherence.

e) The smallest of the options right now given to us in training was 32k tokens.

400

Among other departments, your online retail company has a multilingual customer support center, an environmental conservation education center, a legal counsel unit, a technical support help desk, and a customer behavior division. Which of the following is NOT likely a good use case for an Amazon Bedrock agent?

a) Amazon Bedrock Agents can be trained to provide step-by-step technical troubleshooting instructions by using knowledge bases or predefined documentation.

b) Amazon Bedrock Agents can be configured to provide educational content, answer questions, and suggest additional learning resources based on user queries related to environment conservation.

c) Amazon Bedrock Agents can accurately predict new customers' preferences whose first time it is using your website with no RAGs or training.

d) Amazon Bedrock Agents can support multilingual capabilities by utilizing pre-trained models that can switch between languages based on the user's input.

e) Amazon Bedrock Agents can use natural language processing to extract and simplify complex legal terms, providing easily understandable explanations to those inside and outside the legal counsel unit.

c) Amazon Bedrock Agents can accurately predict new customers' preferences whose first time it is using your website with no RAGs or training. This doesn't make sense because the agent would need a datasource (i.e., RAG) or some data to make predictions.

400

What is the primary advantage of using a foundation model in machine learning over training a model from scratch?

a) Foundation models can be fine-tuned for a wide range of tasks with much less data and computational resources.

b) Foundation models are pre-trained on task-specific datasets, so they perform a single task very well.

c) Foundation models are likely to outperform custom-built models on complex tasks without further training.

d) Foundation models require no additional data to perform tasks effectively.

Correct Answer: a) Foundation models, such as large language models (LLMs), are pre-trained on vast datasets and can be fine-tuned for various downstream tasks with far fewer resources and data compared to training a new model from scratch.

Incorrect Answers:

b) Incorrect because foundation models are not task-specific; they are pre-trained on large, diverse datasets and can be fine-tuned for different tasks.

c) Incorrect because while foundation models are powerful, fine-tuning is still required to adapt them for specific tasks. They do not always outperform custom models without adaptation.

d) Incorrect because foundation models still require data for fine-tuning to specialize in a specific task.

400

What is true of automatic and human evaluation metrics in model evaluation? Choose 3

a) Human evaluation is likely to be more subjective than automatic evaluation but more comprehensive.

b) Human evaluation doesn’t require any training data, whereas automatic metrics depend on them.

c) Automatic evaluation is faster and cheaper than human evaluation.

d) Human evaluation can account for nuances such as coherence, relevance, and fluency that automatic metrics might miss.

e) Automatic evaluation can't detect bias or potential discrimination against a group of people.

f) A model can only be evaluated using one automatic or human evaluation metric concurrently.

Correct:

a) Human evaluation is likely to be more subjective than automatic evaluation but more comprehensive.

c) Automatic evaluation is faster and cheaper than human evaluation.

d) Human evaluation can account for nuances such as coherence, relevance, and fluency that automatic metrics might miss.

Incorrect:

b) Human evaluation still requires training (and data for it) - it's just training of people

e) There are training datasets to help detect bias and potential discrimination

f) Multiple evaluations can occur concurrently.

400

In Amazon Bedrock Retrieval Augmented Generation, how would you ensure that the language model does NOT generate incorrect information if it fails to retrieve relevant documents?

a) Use a more powerful pre-trained model for better generation accuracy.

b) Increase the size of the vector database to reduce retrieval failures.

c) Use a higher threshold for vector similarity to ensure more relevant documents are retrieved.

d) Introduce a fallback mechanism that asks the user for more context if the retrieval fails.

Correct Answer: d) A fallback mechanism can prompt users for additional context when relevant documents cannot be retrieved, reducing the likelihood of incorrect responses.

Incorrect Answers:

a) A more powerful model may not resolve issues with retrieval failures.

b) Increasing the size of the database could increase retrieval failures if documents are not properly indexed.

c) A higher threshold for similarity might exclude relevant documents, worsening retrieval accuracy rather than improving it.

400

Which of the following are features of Guardails within AWS Bedrock? Select 3.

a) Remove Personally Identifiable Information (PII), enhancing privacy.

b) Reduce hallucinations in responses

c) Improve the accuracy of the Foundational Model

d) Prevent undesired cost from unplanned on-demand expense

e) Filter out undesirable or potentially harmful content

f) Improve the consistency of responses.

a) Remove Personally Identifiable Information (PII), enhancing privacy.

b) Reduce hallucinations in responses

e) Filter out undesirable or potentially harmful content

500

Which statements are true about Amazon Bedrock agents? Select 2.

a) Bedrock Agents create Action Groups that are called by the LLM.

b) Bedrock Agents are called by Retrieval-Augmented Generation API calls to provide responses.

c) Bedrock Agents are created by Bedrock to meet your design needs for determining output tokens from provided input tokens.

d) Bedrock Agents manage various multi-step tasks related to infrastructure provisioning, application deployment, and operational activities

e) Agents are configured to perform specific pre-defined action groups

d) Bedrock Agents manage various multi-step tasks related to infrastructure provisioning, application deployment, and operational activities

e) Agents are configured to perform specific pre-defined action groups

Incorrect:

a) Bedrock agents don't CREATE Action Groups - they use them.

b) Bedrock agents CALL RAGs

c) Bedrock doesn't create the Agents - you do.

500

In the context of Generative AI, which of the following best describes how a Large Language Model (LLM) generates new content?

a) LLMs generate content based on predefined templates, following specified rules and constraints.

b) LLMs use statistical patterns from vast amounts of training data to predict the next word in a sequence, generating human-like text.

c) LLMs respond to questions as they are worded and generate content from predefined structures.

d) LLMs use a fixed set of keywords in their databases to generate answers.

Correct Answer:

b) LLMs are trained on large datasets and learn the statistical relationships between words. They generate text by predicting the next word in a sequence, often producing coherent and contextually appropriate responses.

Incorrect Answers:

a) Incorrect because LLMs do not use fixed templates or predefined rules. They generate content based on learned statistical relationships.

c) Incorrect because LLMs can generate content, not just respond to specific questions. They are capable of creating a wide range of text based on prompts.

d) Incorrect because LLMs generate responses based on complex patterns, not by relying on fixed keywords, allowing for more flexibility and diversity in generated content.

500

If your model is scoring poorly in model evaluation, what actions should you take?

I. Evaluate your benchmark datasets to validate they are appropriate.

II. Fine-tune the model with a more task-specific dataset.

III. Re-evaluate the model at another time, since the non-deterministic nature of the service will generate different responses each time.

IV. Choose a different foundational model that is better suited to your needs.

a) I and IV only

b) I, II, III, and IV

c) II, III, and IV only

d) I, II, and IV only

e) I, II, and III only

d) I, II, and IV only

I. Evaluate your benchmark datasets to validate they are appropriate.

II. Fine-tune the model with a more task-specific dataset.

IV. Choose a different foundational model that is better suited to your needs.

Incorrect Item:

III. While the model will usually provide different responses each time, they will generally have the same gist, so trying later shouldn't really help.

500

A customer wants to implement a Retrieval Augmented Generation (RAG) system but has a limited budget for maintaining an extensive document corpus. What is the most efficient strategy for managing the vector database in Amazon Bedrock?

a) Store the entire document corpus as one large vector.

b) Use selective indexing and store only the most relevant documents in the vector database.

c) Regularly fine-tune the language model instead of retrieving from the vector database.

d) Increase the dimensionality of the embeddings.

Correct Answer:

b) Selectively indexing only the most relevant documents helps reduce storage costs and improves retrieval efficiency by ensuring that only important content is stored and indexed in the vector database.

Incorrect Answers:

a) Storing the entire corpus as one large vector is not practical, as it would not allow for efficient retrieval or scaling.

c) Fine-tuning the model without retrieval would reduce the system's ability to provide up-to-date information, as it wouldn't have access to specific documents.

d) Increasing the dimensionality of embeddings would likely increase computational costs and storage requirements without improving relevance or cost efficiency.

500

Which statements are FALSE about embeddings in GenAI? Select 2.

a) Fewer embeddings will likely correlate to a more accurate model.

b) An embeddings model in GenAI is stored in a vector database to relate tokens' relationships.

c) The number of tokens relates to the number of dimensions in the embeddings.

d) Vector dimensions may include semantic meaning of words, sentiment, syntax, etc.

e) Semantically similar words have similar embeddings.

a) Fewer embeddings will likely correlate to a more accurate model.

c) The number of tokens relates to the number of dimensions in the embeddings.