LLM Jeopardy

Vector DB

LLM Engines

Phoenix

Workflows

Orchestration

100

Name 3 vector DB technologies

What is Milvus, Pinecone, Chroma, LanceDB, Redis, Weaviate

100

Large language models are based on what ML architecture?

What is the transformer architecture

100

Phoenix can be viewed in either ___ or ____.

What is a browser or a notebook

100

The first issue we see, and often the easiest to uncover, is ___.

What are bad responses

100

Name two LLM Orchestration tools.

What is LlamaIndex and LangChain

200

Name 2 types of unstructured data that can be stored in a vector DB.

What are tokens, documents, text, audio, images, etc.

200

LLMs tend to run on what type of chip for inference and training?

What is a GPU

200

What model type is not supported by Phoenix?

What is Ranking

200

What are three issues that might come up in a search and retrieval system?

What is poor retrieval, bad responses, hallucinations, missing context, bad prompts

200

What's the difference between a chain and an agent?

What is: a chain follows a hardcoded sequence of actions, while an agent uses a language model as a reasoning engine to determine which actions to take and in which order.
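
For reference, the distinction can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not any orchestration library's API: the tools (`retrieve`, `summarize`) and `fake_llm_policy` are hypothetical stand-ins, and a real agent would call a language model where the policy function sits.

```python
from typing import Callable, Dict, List

# Hypothetical "tools": plain functions standing in for real actions.
def retrieve(text: str) -> str:
    return text + " +context"

def summarize(text: str) -> str:
    return text + " +summary"

TOOLS: Dict[str, Callable[[str], str]] = {
    "retrieve": retrieve,
    "summarize": summarize,
}

def run_chain(steps: List[str], query: str) -> str:
    """Chain: the order of steps is fixed by the caller, in code."""
    for name in steps:
        query = TOOLS[name](query)
    return query

def fake_llm_policy(state: str) -> str:
    """Stand-in for an LLM reasoning step: picks the next action from the state."""
    if "+context" not in state:
        return "retrieve"
    if "+summary" not in state:
        return "summarize"
    return "finish"

def run_agent(query: str, max_steps: int = 5) -> str:
    """Agent: the policy decides which tool to call next, and when to stop."""
    state = query
    for _ in range(max_steps):
        action = fake_llm_policy(state)
        if action == "finish":
            break
        state = TOOLS[action](state)
    return state

chain_result = run_chain(["retrieve", "summarize"], "q")
agent_result = run_agent("q")
```

Both calls end in the same state here, but only the agent would skip a step if the input already carried context.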

300

Splitting a document into smaller ones in a hierarchical and iterative manner until certain criteria (e.g. number of tokens) are reached is what type of chunking strategy?

What is recursive chunking
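
For reference, a minimal sketch of the idea, assuming whitespace-separated words as a rough proxy for tokens and a hypothetical separator hierarchy (paragraph, line, sentence, word). It is a simplification: production splitters typically also merge small pieces back together and keep the separators, which this sketch does not.

```python
from typing import List, Sequence

def count_tokens(text: str) -> int:
    """Rough token proxy (an assumption): whitespace-separated words."""
    return len(text.split())

def recursive_chunk(
    text: str,
    max_tokens: int,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),
) -> List[str]:
    """Split text on coarse separators first, recursing with finer ones
    until every chunk is within max_tokens."""
    text = text.strip()
    if not text:
        return []
    if count_tokens(text) <= max_tokens:
        return [text]
    for i, sep in enumerate(separators):
        parts = [p for p in text.split(sep) if p.strip()]
        if len(parts) > 1:
            chunks: List[str] = []
            for part in parts:
                # Coarser separators did not split this text, so skip them.
                chunks.extend(recursive_chunk(part, max_tokens, separators[i:]))
            return chunks
    # No separator splits the text any further; return it as-is.
    return [text]

doc = "Alpha beta gamma. Delta epsilon.\n\nZeta eta theta iota kappa lambda."
chunks = recursive_chunk(doc, max_tokens=3)
```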

300

Name at least 2 stages in the creation or development of LLMs. To be more specific, how do we create base LLMs and how do we improve upon them?

What is pretraining, fine-tuning, RL, HF, RLHF

300

A collection of documents used in RAG use cases.

What is a corpus dataset

300

What are four steps that could be taken to improve a search & retrieval system?

What is (1) add docs to the knowledge base, (2) change the chunking strategy, (3) retrieve additional context, and (4) rank context prior to responding?

300

This capability enables chatbots to maintain coherence across multiple queries, allowing a chat-like manner of interaction with users.

What is conversational memory
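
For reference, one simple way to picture conversational memory is a buffer of recent turns that is prepended to each new query. The sketch below assumes a hypothetical `ConversationMemory` class and a plain "User:/Assistant:" prompt layout; it is not any particular library's memory implementation.

```python
from typing import List, Tuple

class ConversationMemory:
    """Keeps the last max_turns exchanges and replays them in the prompt."""

    def __init__(self, max_turns: int = 3) -> None:
        self.max_turns = max_turns
        self.turns: List[Tuple[str, str]] = []

    def add(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))
        # Drop the oldest turns so the prompt stays bounded.
        self.turns = self.turns[-self.max_turns:]

    def build_prompt(self, query: str) -> str:
        lines: List[str] = []
        for user, assistant in self.turns:
            lines.append(f"User: {user}")
            lines.append(f"Assistant: {assistant}")
        lines.append(f"User: {query}")
        return "\n".join(lines)

memory = ConversationMemory(max_turns=2)
memory.add("What is Phoenix?", "An observability tool.")
memory.add("Where can I view it?", "In a browser or a notebook.")
memory.add("What about ngrok?", "It can expose a remote instance.")
prompt = memory.build_prompt("Can you repeat that?")
```

Without the buffer, the model would only ever see the final line of `prompt`.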

400

What retrieval strategy retrieves the most relevant chunks, queries each chunk with the prompt, and then asks the LLM to combine all the responses?

What is map-reduce
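
For reference, the map and reduce steps can be sketched as below. The `llm` argument is any callable from prompt to text; `fake_llm` is a hypothetical stub used only so the example runs without a model, and the prompt wording is an assumption.

```python
from typing import Callable, List

def map_reduce_answer(
    question: str,
    chunks: List[str],
    llm: Callable[[str], str],
) -> str:
    # Map: ask the question against each retrieved chunk independently.
    partial_answers = [
        llm(f"Context: {chunk}\nQuestion: {question}") for chunk in chunks
    ]
    # Reduce: ask the model to combine the per-chunk answers into one.
    combined = "\n".join(partial_answers)
    return llm(f"Combine these answers to '{question}':\n{combined}")

calls: List[str] = []

def fake_llm(prompt: str) -> str:
    """Hypothetical stub: records the prompt and returns a numbered answer."""
    calls.append(prompt)
    return f"answer-{len(calls)}"

result = map_reduce_answer("q?", ["c1", "c2", "c3"], fake_llm)
```

Three chunks cost four model calls here: one per chunk plus the final combine.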

400

Name 3 LLM names (hint: animal and plant names are common)

What is Vicuna, Alpaca, Bison (TextBison), Falcon, Firefly, PaLM, GPT, etc.

400

What app allows users to run Phoenix in Databricks or locally from a remote Jupyter instance?

What is ngrok

400

True or False: Query density allows you to identify small concept gaps in your knowledge base.

What is false.

400

This is the default state of a Large Language Model (LLM), where each query is processed without considering past interactions.

What is stateless

500

What are 3 ways to determine similarity when searching for embeddings?

What is Euclidean distance, cosine similarity, dot product
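
For reference, the three measures on plain Python lists, as a minimal sketch that assumes equal-length, non-zero vectors (vector databases use optimized implementations of the same formulas):

```python
import math
from typing import Sequence

def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Smaller means more similar; 0 means identical vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Dot product of the vectors divided by the product of their lengths.
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)

u = [1.0, 0.0]
v = [0.0, 1.0]
w = [3.0, 4.0]
```

For unit-length embeddings, cosine similarity and dot product give the same ranking.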

500

In the context of neural sequence-to-sequence models, decoding strategies help determine the next token (e.g., word or subword) to be generated, based on the model's predictions. Name 1 decoding strategy.

What is greedy decoding, top-k sampling, nucleus sampling, beam search
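
For reference, three of these strategies applied to a single next-token distribution. This is a toy sketch: the probabilities in `probs` are made up for illustration, and beam search is left out because it tracks several partial sequences rather than one step.

```python
import random
from typing import Dict

def greedy(probs: Dict[str, float]) -> str:
    """Greedy decoding: always take the most probable token."""
    return max(probs, key=probs.get)

def top_k_candidates(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Top-k: keep only the k most probable tokens."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def nucleus_candidates(probs: Dict[str, float], p: float) -> Dict[str, float]:
    """Nucleus (top-p): keep the smallest set of top tokens whose
    cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept: Dict[str, float] = {}
    total = 0.0
    for token, prob in ranked:
        kept[token] = prob
        total += prob
        if total >= p:
            break
    return kept

def sample(candidates: Dict[str, float], rng: random.Random) -> str:
    """Sample one token in proportion to its (unnormalized) probability."""
    tokens = list(candidates)
    weights = list(candidates.values())
    return rng.choices(tokens, weights=weights, k=1)[0]

# Made-up distribution for illustration only.
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "emu": 0.05}
rng = random.Random(0)
next_token = sample(top_k_candidates(probs, k=2), rng)
```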

500

What is required so that you don't need a new ngrok tunnel every time?

What is the Phoenix default port.

500

Ranking metrics allow you to directly measure the effectiveness of retrieval; however, they require ____ & ____.

What is ground truth and additional LLM calls.

500

What are the four document chains offered by LangChain?

What is stuff, map-reduce, refine, map re-rank