Error Metrics
The Transformer
LLMs
Working with Data
200

A visual that shows how well or poorly a classification model differentiates between classes

What is a confusion matrix?

200

The original architecture for the Transformer model was first published in what year?

What is 2017?

200

This OpenAI language model was released in June 2020 and contains 175 billion parameters

What is GPT-3?

200

This is the name for a data point whose value is not far from the rest in terms of the overall distribution but is far from points nearby in time or space

What is a contextual outlier?

400

This metric is particularly useful when you are more concerned about the number of false negatives than the number of false positives, and is calculated as the ratio of correctly predicted positive observations to all actual positives.

What is recall?
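As a quick check on the definition, here is a minimal Python sketch (the helper name `recall` is illustrative, not from the board) that computes the ratio of correctly predicted positives to all actual positives:

```python
def recall(y_true, y_pred, positive=1):
    # True positives: actual positives the model also predicted positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    # False negatives: actual positives the model missed.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)

print(recall([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))  # 2 of 3 actual positives found -> 0.666...
```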

400

The main use case of the Transformer architecture as proposed in the original paper

What is Machine Translation?

400

This company developed the Llama series of models and has taken a strong pro-open source stance on development of LLMs.

What is Meta?

400

A visual representation that helps in understanding the distribution of data, which often consists of bars used to show the frequency of numerical or categorical data.

What is a histogram?

600

This metric, useful for comparing regression models that work with variable scales, is the square root of the average of squared differences between prediction and actual observation.

What is Root Mean Squared Error?
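The formula in the clue can be sketched directly in Python (illustrative helper name, standard library only):

```python
import math

def rmse(y_true, y_pred):
    # Square root of the average of squared differences
    # between prediction and actual observation.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

print(rmse([3.0, 5.0], [2.0, 7.0]))  # sqrt((1 + 4) / 2) = sqrt(2.5) ~ 1.581
```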

600

An encoder-only Transformer architecture commonly used to generate embeddings of text

What is Bidirectional Encoder Representations from Transformers (BERT)?

600

This French company, founded in 2023, developed open-source models widely regarded as among the top LLMs.

What is Mistral?

600

This technique is often used for visualizing high-dimensional data in 2 or 3 dimensions that capture the maximal variance among the data points

What is Principal Components Analysis?
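A minimal sketch of the idea, assuming NumPy is available: center the data, then project it onto the top singular vectors, which span the directions of maximal variance.

```python
import numpy as np

def pca_project(X, k=2):
    # Center each feature, then take the top-k right singular vectors
    # of the centered data; these are the principal components.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # coordinates of each point in the top-k PC space

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
Z = pca_project(X, k=2)
print(Z.shape)  # (5, 2)
```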

800

This commonly used metric in natural language processing evaluates model performance based on the overlap of n-grams between the generated text and the reference text.

What is BLEU (Bilingual Evaluation Understudy)?


800

The number of different types of attention used in the decoder side of a Transformer architecture as defined by the original paper

What is 2?

800

This set of techniques allows for efficient customization of LLMs for specific use cases.

What is PEFT (Parameter-Efficient Fine-Tuning)?

800

In this form of encoding of categorical variables, each unique value of the variable becomes a separate feature in the dataset.

What is one-hot encoding?
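A minimal sketch of the encoding in plain Python (illustrative helper name; categories are ordered alphabetically here for determinism):

```python
def one_hot(values):
    # Each unique category becomes its own 0/1 feature column.
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

print(one_hot(["red", "blue", "red"]))
# columns ordered ["blue", "red"] -> [[0, 1], [1, 0], [0, 1]]
```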

1000

This metric for evaluating text generation tasks takes into account both the precision and recall of words in the generated text compared to the reference, aiming to offer a more balanced approach than metrics such as BLEU.

What is ROUGE (Recall-Oriented Understudy for Gisting Evaluation)?

1000

The most commonly used loss function to train transformers for text generation

What is cross-entropy?
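For a single next-token prediction, cross-entropy loss reduces to the negative log-probability the model assigns to the correct token. A minimal sketch (illustrative helper name, standard library only):

```python
import math

def cross_entropy(probs, target_index):
    # Negative log-probability of the correct class under the
    # model's predicted distribution; lower is better.
    return -math.log(probs[target_index])

print(cross_entropy([0.1, 0.7, 0.2], 1))  # -ln(0.7) ~ 0.357
```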

1000

This technique allows LLMs to include a much larger number of parameters, but only use a selected subset for each inference run, allowing for specialization and reduction in inference latency/cost.

What is mixture-of-experts?

1000

This set of statistical techniques is used to analyze the difference among means of two or more groups or treatments.

What is ANOVA?
