A visual that shows how well or poorly a classification model differentiates between classes
What is a confusion matrix?
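A minimal pure-Python sketch of building a confusion matrix from labels (function and variable names are illustrative):

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    # counts[(actual, predicted)] -> number of examples with that pair
    counts = Counter(zip(y_true, y_pred))
    # rows are actual classes, columns are predicted classes
    return [[counts[(a, p)] for p in labels] for a in labels]

y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
print(confusion_matrix(y_true, y_pred, ["cat", "dog"]))  # [[1, 1], [1, 2]]
```

The diagonal holds correct predictions; off-diagonal cells show which classes the model confuses.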
The year in which the original Transformer architecture paper was first published

What is 2017?
This OpenAI language model was released in June 2020 and contains 175 billion parameters
What is GPT-3?
This is the name for a data point whose value is not far from the rest in terms of the overall distribution but is far from points nearby in time or space
What is a contextual outlier?
This metric is particularly useful when you are more concerned about the number of false negatives than the number of false positives, and is calculated as the ratio of correctly predicted positive observations to all actual positives.
What is recall?
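A minimal sketch of the recall calculation, TP / (TP + FN), in pure Python (names are illustrative):

```python
def recall(y_true, y_pred, positive=1):
    # true positives: actual positives the model correctly flagged
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    # false negatives: actual positives the model missed
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)

# 3 actual positives, 2 of them caught -> recall = 2/3
print(recall([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

Note that the false positive at the end does not affect recall at all; only missed positives lower it.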
The main use case of the Transformer architecture as proposed in the original paper
What is Machine Translation?
This company developed the Llama series of models and has taken a strong pro-open-source stance on the development of LLMs.
What is Meta?
A visual representation of the distribution of data, typically consisting of bars that show the frequency of numerical or categorical values.
What is a histogram?
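A pure-Python sketch of the binning behind a histogram, assuming all values fall in [lo, hi] (names are illustrative):

```python
def histogram(values, bins, lo, hi):
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # clamp the top edge (v == hi) into the last bin
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    return counts

print(histogram([1, 2, 2, 3, 9, 10], bins=3, lo=0, hi=10))  # [4, 0, 2]
```

Plotting libraries draw one bar per bin; the counts above are what those bar heights encode.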
This metric, useful for comparing regression models that work with variable scales, is the square root of the average of squared differences between prediction and actual observation.
What is Root Mean Squared Error?
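The definition in the clue translates directly into a few lines of Python (a sketch; names are illustrative):

```python
import math

def rmse(y_true, y_pred):
    # square root of the mean of squared prediction errors
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

print(rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))  # sqrt((1 + 0 + 4) / 3)
```

Squaring before averaging penalizes large errors more heavily than a plain mean absolute error would.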
An encoder-only Transformer architecture commonly used to generate embeddings of text
What is Bidirectional Encoder Representations from Transformers (BERT)?
This French company founded in 2023 developed open source LLMs which were widely regarded as among the top LLMs.
What is Mistral?
This technique is often used to visualize high-dimensional data in 2 or 3 dimensions along the directions of maximal variance between data points
What is Principal Components Analysis?
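A toy pure-Python sketch of PCA's core step for 2-D data: center the data, form the covariance matrix, and find its dominant eigenvector via power iteration (in practice a library SVD is used; names are illustrative):

```python
def top_component(points, iters=100):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # entries of the 2x2 covariance matrix
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    vx, vy = 1.0, 0.0  # starting vector for power iteration
    for _ in range(iters):
        nx, ny = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = (nx * nx + ny * ny) ** 0.5
        vx, vy = nx / norm, ny / norm
    return vx, vy

# Data lies almost on y = x, so the top component points near (0.707, 0.707)
print(top_component([(0, 0), (1, 1), (2, 2), (3, 3.1)]))
```

Projecting points onto the top components yields the low-dimensional coordinates used for plotting.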
This commonly used metric in natural language processing evaluates model performance based on the overlap of n-grams between the generated text and the reference text.
What is BLEU (Bilingual Evaluation Understudy)?
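A simplified sketch of BLEU's central idea, clipped n-gram precision: each candidate n-gram is credited at most as many times as it appears in the reference. Real BLEU combines several n-gram orders and a brevity penalty; this toy version shows one order only (names are illustrative):

```python
from collections import Counter

def clipped_precision(candidate, reference, n=1):
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    # clip each candidate n-gram count at its count in the reference
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / sum(cand.values())

cand = "the the the cat".split()
ref = "the cat sat".split()
print(clipped_precision(cand, ref))  # "the" clipped to 1, "cat" matches -> 2/4
```

The clipping prevents a degenerate candidate that repeats one common word from scoring highly.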
The number of different types of attention used in the decoder side of a Transformer architecture as defined by the original paper
What is 2?
This set of techniques allows for efficient customization of LLMs for specific use cases.
What is PEFT (Parameter-Efficient Fine-Tuning)?
In this form of encoding of categorical variables, each unique value of the variable becomes a separate feature in the dataset.
What is one-hot encoding?
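A minimal pure-Python sketch of one-hot encoding, where each unique category becomes one binary column (names are illustrative):

```python
def one_hot(values):
    # one column per unique category, in sorted order
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

print(one_hot(["red", "green", "red"]))
# columns follow sorted order (green, red): [[0, 1], [1, 0], [0, 1]]
```

Each row has exactly one 1, marking that example's category.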
This recall-oriented metric for evaluating text generation tasks, commonly applied to summarization, measures the overlap of n-grams between the generated text and the reference, complementing precision-focused metrics such as BLEU.
What is ROUGE (Recall-Oriented Understudy for Gisting Evaluation)?
The most commonly used loss function to train transformers for text generation
What is cross-entropy?
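For a single token, cross-entropy loss is just the negative log of the probability the model assigned to the correct next token; training averages this over all tokens. A minimal sketch (the probabilities are assumed to already sum to 1, as after a softmax; names are illustrative):

```python
import math

def cross_entropy(probs, target_index):
    # negative log-likelihood of the correct token
    return -math.log(probs[target_index])

probs = [0.1, 0.7, 0.2]  # toy model distribution over a 3-token vocabulary
print(cross_entropy(probs, target_index=1))  # -ln(0.7)
```

The loss is 0 only when the model puts probability 1 on the correct token, and grows without bound as that probability approaches 0.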
This technique allows LLMs to include a much larger number of parameters, but only use a selected subset for each inference run, allowing for specialization and reduction in inference latency/cost.
What is mixture-of-experts?
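A toy sketch of the routing idea behind mixture-of-experts: a gate scores the experts and only the selected one runs, so total parameters can grow while per-inference compute stays small. Real MoE layers route per token with learned gates; everything here is an illustrative stand-in:

```python
def run_moe(x, experts, gate):
    # top-1 routing: pick the expert with the highest gate score
    scores = gate(x)
    best = max(range(len(experts)), key=lambda i: scores[i])
    return experts[best](x)  # only the selected expert executes

# toy experts and a hand-written gate, purely for illustration
experts = [lambda x: x * 2, lambda x: x + 100]
gate = lambda x: [1.0, 0.0] if x < 10 else [0.0, 1.0]
print(run_moe(3, experts, gate), run_moe(50, experts, gate))  # 6 150
```

The unselected expert's parameters never touch the computation, which is the source of the latency/cost savings.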
This set of statistical techniques is used to analyze the difference among means of two or more groups or treatments.
What is ANOVA?
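A from-scratch sketch of the one-way ANOVA F-statistic, the ratio of between-group to within-group variance (in practice `scipy.stats.f_oneway` does this; names are illustrative):

```python
def f_statistic(*groups):
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    # between-group sum of squares: how far group means sit from the grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # within-group sum of squares: spread of values around their own group mean
    ssw = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

# the third group's mean is far from the others, so F is large
print(f_statistic([1, 2, 3], [2, 3, 4], [10, 11, 12]))  # 73.0
```

A large F suggests the group means differ by more than within-group noise alone would explain.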