NLU Jeopardy!

NLU History

Transformers

IR metrics

IR models

In-context learning

Advanced behavioral testing

100

IBM's Watson system won this television game show in 2011.

What is Jeopardy!?

100

This Transformer architecture is trained with MLM and next sentence prediction objectives.

What is BERT?

100

This metric says whether there is a relevant document at or above K.

What is Success@K?

100

This is a "massaged" version of TF-IDF.

What is BM25?

100

In this new method, the LM poses questions to itself iteratively.

What is Self-Ask?

100

This is arguably the first proposal for adversarial testing.

What is the Turing Test?

200

This model began the tradition of naming models after characters from the Muppets.

What is ELMo?

200

This Transformer architecture combines an MLM-based generator with a discriminator designed to distinguish original tokens from corrupted ones.

What is ELECTRA?

200

This metric gives the percentage of documents at or above K that are relevant.

What is precision?

200

The BM25 scoring function includes a term that penalizes documents for having this property.

What is being long?
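The length penalty can be sketched as follows. This is a minimal illustration that assumes the common Okapi-style form of the BM25 term score; the function name and the parameter values k1=1.2 and b=0.75 are conventional placeholders, not taken from this board.

```python
def bm25_term_score(tf, doc_len, avg_doc_len, idf=1.0, k1=1.2, b=0.75):
    """Score one query term in one document (assumed Okapi-style BM25 form)."""
    # Length normalization: exceeds 1 when the document is longer than average,
    # which inflates the denominator and lowers the score.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * length_norm)


# Same term frequency, different document lengths (made-up numbers).
short_doc = bm25_term_score(tf=3, doc_len=50, avg_doc_len=100)
long_doc = bm25_term_score(tf=3, doc_len=400, avg_doc_len=100)
```

With the term frequency held fixed, the longer document receives the lower score.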

200

In the GPT-2 paper, this token is used to generate summaries.

What is "TL;DR"?

200

This principle says that the meaning of a complex phrase is a function of the meaning of its parts and how they are combined.

What is compositionality?

300

This famous AI researcher declared that "a significant advance" on core problems in AI could be made over the course of a summer at Dartmouth in 1956.

Who is John McCarthy?

300

This Transformer architecture combined an autoregressive objective with bidirectional context via permutation orders of the input sequences.

What is XLNet?

300

This metric aggregates over the precision scores for each K such that there is a relevant document at K.

What is average precision?
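A minimal sketch of this computation, assuming binary relevance labels and assuming that every relevant document appears somewhere in the ranked list (the function name is illustrative):

```python
def average_precision(relevance):
    """relevance: list of 0/1 labels for a ranked list, best-ranked first."""
    hits = 0
    precisions = []
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision at rank k, recorded only where a relevant document sits.
            precisions.append(hits / k)
    # Average over the relevant ranks; 0.0 if nothing relevant was retrieved.
    return sum(precisions) / len(precisions) if precisions else 0.0
```

For the ranking [1, 0, 1, 0], the relevant ranks are 1 and 3, so the score is the mean of 1/1 and 2/3.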

300

This neural retrieval model scores queries with respect to documents based on the output [CLS] representation of each of these texts.

What is DPR?

300

In this method, we sample a variety of reasoning paths and choose the answer that is most probable after marginalizing out these reasoning paths.

What is Self-Consistency?
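A minimal sketch of the answer-selection step, assuming each sample is weighted equally so that marginalizing over reasoning paths reduces to a majority vote over final answers. The samples below are hand-written stand-ins for LM outputs, and the function name is hypothetical.

```python
from collections import Counter


def self_consistent_answer(samples):
    """samples: list of (reasoning_path, final_answer) pairs sampled from an LM."""
    # Discard the reasoning text and count how often each final answer occurs.
    votes = Counter(answer for _path, answer in samples)
    answer, _count = votes.most_common(1)[0]
    return answer


# Made-up samples: three reasoning paths, two of which agree on the answer.
samples = [
    ("5 + 3 = 8, then 8 * 2 = 16", "16"),
    ("double 5 and 3 to get 10 and 6, which sum to 16", "16"),
    ("5 + 3 * 2 = 11", "11"),
]
```

Calling self_consistent_answer(samples) selects the answer that the most reasoning paths agree on.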

300

This dataset from FAIR and UNC Chapel Hill is the first human-created large-scale adversarial benchmark.

What is Adversarial NLI?

400

This famous NLU system from 1970 allowed the user to give commands to a virtual robot operating in a blocks world.

What is SHRDLU?

400

This Transformer separately attends to positional embeddings and token embeddings.

What is DeBERTa?

400

This metric sums over real-valued (graded) relevance scores at each point where there is such a score.

What is Discounted Cumulative Gain?
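A minimal sketch, assuming the widely used log2 rank discount (other discount variants exist; this choice and the function name are assumptions, not taken from this board):

```python
import math


def dcg(gains):
    """gains: graded relevance scores for a ranked list, best-ranked first."""
    # Each gain is divided by log2(rank + 1), so lower-ranked items count less.
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))
```

Under this discount, placing the highest-gain documents first yields a larger score than the same gains in reverse order.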

400

This neural retrieval model scores documents with respect to queries based on MaxSim computations between every query term and every document term.

What is ColBERT?
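A minimal sketch of a MaxSim-style score, assuming token vectors are already computed and that similarity is a plain dot product. The toy vectors and the function name are illustrative only.

```python
def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best dot-product match among document tokens."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # For each query token vector, keep only its maximum similarity to any
    # document token vector, then add up those maxima.
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-dimensional token vectors (made up for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.2]]
score = maxsim_score(query, doc)
```

Here the first query token matches the second document token best (1.0) and the second query token matches the first document token best (0.5).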

400

This model is currently at the top of the HELM leaderboard.

What is Cohere Command beta?

400

This diagnostic technique uses a modest amount of fine-tuning on challenge data to differentiate model limitations from dataset limitations.

What is inoculation by fine-tuning?

500

This is the primary achievement of Eugene Goostman.

What is passing the Turing Test by imitating (being?) an unpleasant 13-year-old boy?

500

This Transformer leads the effort to do battle with the evil forces of the Decepticons.

Who is Optimus Prime?

500

This is the metric you would favor if it were crucial to have every relevant document above K.

What is recall?
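The three rank-cutoff metrics in this category (Success@K, precision, and recall) can be compared side by side. This is a minimal sketch assuming binary relevance labels and a ranked list that contains every relevant document; the function names are illustrative.

```python
def success_at_k(relevance, k):
    """1.0 if any relevant document appears at or above rank k, else 0.0."""
    return 1.0 if any(relevance[:k]) else 0.0


def precision_at_k(relevance, k):
    """Fraction of the top-k documents that are relevant."""
    return sum(relevance[:k]) / k


def recall_at_k(relevance, k):
    """Fraction of all relevant documents that appear at or above rank k."""
    total = sum(relevance)
    return sum(relevance[:k]) / total if total else 0.0


# Made-up ranking: 1 marks a relevant document, best-ranked first.
ranking = [0, 1, 0, 1, 1]
```

At K=2 this ranking has success 1.0, precision 1/2, and recall 1/3, which shows why recall is the metric to watch when every relevant document must land above K.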

500

This neural retrieval model represents queries and documents as sparse vectors over the entire vocabulary.

What is SPLADE?

500

This person is known as the world's first prompt engineer.

Who is Riley Goodside?

500

The famous adversarial testing paper Jia and Liang 2017 is focused on this benchmark task.

What is SQuAD?