IBM's Watson system won this television game show in 2011.
What is Jeopardy!?
This Transformer architecture is trained with MLM and next sentence prediction objectives.
What is BERT?
This metric says whether there is a relevant document at or above K.
What is Success@K?
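A minimal sketch of Success@K, assuming the ranking is given as a list of 0/1 relevance labels in rank order. The function name `success_at_k` and the example labels are hypothetical, chosen only for illustration.

```python
def success_at_k(relevance, k):
    """Return 1.0 if any of the top-k results is relevant, else 0.0.

    `relevance` is a hypothetical list of 0/1 labels in rank order.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return float(any(relevance[:k]))


ranking = [0, 0, 1, 0]  # the only relevant document sits at rank 3
print(success_at_k(ranking, 2))  # 0.0
print(success_at_k(ranking, 3))  # 1.0
```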
This is a "massaged" version of TF-IDF.
What is BM25?
What is Self-Ask?
This is arguably the first proposal for adversarial testing.
What is the Turing Test?
This model began the tradition of naming models after characters from the Muppets.
What is ELMo?
This Transformer architecture combines an MLM-based generator with a discriminator designed to distinguish original tokens from corrupted ones.
What is ELECTRA?
This metric gives the percentage of documents at or above K that are relevant.
What is precision?
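A minimal sketch of precision at K under the same kind of setup: a list of 0/1 relevance labels in rank order. The name `precision_at_k` and the example labels are assumptions for illustration, not part of the original clue.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k results that are relevant.

    `relevance` is a hypothetical list of 0/1 labels in rank order.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


ranking = [1, 0, 1, 0]
print(precision_at_k(ranking, 2))  # 0.5
print(precision_at_k(ranking, 4))  # 0.5
```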
The BM25 scoring function includes a term that penalizes documents for having this property.
What is being long?
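The length penalty can be sketched as follows, assuming the common Okapi-style form of the BM25 term weight. The IDF factor is left out to isolate the length effect, and the defaults `k1=1.2` and `b=0.75` are typical textbook values, not values taken from the clue.

```python
def bm25_term_weight(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency component of BM25 for one term (IDF omitted).

    Assumption: the common Okapi BM25 form. k1 and b are typical
    defaults chosen for illustration.
    """
    # Documents longer than average get length_norm > 1, which
    # inflates the denominator and lowers the weight.
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1) / (tf + k1 * length_norm)


short_doc = bm25_term_weight(tf=3, doc_len=50, avg_doc_len=100)
long_doc = bm25_term_weight(tf=3, doc_len=200, avg_doc_len=100)
print(short_doc > long_doc)  # True: same term count, longer document scores lower
```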
What is "TL;DR"?
This principle says that the meaning of a complex phrase is a function of the meaning of its parts and how they are combined.
What is compositionality?
This famous AI researcher declared that "a significant advance" on core problems in AI could be made over the course of a summer at Dartmouth in 1956.
Who is John McCarthy?
This Transformer architecture combines an autoregressive objective with bidirectional context via permutation orders of the input sequences.
What is XLNet?
This metric aggregates over the precision scores for each K such that there is a relevant document at K.
What is average precision?
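A minimal sketch of average precision over a single ranking of 0/1 relevance labels. One assumption to flag: this version averages over the relevant documents that appear in the ranked list, whereas some definitions divide by the total number of relevant documents in the collection. The name `average_precision` is hypothetical.

```python
def average_precision(relevance):
    """Mean of precision@k taken at each rank k that holds a relevant document.

    Assumption: normalizes by the number of relevant documents found
    in the ranking itself.
    """
    hits = 0
    precisions = []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at this rank
    return sum(precisions) / len(precisions) if precisions else 0.0


# Relevant documents at ranks 1 and 3: (1/1 + 2/3) / 2
print(average_precision([1, 0, 1, 0]))
```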
This neural retrieval model scores queries with respect to documents based on the output [CLS] representation of each of these texts.
What is DPR?
In this method, we sample a variety of reasoning paths and choose the answer that is most probable after marginalizing out these reasoning paths.
What is Self-Consistency?
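In practice the marginalization is often approximated by a majority vote over the final answers of the sampled paths. The sketch below assumes the samples have already been drawn from a model and are given as hypothetical `(reasoning, answer)` pairs. The function name and the sample data are illustrative only.

```python
from collections import Counter


def self_consistent_answer(samples):
    """Pick the most frequent final answer across sampled reasoning paths.

    `samples` is a hypothetical list of (reasoning_text, answer) pairs.
    The reasoning text is ignored, which is what "marginalizing out"
    the paths amounts to under a simple vote.
    """
    counts = Counter(answer for _reasoning, answer in samples)
    answer, _count = counts.most_common(1)[0]
    return answer


samples = [
    ("3 + 4 = 7, then 7 * 2 = 14", "14"),
    ("double 3 and 4, then add: 6 + 8 = 14", "14"),
    ("3 + 4 * 2 = 11", "11"),
]
print(self_consistent_answer(samples))  # 14
```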
What is Adversarial NLI?
This famous NLU system from 1970 allowed the user to give commands to a virtual robot operating in a blocks world.
What is SHRDLU?
This Transformer separately attends to positional embeddings and token embeddings.
What is DeBERTa?
This metric sums over real-valued (graded) relevance scores at each point where there is such a score.
What is Discounted Cumulative Gain?
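A minimal sketch of Discounted Cumulative Gain, assuming the widely used log2 rank discount (other discount functions exist). The function name `dcg` and the example gains are hypothetical.

```python
import math


def dcg(gains):
    """Sum graded relevance scores, each divided by log2(rank + 1).

    Assumption: the common log2 discount. `gains` is a hypothetical
    list of graded relevance scores in rank order.
    """
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


# The same two documents score higher when the better one is ranked first.
print(dcg([3, 2]) > dcg([2, 3]))  # True
```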
This neural retrieval model scores documents with respect to queries based on MaxSim computations between every query term and every document term.
What is ColBERT?
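The MaxSim scoring step can be sketched in plain Python, assuming each query and document token is already encoded as a vector and that similarity is a dot product. The tiny 2-dimensional vectors and the function names are made up for illustration. A real system would use learned, normalized embeddings.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """For each query token, take its best match among the document
    tokens, then sum those maxima over the query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Hypothetical token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.5, 0.5]]
print(maxsim_score(query, doc))  # 1.5 (best matches: 1.0 and 0.5)
```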
This model is currently at the top of the HELM leaderboard.
What is Cohere Command beta?
This diagnostic technique uses a modest amount of fine-tuning on challenge data to differentiate model limitations from dataset limitations.
What is inoculation by fine-tuning?
This is the primary achievement of Eugene Goostman.
What is passing the Turing Test by imitating (being?) an unpleasant 13-year-old boy?
This Transformer leads the effort to do battle with the evil forces of the Decepticons.
Who is Optimus Prime?
This is the metric you would favor if it were crucial to have every relevant document at or above K.
What is recall?
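A minimal sketch of recall at K, assuming 0/1 relevance labels in rank order and a known total count of relevant documents for the query. The name `recall_at_k` and its `total_relevant` parameter are hypothetical.

```python
def recall_at_k(relevance, k, total_relevant):
    """Fraction of all relevant documents that appear in the top k.

    `total_relevant` is assumed to be known for the query, including
    any relevant documents the system failed to retrieve.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    return sum(relevance[:k]) / total_relevant


ranking = [1, 0, 0, 1]
print(recall_at_k(ranking, 2, total_relevant=2))  # 0.5
print(recall_at_k(ranking, 4, total_relevant=2))  # 1.0
```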
This neural retrieval model represents queries and documents as sparse vectors over the entire vocabulary.
What is SPLADE?
This person is known as the world's first prompt engineer.
Who is Riley Goodside?
The famous adversarial testing paper Jia and Liang 2017 is focused on this benchmark task.
What is SQuAD?