Language Contact
Embeddings
Transfer Learning
Information Extraction
Language Resources
100

A word in a language that is adopted from another language.

What is borrowing or a loanword?

100

The name of the most well-known specific method to train global word embeddings.

What is word2vec or specifically skipgram?

100

A type of transfer learning where the parameters of the model are not changed, but the model sees a few examples of the task.

What is few-shot learning?
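
As an illustration of this card, here is a minimal sketch of how a few-shot prompt might be assembled. The function name `build_few_shot_prompt`, the `Text:`/`Label:` layout, and the sentiment examples are assumptions made for this example; no model is called and no parameters are updated, since the demonstrations only appear in the input.

```python
def build_few_shot_prompt(examples, query, instruction):
    """Assemble a prompt: instruction, labeled demonstrations, then the unlabeled query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The query uses the same layout but leaves the label open for the model.
    lines.append(f"Text: {query}")
    lines.append("Label:")
    return "\n".join(lines)


# Hypothetical demonstrations for a sentiment task.
demos = [
    ("I loved this film.", "positive"),
    ("The plot was dull.", "negative"),
]
prompt = build_few_shot_prompt(demos, "A wonderful surprise.", "Classify the sentiment.")
print(prompt)
```

The resulting string would be passed unchanged to a pretrained model, which is expected to continue after the final `Label:`.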

100

The general method of automatically or manually attaching labels to natural language data.

What is annotation?

100

A resource with one entry per general-language word, listing all its senses, grammatical information, etc., in alphabetical order.

What is a dictionary?

200

The phenomenon where a person switches between two or more languages within the same conversation or even sentence.

What is code-switching?

200

Word embeddings that are available in more than one language, but equivalent words are not in the same vicinity in vector space.

What are multilingual word embeddings?

200

The general name of the method of training a language model that can then be used for transfer learning.

What is pretraining?

200

The name of the task to obtain domain-specific words from natural language text.

What is term extraction?

200

A resource with domain-specific terms, their equivalent terms in other languages, and relations between entries.

What is a terminology?

300

A language variety where languages are mixed but there are no first language speakers of the mix.

What is a pidgin?

300

A vector space in which words across languages are in a similar vicinity if their meaning is similar.

What are crosslingual embeddings?

300

The specific type of pretrained language model variant (architecture) that the Generative Pretrained Transformers (GPTs) use, but not the BERT models.

What is a decoder-only model?

300

The task of extracting multiple types of information on a specific situation from natural language text, e.g. news.

What is event extraction?

300

Collection of a limited set of predefined words, mostly in one language, for specific communication settings.

What is a controlled vocabulary?

400

Content words in the core vocabulary of languages that derive from the same original or ancestral language.

What are cognates?

400

The representation of entities and their relations in vector space.

What are knowledge graph embeddings?

400

The general method of adapting a pretrained language model to a downstream task by updating all of its parameters.

What is fine-tuning?

400

The type of lexico-semantic relation that links the general category to a more specific instance of this category, e.g. car-vehicle.

What is hypernymy?

400

A controlled vocabulary in the form of a usually bilingual list of corresponding words.

What is a glossary?

500

A speaker switches the language in a conversation from one sentence to the next, but not within sentences.

What is inter-sentential code-switching?

500

The process of aligning trained, existing monolingual embeddings across languages by means of specifically learned matrices.

What is projection-based alignment?
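
To make this card concrete, the sketch below learns a mapping between two toy 2D embedding spaces from a small seed dictionary. It assumes the mapping is a pure rotation, for which the least-squares solution has a closed form; real systems typically learn a full (often orthogonal) matrix over high-dimensional vectors. All vectors, word pairs, and function names here are made up for illustration.

```python
import math


def fit_rotation(source, target):
    """Angle of the 2D rotation that best maps source vectors onto target vectors
    in the least-squares sense."""
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    return math.atan2(cross, dot)


def rotate(vec, theta):
    """Apply the learned rotation to a single vector."""
    x, y = vec
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))


def nearest(vec, space):
    """Word in `space` whose vector is closest to `vec` (Euclidean distance)."""
    return min(space, key=lambda word: math.dist(vec, space[word]))


# Toy monolingual spaces: the "German" space is the "English" one rotated by 90 degrees.
en = {"dog": (1.0, 0.0), "cat": (0.8, 0.6), "house": (0.0, 1.0)}
de = {"Hund": (0.0, 1.0), "Katze": (-0.6, 0.8), "Haus": (-1.0, 0.0)}

# Seed dictionary used to learn the mapping; "house" is held out.
seed = [("dog", "Hund"), ("cat", "Katze")]
theta = fit_rotation([en[s] for s, _ in seed], [de[t] for _, t in seed])

mapped_house = rotate(en["house"], theta)
print(nearest(mapped_house, de))
```

The held-out word is translated by mapping its vector into the other space and looking up the nearest neighbour there.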

500

The specific method for adapting a pretrained language model for a downstream task, where only a small proportion of the parameters are adapted.

What is Low-Rank Adaptation (LoRA)?
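
The idea behind this card can be sketched in plain Python: the pretrained weight matrix W stays frozen and only two small matrices B and A are trained, so the effective weight is W + scale * (B A). The dimensions, the toy values, and the names `matmul`, `lora_weight`, and `scale` are assumptions for illustration; `scale` stands in for the scaling factor applied to the low-rank update, and in practice B usually starts at zero.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, scale=1.0):
    """Effective weight W + scale * (B @ A); W is frozen, only A and B are trained."""
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


d_out, d_in, rank = 4, 6, 1

w = [[0.0] * d_in for _ in range(d_out)]  # frozen pretrained weight (toy zeros)
b = [[1.0] for _ in range(d_out)]         # trainable, shape d_out x rank
a = [[0.5] * d_in]                        # trainable, shape rank x d_in

w_eff = lora_weight(w, a, b)

full_params = d_out * d_in                # parameters updated by full fine-tuning
lora_params = rank * (d_out + d_in)       # parameters updated by the low-rank adapters
print(full_params, lora_params)
```

Even in this tiny example the adapters hold fewer trainable values than the full matrix, and the gap grows quickly with realistic layer sizes.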

500

A method, usable across languages, to artificially create more data by modifying existing data.

What is data augmentation?

500

A collection of articles providing summaries of knowledge, including historical details, either in general or on a particular field.

What is an encyclopedia?