Corpus Types
Corpus Analysis
Vocabulary & Grammar
Tools & Applications
Fun Facts
100

A corpus containing transcriptions of spoken language.

What is a spoken corpus?

100

The process of breaking text into words or sentences

What is tokenization?
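
To make the clue concrete, here is a minimal sketch of tokenization using only Python's standard `re` module. The function names and the regular expressions are illustrative assumptions; real tokenizers handle abbreviations, hyphens, and punctuation with far more care.

```python
import re

def tokenize_words(text):
    """Split text into lowercase word tokens, keeping internal apostrophes."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())

def tokenize_sentences(text):
    """Naively split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

sample = "Corpora are useful. Don't they help linguists?"
words = tokenize_words(sample)          # word-level tokens
sentences = tokenize_sentences(sample)  # sentence-level tokens
```

The sentence splitter would wrongly break on abbreviations such as "e.g.", which is one reason production tools rely on trained models or rule lists.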

100

Words that often appear together (e.g., "fast food").

What are collocations?
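
As a rough illustration, candidate collocations can be found by counting adjacent word pairs. This is a sketch under simplifying assumptions: the helper name `bigram_counts` is hypothetical, and raw co-occurrence counts are only a first step, since corpus tools usually rank pairs with an association measure such as mutual information.

```python
from collections import Counter

def bigram_counts(tokens):
    """Count each pair of adjacent tokens."""
    return Counter(zip(tokens, tokens[1:]))

tokens = "we ate fast food and then more fast food".split()
# The most frequent adjacent pair is the strongest candidate collocation.
top_pairs = bigram_counts(tokens).most_common(1)
```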

100

Software to analyze word frequency (e.g., AntConc).

What is a concordancer?

100

The first major computerized corpus of English (compiled at Brown University from texts published in 1961).

What is the Brown Corpus?

200

A corpus that pairs texts in one language with their translations in another, used in translation studies.

What is a parallel corpus?

200

The method of reducing words to their base form (e.g., "running" → "run").

What is lemmatization?

200

The property of a word having multiple related meanings (e.g., "head" of a body vs. "head" of a department).

What is polysemy?

200

Corpora are widely used to improve this type of AI-based language translation.

What is machine translation?

200

The linguist who criticized corpora for ignoring "possible" sentences.

Who is Noam Chomsky?

300

A collection of old texts used to study language change over time.

What is a historical corpus?

300

The tool used to find words in context within a corpus.

What is a concordancer?
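
The core display of a concordancer is the keyword-in-context (KWIC) line. The sketch below shows the idea on a pre-tokenized list; the function name `kwic` and the `window` parameter are assumptions for illustration, not the interface of any particular tool.

```python
def kwic(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

tokens = "the bank raised rates while the river bank flooded".split()
hits = kwic(tokens, "bank")
for left, word, right in hits:
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>12} [{word}] {right}")
```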

300

The tendency of "cause" to pair with negative words ("trouble").

What is semantic prosody?

300

Corpus data helps in building these digital tools that define word meanings.

What are dictionaries?

300

This large, genre-balanced corpus of American English includes a spoken section drawn from TV and radio transcripts and is often used for studying spoken language.

What is the Corpus of Contemporary American English?

400

A corpus built using student writing to analyze common mistakes.

What is a learner corpus?

400

A measure of how often a word appears in a corpus.

What is word frequency?
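
Raw counts are hard to compare across corpora of different sizes, so frequencies are often normalized, for example per million words. A minimal sketch, assuming a pre-tokenized corpus and a hypothetical helper name `frequency_per_million`:

```python
from collections import Counter

def frequency_per_million(tokens, word):
    """Normalize a word's raw count to occurrences per million tokens."""
    if not tokens:
        return 0.0
    return Counter(tokens)[word] / len(tokens) * 1_000_000

tokens = "the cat sat on the mat".split()
raw = Counter(tokens)["the"]                 # raw frequency
rate = frequency_per_million(tokens, "the")  # normalized frequency
```

On a six-word toy corpus the normalized figure is meaningless in itself; the point is only that the same formula makes counts from corpora of different sizes comparable.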

400

Fixed phrases like "on the other hand."

What are lexical bundles?
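
Lexical bundles are usually identified by frequency: word sequences of a fixed length that recur above some threshold. The sketch below counts n-grams and keeps the repeated ones. The name `lexical_bundles`, the default length of four words, and the cut-off of two occurrences are assumptions chosen for this toy example; published studies use much larger corpora and stricter thresholds.

```python
from collections import Counter

def lexical_bundles(tokens, n=4, min_count=2):
    """Return n-word sequences that occur at least min_count times."""
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(ngrams)
    return {" ".join(g): c for g, c in counts.items() if c >= min_count}

text = "on the other hand it works but on the other hand it fails"
bundles = lexical_bundles(text.split(), n=4, min_count=2)
```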

400

Using corpora to update dictionaries.

What is lexicography?

400

The linguist often called the father of modern corpus linguistics (founder of the COBUILD project).

Who is John Sinclair?

500

A corpus updated regularly to track new language trends (e.g., COCA).

What is a monitor corpus?

500

Tagging words with their senses in context (e.g., "bank" as a financial institution vs. a riverbank).

What is semantic annotation? 

500

Studying how words interact with grammar (e.g., "make a decision").

What is lexico-grammar?

500

Corpora are used to train AI models for this type of technology that lets computers understand and generate human language.

What is natural language processing (NLP)?

500

The organization that distributes corpora for NLP research.

What is the Linguistic Data Consortium (LDC)?
