Big Data & Data Mining Jeopardy Template

Column 1

Column 2

Column 3

Column 4

100

An organized grouping of information within a specific structure

What is a database?

100

A classification model applied to existing “labeled” data (i.e., where the category or class is known). E.g., grouping patients based on a known medical history or condition.

What is classification or categorization?

100

A unique value assigned to observations

What is an identifier?

100

Extracting useful and actionable insight from data to improve the process and outcome of decision making used to be the role of this person.

What is the statistician?

200

A computational process of analyzing datasets using both statistical and logical methods in order to uncover hidden, previously unknown and interesting patterns that can inform organizational decision making

What is data mining?

200

What is a relational database?

200

Information that is not organized or easily interpreted by traditional databases or data models, and is typically text-heavy.

What is unstructured data?

200

The characteristics that, in combination, make up big data.

What are the 5 V's: Volume, Velocity, Variety, Veracity, and Value?

300

A system that is NOT efficient for analysis because it requires querying multiple data sources at one time through joins.

What is OLTP or online transaction processing?

300

The intentional combination of tables that reduces the number of joins necessary to query related data, making it efficient for analysis.

What is OLAP or online analytical processing?
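
To make the clue above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and rows are made up for illustration only. It shows the same question answered twice: once with a join across two separate tables, and once against a single table that was combined ahead of time.

```python
import sqlite3

# In-memory database with hypothetical, made-up tables and rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'East'), (2, 'West');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);

-- Combine the two tables once, ahead of time, into one wide table.
CREATE TABLE orders_wide AS
    SELECT o.order_id, o.amount, c.region
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id;
""")

# Separate tables: every analytical query has to repeat the join.
joined = conn.execute("""
    SELECT c.region, SUM(o.amount)
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

# Combined table: the same question needs no join at query time.
flat = conn.execute("""
    SELECT region, SUM(amount)
    FROM orders_wide
    GROUP BY region
    ORDER BY region
""").fetchall()

print(joined)
print(flat)
```

Both queries return the same totals per region; the second simply moves the join cost to the moment the combined table is built.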

300

This type of data is always composed of discrete values.

What is categorical?

300

The process of data mining.

What is CRISP-DM: Cross-Industry Standard Process for Data Mining?

400

TWO other terms for variables, fields, columns, or characteristics.

What are attributes and features?

400

A large, denormalized and archived database.

What is a data warehouse?

400

The phase of a Data Mining project where you: (1) Select the data; (2) Cleanse the data; (3) Construct new data; (4) Integrate the data; and (5) Format the data.

What is the Data Preparation Phase or Phase 3?

500

The number of observations (rows) and the number of variables (columns).

What are dimensions?
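
As a small illustration of the clue above, the sketch below counts the rows and columns of a tiny table held as a list of dictionaries. The field names and values are invented for the example and assume every row has the same fields.

```python
# Hypothetical toy dataset: one dictionary per observation (row).
rows = [
    {"patient_id": 1, "age": 34, "smoker": "no"},
    {"patient_id": 2, "age": 51, "smoker": "yes"},
    {"patient_id": 3, "age": 47, "smoker": "no"},
]

n_observations = len(rows)   # number of rows
n_variables = len(rows[0])   # number of columns (fields in one row)

dimensions = (n_observations, n_variables)
print(dimensions)
```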

500

Tabular data that can be stored in SQL databases in rows and columns (e.g., transactional data)

What is structured data?

500

The random assignment of observations to three independent datasets to ensure each is representative of the whole.

What is partitioning?
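
A minimal sketch of the idea in the clue above, using only Python's standard library. The names training, validation, and test, the 60/20/20 split, and the fixed seed are assumptions made for illustration; the clue itself only says "three independent datasets".

```python
import random


def partition(observations, fractions=(0.6, 0.2, 0.2), seed=42):
    """Randomly assign observations to three non-overlapping datasets."""
    if len(fractions) != 3 or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must be three values that sum to 1")

    # Shuffle a copy so the caller's data is left untouched.
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)

    # Cut the shuffled list at two points; the last part takes the remainder.
    n = len(shuffled)
    first_end = round(n * fractions[0])
    second_end = first_end + round(n * fractions[1])
    return (
        shuffled[:first_end],
        shuffled[first_end:second_end],
        shuffled[second_end:],
    )


training, validation, test = partition(range(100))
print(len(training), len(validation), len(test))
```

Because the assignment is random, each of the three parts should resemble the full dataset on average; with very small datasets that is not guaranteed.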

500

This is known as the abstraction (or computerized representation) of a real-world phenomenon.

What is a model?