Data Architecture
AI Concepts in Data Systems
Performance and Scaling
SQL and Databases
Machine Learning
100

This centralized repository is designed to store large amounts of structured data from multiple sources for reporting and analysis.

What is a Data Warehouse?

100

AI model type that generates text, images, or code

What is generative AI?

100

Technique used to store frequently accessed data for faster retrieval

What is caching?
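
As an illustration of the idea, here is a minimal sketch using Python's built-in `functools.lru_cache`. The `load_profile` function and the `CALLS` counter are hypothetical stand-ins for a slow database or API lookup.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the slow path actually runs


@lru_cache(maxsize=128)
def load_profile(user_id: int) -> dict:
    """Hypothetical slow lookup; stands in for a database or API call."""
    CALLS["count"] += 1
    return {"id": user_id, "name": f"user-{user_id}"}


first = load_profile(7)   # cache miss: the function body runs
second = load_profile(7)  # cache hit: the stored result is returned
```

The second call never reaches the function body, which is the whole point: repeated requests for the same key are served from memory.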

100

This SQL clause is used to filter records based on a specific condition.

What is WHERE?
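
A small sketch of the clause in action, run through Python's standard `sqlite3` module. The `orders` table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("ana", 20.0), ("ben", 75.5), ("cy", 120.0)],
)

# WHERE keeps only the rows for which the condition is true.
big_orders = conn.execute(
    "SELECT customer, total FROM orders WHERE total > ? ORDER BY total",
    (50,),
).fetchall()
```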

100

This type of learning uses labeled data to train models to predict outcomes or classify data.

What is Supervised Learning?

200

This is a decentralized architectural approach that treats data as a product and shifts ownership from a central team to the specific business domains that actually create and use it.

What is a data mesh?

200

This is a system that retrieves relevant documents before generating an answer

What is retrieval-augmented generation (RAG)?

200

Adding more machines to handle increased load

What is horizontal scaling?

200

This type of join returns every row from both the left and the right table, pairing rows where a match exists and filling the missing side with NULLs where it does not.

What is a FULL OUTER JOIN?
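
To see the result shape, here is a sketch using Python's `sqlite3` module. Older SQLite builds lack native `FULL OUTER JOIN` support, so this example emulates it with two `LEFT JOIN`s and `UNION ALL`; that emulation is an assumption of the sketch, not the only way to write the query. The tables and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (name TEXT, dept_id INTEGER);
    CREATE TABLE departments (dept_id INTEGER, dept_name TEXT);
    INSERT INTO employees VALUES ('ana', 1), ('ben', NULL);
    INSERT INTO departments VALUES (1, 'data'), (2, 'ops');
    """
)

# Part 1: every employee, with a department when one matches.
# Part 2: departments that matched no employee at all.
rows = conn.execute(
    """
    SELECT e.name, d.dept_name
    FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.dept_id
    UNION ALL
    SELECT e.name, d.dept_name
    FROM departments d
    LEFT JOIN employees e ON e.dept_id = d.dept_id
    WHERE e.name IS NULL
    """
).fetchall()
```

The unmatched employee and the unmatched department both survive, each with `None` on the side that had no partner.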

200

Despite its name containing the word "regression," this algorithm is actually used to predict the probability of a categorical "yes" or "no" outcome.

What is Logistic Regression?
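
The "probability" part comes from the sigmoid function, which squeezes any number into the range 0 to 1. A minimal sketch follows; the `weight` and `bias` values are hypothetical, since a real model would learn them from training data.

```python
import math


def predict_proba(x: float, weight: float, bias: float) -> float:
    """Probability of the 'yes' class for a single feature value x."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


def predict_label(x: float, weight: float, bias: float) -> str:
    # A common (but adjustable) convention: answer 'yes' at 0.5 or above.
    return "yes" if predict_proba(x, weight, bias) >= 0.5 else "no"


p_mid = predict_proba(0.0, weight=1.0, bias=0.0)
label_high = predict_label(3.0, weight=1.0, bias=0.0)
label_low = predict_label(-3.0, weight=1.0, bias=0.0)
```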

300

Defines how data is moved and transformed across systems

What is a data pipeline?

300

This term refers to the maximum amount of information, measured in tokens, that an AI model can "remember" or process at one time during a conversation.

What is the Context Window?

300

Increasing resources on a single machine

What is vertical scaling?

300

This is a column, or set of columns, in a table that must contain unique values and cannot be empty, ensuring that every single row can be identified.

What is a Primary Key?
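
The uniqueness rule can be seen directly with Python's `sqlite3` module. The `customers` table here is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("INSERT INTO customers VALUES (1, 'ana')")

try:
    # Reusing customer_id 1 violates the uniqueness guarantee.
    conn.execute("INSERT INTO customers VALUES (1, 'ben')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```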

300

This term describes a model that performs well on training data but fails to generalize to new, unseen data.

What is Overfitting?

400

This discipline creates a single, consistent "golden record" for core business entities like "Customer" or "Product" across an entire organization.

What is Master Data Management (MDM)?

400

This is an open standard that establishes a universal, secure interface for AI models to connect with diverse data sources and tools, replacing the need for fragmented, custom-built integrations.

What is Model Context Protocol (MCP)?

400

Delay between data generation and availability for use

What is data latency?

400

This is a prepared SQL code block that you can save and reuse, allowing you to execute complex logic with a single call rather than rewriting the script.

What is a Stored Procedure?

400

This algorithm classifies a new data point by finding a specific number of points closest to it and taking a "majority vote" from those neighbors.

What is K-Nearest Neighbors (KNN)?
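
The "majority vote" can be sketched in a few lines of plain Python. The training points and labels are made up, and the sketch assumes Euclidean distance, which is a common choice but not the only one.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k closest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 1.0), "red"),
    ((1.5, 2.0), "red"),
    ((5.0, 5.0), "blue"),
    ((6.0, 5.5), "blue"),
    ((5.5, 6.5), "blue"),
]

label = knn_predict(train, (1.2, 1.4), k=3)
```

With k = 3, the query's neighbors are two "red" points and one "blue" point, so "red" wins the vote.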

500

Often used in microservices, this pattern involves treating every change to data as an immutable event in a sequence rather than just updating a final state.

What is Event Sourcing?
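
A minimal sketch of the pattern: instead of storing a balance, store the events and rebuild the balance by replaying them. The `Event` class and the event names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an event cannot be changed once recorded
class Event:
    kind: str
    amount: int


def replay(events) -> int:
    """Rebuild the current balance by applying every event in order."""
    balance = 0
    for event in events:
        if event.kind == "deposited":
            balance += event.amount
        elif event.kind == "withdrawn":
            balance -= event.amount
    return balance


log = [Event("deposited", 100), Event("withdrawn", 30), Event("deposited", 5)]
balance = replay(log)
balance_after_two = replay(log[:2])  # state at any earlier point is recoverable
```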

500

This specialized storage system is designed to index and search high-dimensional embeddings efficiently for AI applications.

What is a vector database?
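
The core operation is "find the stored vectors most similar to this one". The sketch below does that by brute force with cosine similarity; a real vector database would use an approximate index to avoid scanning everything. The three-dimensional embeddings are invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; real ones have hundreds or thousands of dimensions.
DOCS = {
    "warehouse": [0.9, 0.1, 0.0],
    "caching": [0.1, 0.9, 0.1],
    "sharding": [0.0, 0.2, 0.9],
}


def search(query, top_k=1):
    ranked = sorted(DOCS, key=lambda name: cosine(DOCS[name], query), reverse=True)
    return ranked[:top_k]


best = search([0.8, 0.2, 0.1])
```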

500

This scaling technique involves breaking a large database into smaller, faster, more easily managed partitions that are spread across multiple servers.

What is Sharding?
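
One common way to decide which server holds a given row is to hash its key. The sketch below assumes simple hash-modulo routing and three hypothetical shard names; production systems often use range-based or consistent-hashing schemes instead.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical server names


def shard_for(key: str) -> str:
    """Map a key to a shard using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


placement = {key: shard_for(key) for key in ("customer-1", "customer-2", "customer-3")}
```

The same key always lands on the same shard, so reads know where to look without consulting every server.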

500

This class of SQL functions, which includes RANK() and SUM() OVER(), performs calculations across a set of rows related to the current row.

What is a Window Function?
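
Both functions named in the clue can be tried out with Python's `sqlite3` module. This sketch assumes the bundled SQLite is version 3.25 or newer, where window functions were introduced; the `sales` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (rep TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('ana', 300), ('ben', 200), ('cy', 100);
    """
)

# Each row keeps its identity; the OVER clause adds values computed
# across the related rows (here: all rows, ordered by amount).
rows = conn.execute(
    """
    SELECT rep,
           amount,
           RANK() OVER (ORDER BY amount DESC) AS sales_rank,
           SUM(amount) OVER (ORDER BY amount DESC) AS running_total
    FROM sales
    ORDER BY sales_rank
    """
).fetchall()
```

Unlike `GROUP BY`, the query still returns one row per sale, each annotated with its rank and a running total.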

500

This ensemble method builds multiple decision trees and combines their predictions, by voting or averaging, to get a more accurate and stable result.

What is Random Forest?
