This centralized repository is designed to store large amounts of structured data from multiple sources for reporting and analysis.
What is a Data Warehouse?
Type of AI model that generates new content such as text, images, or code
What is generative AI?
Technique used to store frequently accessed data for faster retrieval
What is caching?
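A minimal Python sketch of caching, using the standard library's `functools.lru_cache`. The function `slow_square` and the counter `call_count` are hypothetical stand-ins for an expensive lookup; they are not from the card.

```python
from functools import lru_cache

call_count = 0  # counts how often the slow path actually runs


@lru_cache(maxsize=128)
def slow_square(n: int) -> int:
    """Hypothetical stand-in for an expensive computation or remote lookup."""
    global call_count
    call_count += 1
    return n * n


first = slow_square(12)   # computed, then stored in the cache
second = slow_square(12)  # served from the cache; the function body is skipped
```

After both calls, `call_count` is still 1 because the second call never reaches the function body.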
This SQL clause is used to filter records based on a specific condition.
What is WHERE?
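As an illustration, here is a small sketch that runs a `WHERE` filter through Python's built-in `sqlite3` module. The `orders` table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Ada", 40.0), (2, "Grace", 125.5), (3, "Linus", 310.0)],
)

# WHERE keeps only the rows whose total exceeds the bound parameter.
big_orders = conn.execute(
    "SELECT id, customer FROM orders WHERE total > ? ORDER BY id", (100,)
).fetchall()
```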
This type of learning uses labeled data to train models to predict outcomes or classify data.
What is Supervised Learning?
This is a decentralized architectural approach that treats data as a product and shifts ownership from a central team to the specific business domains that actually create and use it.
What is a data mesh?
This is a system that retrieves relevant documents before generating an answer
What is retrieval-augmented generation (RAG)?
Adding more machines to handle increased load
What is horizontal scaling?
This type of join returns all records from both the left and the right table, pairing rows where the join condition matches and filling in NULLs where one side has no match.
What is a FULL OUTER JOIN?
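A sketch of the result a full outer join produces, run through Python's `sqlite3`. Because older SQLite builds do not support `FULL OUTER JOIN` directly, this example emulates it with a `LEFT JOIN` plus a `UNION ALL` of the unmatched right-hand rows; that swap is an assumption of the sketch, and the tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (customer_id INTEGER, item TEXT);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (2, 'keyboard'), (3, 'monitor');
""")

# Part 1: every customer, with order data where it exists.
# Part 2: orders that have no matching customer.
rows = conn.execute("""
SELECT c.name, o.item
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
UNION ALL
SELECT NULL, o.item
FROM orders AS o
LEFT JOIN customers AS c ON c.id = o.customer_id
WHERE c.id IS NULL
""").fetchall()
```

Ada has no order and the monitor has no customer, so each appears once with `None` on the missing side, while Grace is paired with her keyboard.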
Despite its name containing the word "regression," this algorithm is actually used to predict the probability of a categorical "yes" or "no" outcome.
What is Logistic Regression?
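To show why the output is a probability rather than a raw number, here is a minimal sketch of the prediction step only (no training). The weights, bias, and 0.5 threshold are illustrative assumptions.

```python
import math


def predict_probability(features, weights, bias):
    """Squash a weighted sum through the sigmoid to get a value in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(features, weights, bias, threshold=0.5):
    """Turn the probability into a categorical yes/no answer."""
    probability = predict_probability(features, weights, bias)
    return "yes" if probability >= threshold else "no"
```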
Series of steps that moves and transforms data across systems
What is a data pipeline?
This term refers to the total amount of information (tokens) an AI model can "remember" or process at a single time during a conversation.
What is the Context Window?
Increasing resources on a single machine
What is vertical scaling?
This is a column, or set of columns, in a table whose values must be unique and cannot be NULL, ensuring that every single row can be identified.
What is a Primary Key?
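A short sketch of both rules being enforced, using Python's `sqlite3`. The `users` table is hypothetical. Note that SQLite only rejects NULLs in a non-integer primary key when `NOT NULL` is spelled out, so the example includes it explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("INSERT INTO users VALUES ('ada@example.com', 'Ada')")

rejected = []  # names of rows the database refused to store
for row in [("ada@example.com", "Duplicate Ada"), (None, "No Email")]:
    try:
        conn.execute("INSERT INTO users VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        # Raised for the duplicate key and for the NULL key alike.
        rejected.append(row[1])
```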
This term describes a model that performs well on training data but fails to generalize to new, unseen data.
What is Overfitting?
This discipline creates a single, consistent "golden record" for core business entities like "Customer" or "Product" across an entire organization.
What is Master Data Management (MDM)?
This is an open standard that defines a common interface for AI models to connect with diverse data sources and tools, reducing the need for fragmented, custom-built integrations.
What is Model Context Protocol (MCP)?
Delay between data generation and availability for use
What is data latency?
This is a prepared SQL code block that you can save and reuse, allowing you to execute complex logic with a single call rather than rewriting the script.
What is a Stored Procedure?
This algorithm classifies a new data point by finding a specific number of points closest to it and taking a "majority vote" from those neighbors.
What is K-Nearest Neighbors (KNN)?
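The "closest points plus majority vote" idea can be sketched in a few lines of plain Python. The training points, labels, and default `k=3` are made up for illustration, and `math.dist` assumes Python 3.8 or newer.

```python
import math
from collections import Counter


def knn_classify(point, labeled_points, k=3):
    """Label `point` by majority vote among its k nearest labeled neighbors."""
    by_distance = sorted(
        labeled_points, key=lambda item: math.dist(point, item[0])
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: (coordinates, label)
training = [
    ((1.0, 1.0), "red"),
    ((1.2, 0.8), "red"),
    ((0.9, 1.1), "red"),
    ((5.0, 5.0), "blue"),
    ((5.2, 4.9), "blue"),
]
```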
Often used in microservices, this pattern involves treating every change to data as an immutable event in a sequence rather than just updating a final state.
What is Event Sourcing?
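A minimal sketch of the pattern, assuming a hypothetical bank-account domain: state is never stored directly, it is rebuilt by replaying an append-only list of immutable events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes each event immutable once created
class Event:
    kind: str    # "deposited" or "withdrawn"
    amount: int


def replay(events):
    """Derive the current balance by folding over the event history."""
    balance = 0
    for event in events:
        if event.kind == "deposited":
            balance += event.amount
        elif event.kind == "withdrawn":
            balance -= event.amount
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return balance


log = [Event("deposited", 100), Event("withdrawn", 30), Event("deposited", 5)]
```

Replaying a prefix of the log, such as `replay(log[:2])`, reconstructs the state as it was at that earlier point, which is one reason the pattern is chosen over overwriting a final value.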
This specialized storage system is designed to index and search high-dimensional embeddings efficiently for AI applications.
What is a vector database?
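The core query such a system answers is "which stored embeddings are most similar to this one?" The sketch below does it by brute force with cosine similarity. Production systems typically use approximate nearest-neighbor indexes instead, and the three-dimensional vectors and document IDs here are toy assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, index, top_k=2):
    """Return the IDs of the top_k stored vectors most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda entry: cosine_similarity(query, entry[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:top_k]]


# Hypothetical index: document ID -> embedding
index = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.2, 0.1],
    "doc_tax": [0.0, 0.1, 0.9],
}
```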
This scaling technique involves breaking a large database into smaller, faster, more easily managed horizontal partitions that are spread across multiple servers.
What is Sharding?
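One common way to decide which server holds a given row is to hash its key. The sketch below assumes hash-based routing across four in-memory dictionaries standing in for servers; the helper names `shard_for`, `put`, and `get` are hypothetical.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard number using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Four dictionaries stand in for four separate database servers.
shards = {i: {} for i in range(4)}


def put(key: str, value) -> None:
    shards[shard_for(key, len(shards))][key] = value


def get(key: str):
    return shards[shard_for(key, len(shards))].get(key)
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash()`, which is randomized per process for strings and so would route the same key differently after a restart.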
This class of SQL functions, which includes RANK() and SUM() OVER(), performs calculations across a set of rows related to the current row.
What is a Window Function?
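A small sketch of both functions named in the clue, run through Python's `sqlite3`. It assumes the bundled SQLite is version 3.25 or newer, the first release with window functions; the `sales` table is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (rep TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Ada", 300), ("Grace", 500), ("Linus", 200)],
)

# Each output row keeps its own identity while also seeing the other rows:
# RANK() orders it among them, SUM() OVER () totals the whole set.
rows = conn.execute("""
SELECT rep,
       amount,
       RANK() OVER (ORDER BY amount DESC) AS amount_rank,
       SUM(amount) OVER () AS grand_total
FROM sales
ORDER BY amount_rank
""").fetchall()
```

Unlike `GROUP BY`, the query still returns one row per sales rep, each carrying its rank and the shared total.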
This ensemble method builds multiple decision trees and merges them together to get a more accurate and stable prediction.
What is Random Forest?