What is a data warehouse?
A data warehouse is a collection of data created to support decision-making. It provides an integrated, subject-oriented, time-variant, and nonvolatile repository for data analysis
What is an operational data store (ODS)?
An ODS consolidates data from multiple source systems and provides a near real-time, integrated view of volatile, current data. Unlike a data warehouse, it does not maintain historical data.
What is the primary difference between a data warehouse and a data mart?
a) A data warehouse is subject-oriented, while a data mart is process-oriented
b) A data mart is a subset of a data warehouse focused on a specific area
c) Data marts contain real-time data, whereas data warehouses do not
d) Data marts support all business areas, while data warehouses are focused on a single subject
Correct Answer:
b) A data mart is a subset of a data warehouse focused on a specific area
⇒ Explanation: A data mart is designed for a specific department or function, such as marketing or finance, while a data warehouse serves the entire organization
Which of the following is NOT a characteristic of a data warehouse?
a) Non-volatile
b) Integrated
c) Real-time updating
d) Subject-oriented
Correct Answer:
c) Real-time updating
⇒ Explanation: Data warehouses do not provide real-time updates; instead, they periodically refresh data from source systems. Real-time updating is a characteristic of an operational data store (ODS)
What is the ETL process, and why is each step important?
Extract: Data is collected from multiple source systems, which may have different formats and structures. This step ensures that all necessary data is gathered for further processing.
Transform: The extracted data is cleansed, formatted, and standardized. This may involve removing duplicates, correcting inconsistencies, and integrating data from different sources. Transformation ensures data quality and usability.
Load: The transformed data is stored in the data warehouse or data mart. Loading can be done in bulk (batch processing) or incrementally (real-time updates). This step makes the data available for analysis and decision-making
What is Online Analytical Processing (OLAP), and how does it differ from Online Transaction Processing (OLTP)?
OLAP (Online Analytical Processing):
- Designed for complex queries, reporting, and data analysis.
- Uses historical and aggregated data from data warehouses.
- Supports multidimensional analysis, such as "slicing and dicing" and drill-down functionality.
- Examples: Business intelligence tools for trend analysis, forecasting, and decision-making.
OLTP (Online Transaction Processing):
- Optimized for fast, real-time transaction processing (e.g., order entry, banking transactions).
- Uses normalized databases to ensure data integrity.
- Deals with current, operational data rather than historical trends.
- Examples: Banking systems, e-commerce platforms, and airline reservations