Basic concepts
Architecture and development approaches
ETL processes
Data stores and metadata
100

What are the four key characteristics of a data warehouse as defined by Inmon (1992)?

    1. Subject Oriented

    2. Integrated 

    3. Time Variant 

    4. Nonvolatile 

100

What are the main components of a data warehouse architecture?

    1. Data Sources 

    2. ETL (Extract, Transform, Load) Software 

    3. Staging Area 

    4. Data Warehouse 

    5. Data Marts 

    6. Metadata 

    7. Data Access Tools and Applications 

    8. Users

100

What is the purpose of the ETL (Extraction, Transformation, and Loading) process in data warehousing?

The ETL process extracts data from source systems, transforms it for decision-support purposes (including data cleansing, integration, and formatting), and loads it into a target database such as a data warehouse or data marts. ETL ensures data quality, consistency, and integration across multiple sources, making it usable for business intelligence and analytics.

100

What is metadata in the context of data warehousing?

Metadata is “data about data” in a data warehouse. It provides information about data sources, database structures, refresh schedules, report definitions, and query tools.

200

True or false: A dependent data mart is built directly from source systems, without relying on a data warehouse.

False – A dependent data mart is created using data extracted from a data warehouse, not directly from source systems.

200

True or false: Bill Inmon’s enterprise data warehouse approach suggests starting with small, independent data marts and then integrating them later.

False – Bill Inmon advocates for a top-down approach where an enterprise data warehouse is built first, and dependent data marts are created afterward.

200

Custom-written data extraction programs are always preferred over commercial ETL software because they provide more flexibility.

False – While custom extraction programs offer flexibility, they require extensive technical expertise and maintenance. The trend is toward using commercial ETL software, which automates and simplifies data extraction and transformation.

200

Nothing to do! Free points for your team. 

:)

300

Which of the following is NOT a primary function of a data warehouse?

A) Storing historical data for analysis

B) Allowing users to modify transactional data

C) Integrating data from multiple sources

D) Supporting decision-making applications

B) Allowing users to modify transactional data – A data warehouse is nonvolatile, meaning data cannot be changed by users.

300

Which of the following is NOT a key function of ETL (Extract, Transform, Load) software?

A) Extracting data from source systems

B) Transforming and cleansing data

C) Storing long-term historical data for analysis

D) Loading processed data into a data warehouse

C) Storing long-term historical data for analysis – ETL software does not store historical data; it processes and loads data into the data warehouse, where the data is stored for long-term analysis.

300

How much time left before summer: 

- 113 days

- 10 weeks 

- 20 Fridays


113 days

300

Which of the following developments has improved metadata management in data warehousing?

A) The introduction of flat file storage methods

B) The creation of metadata standards like the Common Warehouse Model (CWM)

C) The elimination of metadata from ETL processes

D) The use of proprietary metadata storage by vendors

B) The creation of metadata standards like the Common Warehouse Model (CWM)

Metadata management has improved through standardized models like CWM, which ensure better interoperabilityamong different data warehousing components and vendors.

400

Which of the following best describes an Operational Data Store (ODS)?

A) A repository that maintains historical data for long-term trend analysis.

B) A system that provides near real-time access to integrated current data.

C) A data mart designed specifically for financial analysts.

D) A system that extracts, transforms, and loads data directly from source systems.

B) A system that provides near real-time access to integrated current data. – An ODS consolidates data from multiple sources and provides an up-to-date, integrated view.

400

Which statement best describes the data mart strategy for data warehousing?

A) It focuses on building a central data warehouse first, then creating data marts from it.

B) It starts with small, subject-specific data marts and later integrates them into an enterprise data warehouse.

C) It eliminates the need for an enterprise data warehouse.

D) It requires all data marts to use normalized data structures.

B) It starts with small, subject-specific data marts and later integrates them into an enterprise data warehouse.

The data mart strategy is a bottom-up approach where organizations begin with smaller, focused data marts and later expand them into a full data warehouse.

400

Which of the following best describes the function of data staging in the ETL process?

A) It stores transformed data for long-term historical analysis.

B) It provides temporary storage where extracted data is cleaned, transformed, and processed before loading into the warehouse.

C) It allows end-users to access and analyze raw data before it enters the warehouse.

D) It ensures that the extracted data is immediately available for reporting and querying without modifications.

B) It provides temporary storage where extracted data is cleaned, transformed, and processed before loading into the warehouse.

The staging area acts as a “work-in-process” zone where data is transformed and prepared before being loaded into the warehouse.

400

What is one of the major advantages of using multidimensional database technology in data warehousing?

A) It is cheaper to implement than relational databases.

B) It provides a view of data optimized for analytical queries, such as OLAP operations.

C) It ensures that all data is stored in a normalized format.

D) It eliminates the need for metadata in data analysis.

B) It provides a view of data optimized for analytical queries, such as OLAP operations.

Multidimensional databases store data in cubes and allow faster responses to analytical queries like slicing, dicing, drill-downs, and roll-ups.

500

A retail company wants to analyze real-time customer purchasing behavior during a flash sale event. Would they benefit more from using an operational data store (ODS) or an oper mart? Justify your answer.

They would benefit more from using an oper mart. An ODS provides integrated, near real-time data, but it is not optimized for multidimensional analysis. An oper mart, derived from an ODS, allows for multidimensional analysis, such as tracking sales trends during the event. It provides a structured way to analyze current operational data quickly, making it the better choice for real-time decision-making.

500

How many texts did we had to read for today's session?

3

500

What are the 3 solutions to cleansing dirty data ?


    


  1. Rely on the basic cleansing capabilities of ETL software.

  2. Custom-write data cleansing routines.

  3. Use special-purpose data cleansing software.

500

True or false: The advantage of a multidimensional database is that it provides a view of the data that is tailored to the way that users want to analyze the data.


True

M
e
n
u