Spark
Data Warehouse
Delta Lake Tables
Data Modeling
100

This powerful analytics engine is designed for large-scale data processing and supports multiple programming languages.

What is Apache Spark?
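
For illustration, a minimal PySpark sketch (assuming the pyspark package and a Spark runtime are available) of the kind of parallel DataFrame work Spark is built for:

from pyspark.sql import SparkSession

# Start (or reuse) a Spark session
spark = SparkSession.builder.appName("spark-demo").getOrCreate()

# A small DataFrame and a parallel aggregation over it
df = spark.createDataFrame([("sales", 100), ("sales", 250), ("hr", 80)], ["dept", "amount"])
df.groupBy("dept").sum("amount").show()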

100

This service allows you to run large-scale, parallel data queries, making it suitable for big data and enterprise BI workloads.

What is a Synapse Dedicated SQL Pool?

100

These tables are built on the open-source Delta Lake format, which extends Parquet data files with a transaction log to provide ACID guarantees.

What are Delta Tables?
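
A minimal sketch, assuming a Spark session with the delta-spark package configured (for example a Fabric, Synapse, or Databricks notebook); the table path is hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-demo").getOrCreate()

df = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])

# Every write is recorded in the table's _delta_log transaction log,
# which is what gives the Parquet data files ACID guarantees
df.write.format("delta").mode("overwrite").save("/tmp/demo_delta_table")

spark.read.format("delta").load("/tmp/demo_delta_table").show()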

100

This is typically the largest table in a data warehouse, storing the measurable events of a business process.

What is a fact table?

200

A collection of compute resources that you can use to run Apache Spark jobs.

What is an Apache Spark Pool?

200

This technology performs coordinated computations in parallel across many compute nodes, which significantly speeds up data processing tasks.

What is MPP (Massively Parallel Processing)?

200

These are the key properties that ensure reliable and consistent data transactions in Delta Lake Tables.

What is ACID?

200

This type of relationship represents parent-child relations and is typically used to link dimension tables to fact tables.

What is a one-to-many relationship?

300

An interactive tool that allows you to write and execute code for data processing and analysis using Apache Spark.

What is an Apache Spark Notebook?

300

This operation is fully parallelized, making it the simplest and fastest way to create a new table and populate it with data in a single command.

What is CTAS (CREATE TABLE AS SELECT)?
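
One way to run such a statement from Python is with pyodbc; a sketch with placeholder server, credentials, and table names:

import pyodbc

# Hypothetical connection to a dedicated SQL pool (server, database, and credentials are placeholders)
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;DATABASE=mypool;UID=user;PWD=secret"
)

ctas = """
CREATE TABLE dbo.FactSalesCopy
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
AS
SELECT *
FROM dbo.FactSales;
"""

cursor = conn.cursor()
cursor.execute(ctas)   # the statement itself runs fully in parallel inside the pool
conn.commit()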

300

This feature of Delta Lake reduces the number of files as data is written: instead of writing many small files, it writes fewer, larger files.

What is OptimizeWrite?
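
A minimal sketch, assuming Synapse/Fabric Spark, where the feature is toggled with the session config shown below (the key differs on other platforms):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("optimize-write-demo").getOrCreate()

# Config key as documented for Synapse/Fabric Spark; other platforms expose the
# same feature under a different key or as a Delta table property
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")

# Writes to Delta now produce fewer, larger Parquet files instead of many small ones
spark.range(0, 1_000_000).write.format("delta").mode("overwrite").save("/tmp/demo_optimize_write")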

300

In this schema design, dimension tables are normalized, meaning they are broken down into multiple related tables instead of keeping all attributes in a single table.

What is a snowflake schema?

400

A two-dimensional, size-mutable, tabular data structure organized into named columns, similar to a relational table.

What is a DataFrame?
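
A minimal PySpark sketch of working with a DataFrame's named columns (assuming the pyspark package is available):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dataframe-demo").getOrCreate()

# Rows organized into named columns, much like a table
people = spark.createDataFrame([("Ana", 34), ("Ben", 28)], ["name", "age"])

people.printSchema()
people.filter(people.age > 30).show()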

400

It is a concept in data management referring to a dimension whose data is generally stable but may change over time, often in an unpredictable manner.

What is a slowly changing dimension?
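
One common way to maintain such a dimension in a lakehouse is a Delta MERGE; a minimal Type 1 sketch with hypothetical paths and columns (a Type 2 approach would instead insert a new row and track validity dates):

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("scd-demo").getOrCreate()

dim = DeltaTable.forPath(spark, "/tmp/dim_customer")   # hypothetical dimension table
updates = spark.createDataFrame([(2, "Ben", "Berlin")], ["id", "name", "city"])

# Type 1 handling: overwrite the changed attribute in place
(dim.alias("d")
    .merge(updates.alias("u"), "d.id = u.id")
    .whenMatchedUpdate(set={"city": "u.city"})
    .whenNotMatchedInsertAll()
    .execute())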

400

This feature allows you to access past data by specifying a timestamp or a version number.

What is Time Travel in Delta Lake tables?
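
A minimal sketch, assuming an existing Delta table at a hypothetical path:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("time-travel-demo").getOrCreate()

path = "/tmp/demo_delta_table"   # hypothetical Delta table path

# Read the table as it was at an earlier version number...
v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)

# ...or as it was at a given timestamp
snapshot = spark.read.format("delta").option("timestampAsOf", "2024-01-01").load(path)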

400

This concept ensures the accuracy, consistency, and reliability of data throughout its lifecycle.

What is data integrity?

500

A cluster management technology used in Apache Hadoop. It is responsible for resource management and job scheduling, allowing multiple data processing engines to share a cluster.

What is YARN (Yet Another Resource Negotiator)?

500

This method assigns each row to a specific location (distribution) based on a deterministic function of the values in a designated column.

What is Hash distribution?
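
For illustration, the kind of T-SQL definition this refers to, with hypothetical table and column names; the statement would be submitted to the dedicated SQL pool, e.g. with pyodbc as in the CTAS sketch above:

hash_distributed_table = """
CREATE TABLE dbo.FactSales
(
    SaleId      BIGINT NOT NULL,
    CustomerKey INT    NOT NULL,
    Amount      DECIMAL(18, 2)
)
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX);
"""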

500

This command removes old Parquet data files that are no longer referenced by the table. After executing it, you can't time travel back earlier than the retention period.

What is the VACUUM command?
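
A minimal sketch using the delta-spark Python API against a hypothetical table path:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("vacuum-demo").getOrCreate()

table = DeltaTable.forPath(spark, "/tmp/demo_delta_table")   # hypothetical path

# Remove unreferenced data files older than 168 hours (the default retention window);
# time travel to versions before that window is no longer possible afterwards
table.vacuum(168)

# Equivalent SQL form:
# spark.sql("VACUUM delta.`/tmp/demo_delta_table` RETAIN 168 HOURS")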

500

This technique is used in database design to organize data so as to reduce redundancy and improve data integrity.

What is Normalization?