A powerful analytics engine designed for large-scale data processing. It supports multiple programming languages, including Python, Scala, SQL, Java, and R.
What is Apache Spark?
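A minimal PySpark sketch of the kind of parallel processing Spark handles; it assumes a working Spark installation with the pyspark package, and the input path and column name are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Start (or reuse) a Spark session.
    spark = SparkSession.builder.appName("spark-intro").getOrCreate()

    # Hypothetical input path; replace with your own data.
    events = spark.read.csv("/data/events.csv", header=True, inferSchema=True)

    # A simple aggregation that Spark executes in parallel across partitions.
    events.groupBy("event_type").agg(F.count("*").alias("event_count")).show()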
This service allows you to run large-scale parallel data queries suitable for big data and enterprise BI needs.
What is a Synapse Dedicated SQL Pool?
These tables are built on top of the open-source Delta Lake format, which extends Parquet data files with a transaction log to provide ACID transactions.
What are Delta Tables?
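A small sketch, assuming PySpark with the delta-spark package configured; the path and column names below are made up.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("delta-demo").getOrCreate()

    # Sample rows; in practice these would come from source files or queries.
    products = spark.createDataFrame(
        [(1, "bike"), (2, "helmet")],
        ["product_id", "product_name"],
    )

    # Write in Delta format; the _delta_log transaction log makes the write ACID.
    products.write.format("delta").mode("overwrite").save("/tmp/delta/products")

    # Read it back as a Delta table.
    spark.read.format("delta").load("/tmp/delta/products").show()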
It is typically the largest table in a data warehouse, storing the measurable events or transactions that reference the dimension tables.
What is a fact table?
A collection of compute resources that you can use to run Apache Spark jobs.
What is an Apache Spark Pool?
This technology performs coordinated computations in parallel across many compute nodes, which significantly speeds up data processing tasks.
What is MPP (Massively Parallel Processing)?
These are the key properties (Atomicity, Consistency, Isolation, Durability) that ensure reliable and consistent data transactions in Delta Lake tables.
What is ACID?
This type of relationship represents parent-child relations and is typically used to relate dimension tables to fact tables.
What is a one-to-many relationship?
An interactive tool that allows you to write and execute code for data processing and analysis using Apache Spark.
What is an Apache Spark Notebook?
This operation is fully parallelized, making it the simplest and fastest way to create a table and insert data into it with a single command.
What is CTAS (Create Table As Select)?
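A sketch of CTAS in Spark SQL, run from PySpark; the table and column names are illustrative. In a Synapse dedicated SQL pool the same idea is expressed in T-SQL, where CTAS also lets you choose the table's distribution.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ctas-demo").getOrCreate()

    # Illustrative staging data exposed as a temporary view.
    spark.createDataFrame(
        [(1, "2024-01-01", 9.99), (2, "2024-01-01", 4.50)],
        ["sale_id", "sale_date", "amount"],
    ).createOrReplaceTempView("staging_sales")

    # CTAS: create and populate a new table from a query in a single command.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_summary AS
        SELECT sale_date, SUM(amount) AS total_amount
        FROM staging_sales
        GROUP BY sale_date
    """)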
It is a feature of Delta Lake that reduces the number of files as they are written: instead of writing many small files, it writes fewer, larger files.
What is OptimizeWrite?
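A sketch of turning the feature on for a session from PySpark. The exact configuration key varies by platform; the one below is the key documented for Synapse Spark pools, so treat it as an assumption to verify on your runtime.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("optimize-write-demo").getOrCreate()

    # Enable Optimize Write for this session (key name is platform-specific;
    # this is the Synapse Spark setting).
    spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")

    # Subsequent Delta writes will bin-pack rows into fewer, larger files.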
In this schema design, dimension tables are normalized, meaning they are broken down into multiple related tables instead of having all the attributes in a single table.
What is a snowflake schema?
A two-dimensional, size-mutable tabular data structure organized into named columns, similar to a database table.
What is a DataFrame?
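A minimal PySpark example of working with a DataFrame; the rows and column names are made up.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dataframe-demo").getOrCreate()

    # Build a DataFrame from local rows; the columns are named, like table columns.
    people = spark.createDataFrame(
        [("Ana", 34), ("Ben", 28)],
        ["name", "age"],
    )

    # Column-oriented operations: filter rows, project columns.
    people.filter(people.age > 30).select("name").show()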
It is a data management concept that refers to a dimension whose data is generally stable but may change over time, often in an unpredictable manner.
What is a slowly changing dimension (SCD)?
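One common way to apply incoming changes to a dimension stored as a Delta table is a MERGE; the sketch below is a simple Type 1 (overwrite) pattern, assuming the delta-spark package, with an illustrative path and columns.

    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("scd-demo").getOrCreate()

    # Existing dimension table stored in Delta format (illustrative path).
    dim_customer = DeltaTable.forPath(spark, "/tmp/delta/dim_customer")

    # Incoming changes, e.g. a customer who moved to a new city.
    updates = spark.createDataFrame(
        [(42, "Alice", "Lisbon")],
        ["customer_id", "name", "city"],
    )

    # Type 1 handling: overwrite changed attributes, insert brand-new customers.
    (dim_customer.alias("t")
        .merge(updates.alias("s"), "t.customer_id = s.customer_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())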
This feature allows you to access past data by specifying a timestamp or a version number.
What is Time Travel in Delta Lake tables?
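A sketch of reading earlier data with PySpark, reusing the hypothetical Delta path from above; the version number and timestamp are examples.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("time-travel-demo").getOrCreate()

    # Read the table as it looked at a specific version...
    v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/delta/products")

    # ...or as of a specific timestamp.
    old = (spark.read.format("delta")
           .option("timestampAsOf", "2024-01-01")
           .load("/tmp/delta/products"))

    v0.show()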
This principle ensures the accuracy, consistency, and reliability of data throughout its lifecycle.
What is data integrity?
A cluster management technology used in Apache Hadoop. It is responsible for resource management and job scheduling, allowing multiple data processing engines to run on the same cluster.
What is YARN (Yet Another Resource Negotiator)?
This distribution strategy uses a hash function to assign each row to a specific distribution based on the value in a designated column.
What is Hash distribution?
This command enables you to remove old Parquet data files that are no longer referenced by a Delta table. After executing this command, you can't time travel back earlier than the retention period.
What is the VACUUM command?
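A sketch of running VACUUM from PySpark with the delta-spark package; the path is the hypothetical one used above, and 168 hours (7 days) matches the commonly cited default retention threshold.

    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("vacuum-demo").getOrCreate()

    # Illustrative table path from the earlier examples.
    products = DeltaTable.forPath(spark, "/tmp/delta/products")

    # Remove unreferenced data files older than 168 hours (7 days).
    # Time travel beyond this retention window is no longer possible afterwards.
    products.vacuum(168)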
This technique is used in database design to organize data to reduce redundancy and improve data integrity.
What is Normalization?