What's the big difference between UNION & UNION ALL?
UNION ALL keeps all of the records from each of the original data sets, UNION removes any duplicate records.
As a Database Administrator, How Do You Prevent Data Loss?
Perform regular data backups
What are the components of an SSIS package?
What is ETL? Describe each step of the process.
EXTRACT - the data from its original source, whether that is another database or an application
TRANSFORM - data by cleaning it up, deduplicating it, combining it
LOAD - the data into the target database
What was the ATH(all time high) at BITCOIN?
69000$
Specify types of relationships in SQL
One-to-One
One-to-Many & Many-to-One
Many-to-Many
What is SQL Agent and what is it used for?
SQL agents enable you to perform commands at a specific time repeatedly.
What is a surrogate key?
A surrogate key is a unique key for an entity in the client's business or for an object in the database.
What is HDFS?
HDFS is an acronym for Hadoop Distributed File System. It is a distributed file system that runs on commodity hardware and can handle massive data collections
At what age did Ceausescu die?
71 years
Difference between temp table & CTE - why & when do you use them?
CTE can only be used in the current query scope and a temporary table can exist for the entire duration of the session.
CTE gives us the best overall performance in terms of duration and resources used.
What are heap tables?
Do you know how to identify the rows in a Heap?(optional)
A heap is a table without a clustered index.
When a table is stored as a heap, individual rows are identified by reference to an 8-byte row identifier (RID) consisting of the file number, data page number, and slot on the page (FileID:PageID:SlotID). The row ID is a small and efficient structure.
What's the difference between Connected & Unconnected Lookup? Which can return multiple columns?
Connected can return multiple columns.
Unconnected returns one output value can be used in an expression transformation
What is a snowflake schema?
Snowflake schema is an extension of a star schema and adds additional dimension tables that split the data up, looking like a snowflake.
(one fact + multiple normalized dimensions)
How many states does America have?
50 states
What means Dirty Read or what is it referring to? Have you an example?
A Dirty read is a situation when a transaction reads data that has not yet been committed.
What is index fragmentation? Describe the difference between rebuilding and reorganizing indexes.
SQL Server index fragmentation is a common source of database performance degradation. Fragmentation occurs when there is a lot of empty space on a data page (internal fragmentation) or when the logical order of pages in the index doesn't match the physical order of pages in the data file (external fragmentation).
An index rebuild simply drops and recreates the index which means that index rebuild will solve both the internal and external fragmentation. An index reorganize only solves external fragmentation by moving pages around.
What is a Hub? What contains a Hub?
Hubs contain a list of unique business keys with a low chance to change.
The Hub contains at least the following fields:
a surrogate key used to connect the other structures to this table.
a business key, the driver for this hub.
What is the difference between a data warehouse and an operational database?
Databases use SQL statements like Delete , Insert, and Update and focus on speed and efficiency, so analyzing data can be more challenging.
In a data warehouses, the primary focus is on calculations, aggregations, and select statements that make it ideal for data analysis.
What's the age of Queen Elizabeth II?
96
Describe the Order of Operations Deadlock and give us some tips to solve it.
This deadlock happens when exclusive locks are used or locks escalate between different processes that need respective resources.
It could be prevented by timing operations differently, tuning queries to run faster, changing the transaction isolation level, and so on.
What are statistics?
Under what circumstances do they go out of date, how do you update them?
SQL Server statistics are essential for the query optimizer to prepare an optimized and cost-effective execution plan. These statistics provide the distribution of column values to the query optimizer, and it helps SQL Server to estimate the number of rows (also known as cardinality).
Statistics are running out of date when massive changes took places.
EXEC sp_updatestats or UPDATE STATISTICS
What is a Link satellite used for? Data Vault context
It is used to check when a specific link was valid. It usually contains the link hash, source, and validity ranges.
What is a data pipeline?
A data pipeline is a system for transporting data from one location (the source) to another (the destination) (such as a data warehouse).
Data is converted and optimized along the journey, and it eventually reaches a state that can be evaluated and used to produce business insights.
The procedures involved in aggregating, organizing, and transporting data are referred to as a data pipeline. Many of the manual tasks needed in
processing and improving continuous data loads are automated by modern data pipelines.
Which is the WR of the most children born?
The greatest officially recorded number of children born to one mother is 69. Her name is unknown, but we know she was the first wife of Feodor Vassilyev