This SQL command is used to add a new row of data to a table.
INSERT
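As a quick illustration of the answer above, here is a minimal sketch using Python's built-in sqlite3 module. The table name `spirits`, its columns, and the sample values are hypothetical, chosen only for the demo.

```python
import sqlite3

# In-memory database; table, columns, and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spirits (name TEXT, category TEXT)")

# INSERT adds one new row; the ? placeholders pass values safely.
conn.execute(
    "INSERT INTO spirits (name, category) VALUES (?, ?)",
    ("Example Malt", "single malt"),
)
conn.commit()

rows = conn.execute("SELECT name, category FROM spirits").fetchall()
print(rows)
```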
What is a combination of a data lake and a data warehouse called?
Data Lakehouse
In SQL, this clause returns only unique rows from a table, just like Suman's favorite single malt scotch stands out in a world of blended whiskies.
DISTINCT
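A small sketch of the idea, again with sqlite3 from the Python standard library. The `orders` table and its contents are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical table that contains a duplicate value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (drink TEXT)")
conn.executemany(
    "INSERT INTO orders (drink) VALUES (?)",
    [("scotch",), ("bourbon",), ("scotch",)],
)

# Without DISTINCT every stored row comes back, duplicates included.
all_rows = conn.execute("SELECT drink FROM orders").fetchall()

# With DISTINCT each value appears once; ORDER BY makes the order predictable.
unique_rows = conn.execute(
    "SELECT DISTINCT drink FROM orders ORDER BY drink"
).fetchall()
print(len(all_rows), unique_rows)
```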
Developed by Anthropic, this AI assistant emphasizes helpfulness and safety in responses.
Claude
In a dataset, this refers to an unusual value far from other observations.
Outlier
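One common way to flag such values is the 1.5 x IQR rule. It is only one convention among several, and the sample numbers below are invented for the demo.

```python
import statistics

# Made-up sample: six similar readings and one value far from the rest.
data = [10, 12, 11, 13, 12, 11, 95]

# Quartiles via the standard library (default "exclusive" method).
q1, _median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# 1.5 * IQR fences: anything outside them is flagged as an outlier.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]
print(outliers)
```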
This is Amazon’s fully managed relational database service.
Amazon RDS
This is the open-source engine that Databricks was originally built around.
Apache Spark
In a relational database, this type of key uniquely identifies each row, like a serial number on a tequila bottle.
Primary key
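To see the uniqueness guarantee in action, here is a rough sketch with sqlite3. The `bottles` table and the serial numbers are hypothetical, and other database engines may word the error differently.

```python
import sqlite3

# In-memory database; the serial column is declared as the primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bottles (serial TEXT PRIMARY KEY, brand TEXT)")
conn.execute("INSERT INTO bottles VALUES (?, ?)", ("SN-001", "Brand A"))

# A second row with the same key value violates the constraint.
try:
    conn.execute("INSERT INTO bottles VALUES (?, ?)", ("SN-001", "Brand B"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM bottles").fetchone()[0]
print(duplicate_rejected, row_count)
```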
This is Google DeepMind’s conversational AI system.
Gemini
In machine learning, this type of learning finds patterns in unlabeled data.
Unsupervised learning
This is Google Cloud’s serverless data warehouse.
BigQuery
This is the governance layer for Databricks.
Unity Catalog
In natural language processing, this model “ferments” text into embeddings to understand meaning, like fermenting grains into whiskey.
Hint: This is also an action figure and a movie.
Transformer
This AI model by Meta (Facebook) is an open-weight LLM often used for research purposes.
LLaMA
This open-source tool helps data engineers and analysts transform, test, and document data in the warehouse using SQL-based models.
Hint: If you add an "e" somewhere, it's now a card or financial transaction.
dbt
This is Amazon’s NoSQL key-value and document database.
DynamoDB
This is the Databricks feature that automatically creates train/test splits and evaluates models. (Other platforms offer it too.)
AutoML
In Pandas, this method returns the first n rows of a DataFrame, like the frothy foam layers at the top of a draft-poured beer.
head()
This Chinese-developed AI model, known for the low training cost of its R1 version, reportedly had significant delays in launching its R2 version due to hardware issues with Huawei's Ascend chips.
DeepSeek
This AWS generative AI assistant, launched in 2023, helps developers and business users generate content and answer questions from enterprise data.
Amazon Q
Amazon Neptune is this kind of managed database.
Graph database
This open-source platform, created by Databricks, helps manage the entire machine learning lifecycle, including tracking experiments, packaging models, and deploying them.
MLflow
This “brew” of open-source tools helps you mix layers and neurons into a smooth deep learning model, widely used for AI “cocktails.”
TensorFlow
Developed by Alibaba, this AI model series, including the 2.5-Max version, is recognized for its strong performance in coding tasks and multilingual understanding.
Qwen
In geospatial data, this type of data stores information in a grid of pixels, like satellite images or digital photos.
Raster
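The grid-of-pixels idea can be sketched with a plain nested list. The cell values below are invented (think of them as elevations), and real geospatial rasters also carry georeferencing metadata that this toy omits.

```python
# A tiny 3x4 raster: each cell (pixel) stores one value. Values are made up.
raster = [
    [1, 1, 2, 3],
    [1, 2, 3, 4],
    [2, 3, 4, 5],
]

# Grid dimensions: number of rows and number of columns.
n_rows, n_cols = len(raster), len(raster[0])

# Look up a single pixel by (row, column) index.
value = raster[1][2]
print(n_rows, n_cols, value)
```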