To Join or Not to Join
She's a Databricks House
Potent Potables (Data Edition)
Whose AI Model is it Anyway?
Data Potpourri
100

This SQL command is used to add a new row of data to a table.

INSERT

100

What is a combination of a data lake and a data warehouse called?

Data Lakehouse

100

In SQL, this clause returns only unique rows from a table, just like Suman's favorite single malt scotch stands out in a world of blended whiskies.

DISTINCT

100

Developed by Anthropic, this AI assistant emphasizes helpfulness and safety in responses.

Claude

100

In a dataset, this refers to an unusual value far from other observations.

Outlier

200

This is Amazon’s fully managed relational database service.

Amazon RDS

200

This is the open-source engine that Databricks was originally built around.

Apache Spark

200

In a relational database, this type of key uniquely identifies each row, like a serial number on a tequila bottle.

Primary key

200

This is Google DeepMind’s conversational AI system.

Gemini

200

In machine learning, this type of model finds patterns in unlabeled data.

Unsupervised learning

300

This is Google Cloud’s serverless data warehouse. 

BigQuery

300

This is the governance layer for Databricks.

Unity Catalog

300

In natural language processing, this model “ferments” text into embeddings to understand meaning, like fermenting grains into whiskey.

Hint: It is also an action figure and a movie.

Transformer

300

This AI model by Meta (Facebook) is an open-weight LLM often used for research purposes.

LLaMA

300

This open-source tool helps data engineers and analysts transform, test, and document data in the warehouse using SQL-based models.

Hint: If you add an "e" somewhere, it's now a card or financial transaction.

dbt

400

This is Amazon’s NoSQL key-value and document database.

DynamoDB

400

This is the Databricks feature that automatically creates train/test splits and evaluates models. (Other platforms use the term too.)

AutoML

400

In Pandas, this method returns the first n rows of a DataFrame, like the frothy foam layers at the top of a draft-poured beer.

head()

400

This Chinese-developed AI model, known for the low training cost of its R1 release, had significant delays in launching its R2 version due to hardware issues with Huawei's Ascend chips.

DeepSeek

400

This AWS generative AI assistant, launched in 2023, helps developers and business users generate content and answer questions from enterprise data.

Amazon Q

500

Amazon Neptune is this kind of serverless database.

Graph database

500

This open-source platform, created by Databricks, helps manage the entire machine learning lifecycle, including tracking experiments, packaging models, and deploying them.

MLflow

500

This “brew” of open-source tools helps you mix layers and neurons into a smooth deep learning model, widely used for AI “cocktails.”

TensorFlow

500

Developed by Alibaba, this AI model series, including the 2.5-Max version, is recognized for its strong performance in coding tasks and multilingual understanding.

Qwen

500

In geospatial data, this type of data stores information in a grid of pixels, like satellite images or digital photos.

Raster
