Miscellaneous
LLMs
CI/CD
Data Prep
Visual ML
100

What city is Dataiku's official corporate headquarters?

Paris, France

100

This term describes models like GPT-3, which are capable of understanding and generating human-like text based on the input they're given

What is an LLM?

100

The branch that is deployed and on which no development should be done.

What is the main branch?

100

This process involves combining data from different sources by using a common column or key, enhancing the dataset with more features for analysis.

What is Joining or Merging Data?
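
As a rough illustration of the clue above, here is a minimal sketch of a left join in plain Python. The table contents, the `customer_id` key, and the `left_join` helper are all hypothetical, and the sketch assumes the key is unique in the right-hand table.

```python
# Toy data: both tables share the key column "customer_id" (hypothetical names).
orders = [
    {"customer_id": 1, "amount": 40},
    {"customer_id": 2, "amount": 15},
    {"customer_id": 3, "amount": 70},
]
customers = [
    {"customer_id": 1, "region": "East"},
    {"customer_id": 2, "region": "West"},
]


def left_join(left, right, key):
    """Left join two lists of dicts on `key`.

    Assumes `key` is unique in `right`. Left rows without a match are
    kept unchanged, so they simply lack the extra columns.
    """
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup.get(row[key], {})} for row in left]


joined = left_join(orders, customers, "customer_id")
for row in joined:
    print(row)
```

Customer 3 has no match in `customers`, so that row comes back without a `region` column, which is the usual left-join behavior.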

100

Where does a user go when wanting to perform Visual ML on a dataset?

What is the Lab?

200

This 5-letter name represented the original CHS data warehouse

What is CHSDW?

200

This process involves feeding a pretrained LLM examples of text inputs and desired outputs during its training phase, teaching it to generate specific types of responses.

What is Fine-Tuning?

200

This file always needs to be updated with your latest workingProjectKey before deploying a new release

What is the "azure-pipeline.yml" file?

200

The preferred computation engine for most data prep steps within Dataiku

What is Snowflake?

200

In Dataiku's visual machine learning interface, this process automatically transforms raw data into a format that is more suitable and effective for machine learning models.

What is feature engineering?

300

This 3-letter acronym represents the practice of analyzing and interpreting human language

What is Natural Language Processing (NLP)?

300

This is the first step/recipe in creating a Retrieval Augmented Generation (RAG) model.

What is text extraction?

300

This pipeline duplicates your most recent feature project and creates a new project based on the main branch for deployment.

What is the "Project" Azure DevOps pipeline?

300

The type of metric used to determine whether two strings match in fuzzy matching

What is "sentence similarity"? 

  • Damerau–Levenshtein

  • Hamming

  • Jaccard 

  • Cosine 
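
To make the listed distances more concrete, here is a minimal sketch of three of them in plain Python. It implements plain Levenshtein distance (without the transposition step that the Damerau variant adds), and the Jaccard version works on whitespace-separated tokens, which is one of several possible choices. The function names are illustrative and are not Dataiku's API.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn `a` into `b` (no transpositions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def jaccard(a, b):
    """Jaccard distance between the sets of whitespace-separated tokens."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 0.0
    return 1 - len(sa & sb) / len(sa | sb)


print(levenshtein("kitten", "sitting"))
print(hamming("karolin", "kathrin"))
print(jaccard("new york city", "york city"))
```

In a fuzzy match, two strings would typically count as matching when the chosen distance falls below a threshold picked for the data at hand.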

300

Dataiku integrates this type of algorithm to visually assist users in understanding which variables most significantly impact their predictive model's outcomes.

What are Feature Importance Charts or Shapley Values?

400

The original proposer of AI

Who is Alan Turing?

400

This is the process of searching embedded vectors for those closest to an input text.

What is similarity search?
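
A minimal sketch of the idea, using cosine similarity over a tiny in-memory index. The 3-dimensional vectors and the texts are made up for illustration; in practice the vectors would come from an embedding model and live in a vector store, and the query text would be embedded the same way before searching.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings: text -> vector.
index = {
    "reset password": [0.9, 0.1, 0.0],
    "billing question": [0.1, 0.9, 0.1],
    "shipping delay": [0.0, 0.2, 0.9],
}


def search(query_vector, index, top_k=1):
    """Return the `top_k` texts whose vectors are most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Stand-in for the embedded input text.
query_vector = [0.8, 0.2, 0.1]
print(search(query_vector, index))
```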

400

This pipeline picks up your bundle and publishes it to higher environments

What is the Octopus pipeline?

400

This Dataiku functionality allows users to reduce the dimensionality of their dataset, improving model training times and possibly enhancing model performance.

What is PCA (Principal Component Analysis)?

400

This type of hyperparameter search strategy makes "smart guesses" based on the results of previous runs.

What is Bayesian search?

500

The original creator of the Linux kernel and Git

Who is Linus Torvalds?

500

This advanced technique in LLM training involves periodically saving the state of the model during training so that the model can be reverted to a previous state if the training starts to produce worse results.

What is Checkpointing?

500

This position within the version number denotes the minor version and is incremented for a new feature.

What is the second position in a version number? 1.X.0.0 
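
A small sketch of what bumping that position might look like for a dotted version string. The `bump_minor` helper is hypothetical, and resetting the lower positions to zero is a common convention assumed here, not something the clue states.

```python
def bump_minor(version):
    """Increment the second position of a dotted version string.

    Assumption: positions after the minor number are reset to zero.
    """
    parts = [int(p) for p in version.split(".")]
    if len(parts) < 2:
        raise ValueError("version needs at least a major and a minor position")
    parts[1] += 1
    parts[2:] = [0] * (len(parts) - 2)
    return ".".join(str(p) for p in parts)


print(bump_minor("1.4.2.0"))
```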

500

This advanced data prep step in Dataiku uses a specific function to transform skewed data into a more normal distribution, which can significantly improve the performance of machine learning models.

What is Data Normalization or Log Transformation?
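
For the log-transformation half of that answer, here is a minimal sketch in plain Python. The sample values are made up, `math.log1p` computes log(1 + x) and so needs inputs greater than -1, and the `skewness` helper is an illustrative population-skewness calculation, not a Dataiku function.

```python
import math


def skewness(xs):
    """Population skewness: the mean cubed deviation over the cubed std dev."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / n
    third_moment = sum((x - mean) ** 3 for x in xs) / n
    return third_moment / variance ** 1.5


# Hypothetical right-skewed data: a few large values dominate.
values = [1, 2, 4, 8, 16, 32, 64, 128, 256]

# log1p handles zeros safely because log(1 + 0) = 0.
log_values = [math.log1p(v) for v in values]

print("skewness before:", skewness(values))
print("skewness after: ", skewness(log_values))
```

On data like this, the transformed values are spread much more evenly, so the skewness moves toward zero.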

500

This advanced technique in Dataiku's visual machine learning toolkit helps users address class imbalance problems in their datasets, improving model performance on minority classes.

What is SMOTE (Synthetic Minority Over-sampling Technique)?