What city is Dataiku's official corporate headquarters?
Paris, France
This term describes models like GPT-3, which are capable of understanding and generating human-like text based on the input they're given
What is an LLM?
The branch that is deployed and on which no development should be done.
What is the main branch?
This process involves combining data from different sources by using a common column or key, enhancing the dataset with more features for analysis.
What is Joining or Merging Data?
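As a minimal sketch of the idea, the following plain-Python left join enriches one source with columns from another through a shared key. The table contents, the "customer_id" column, and the left_join helper are hypothetical illustrations, not Dataiku's Join recipe, and the sketch assumes each key appears at most once on the right-hand side.

```python
# Hypothetical toy data: two sources that share a "customer_id" key.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
    {"customer_id": 3, "name": "Linus"},
]
orders = [
    {"customer_id": 1, "total": 40.0},
    {"customer_id": 3, "total": 15.5},
]


def left_join(left, right, key):
    """Left join two lists of dicts on a shared key column.

    Assumes each key value appears at most once in `right`.
    """
    # Index the right-hand rows by key for constant-time lookup.
    right_by_key = {row[key]: row for row in right}
    # Collect every non-key column that appears on the right side.
    right_columns = {col for row in right for col in row if col != key}

    joined = []
    for row in left:
        match = right_by_key.get(row[key], {})
        merged = dict(row)
        for col in right_columns:
            # Unmatched left rows are kept, with None in the new columns.
            merged[col] = match.get(col)
        joined.append(merged)
    return joined


enriched = left_join(customers, orders, "customer_id")
for row in enriched:
    print(row)
```

Every customer row survives the join; the customer without an order simply gets None in the added "total" column.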
Where does a user go when wanting to perform VisualML on a Dataset?
The Lab
This 5-letter name represented the original CHS data warehouse
What is CHSDW?
This process involves feeding a pretrained LLM examples of text inputs and desired outputs in an additional training phase, teaching it to generate specific types of responses.
What is Fine-Tuning?
This file always needs to be updated with your latest workingProjectKey before deploying a new release
What is the "azure-pipeline.yml" file?
The preferred computation engine for most data prep steps within Dataiku
What is Snowflake?
In Dataiku's visual machine learning interface, this process automatically transforms raw data into a format that is more suitable and effective for machine learning models.
What is feature engineering?
This 3-letter acronym represents the practice of analyzing and interpreting human language
What is Natural Language Processing (NLP)?
This is the first step/recipe in creating a Retrieval-Augmented Generation (RAG) model.
What is text extraction?
This pipeline duplicates your most recent feature project and creates a new project based on the main branch for deployment.
What is the "Project" Azure DevOps pipeline?
The metric for determining if two strings match when fuzzy matching
What is "sentence similarity"?
Damerau–Levenshtein
Hamming
Jaccard
Cosine
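To make three of the listed metrics concrete, here is a small plain-Python sketch. The function names are illustrative, the Jaccard version assumes character bigrams as the compared units, and the Damerau–Levenshtein version is the restricted "optimal string alignment" variant; none of this is claimed to match Dataiku's own implementation.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def jaccard(a, b, n=2):
    """Jaccard similarity between the sets of character n-grams."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a and not grams_b:
        return 1.0  # two strings too short to have n-grams count as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions, and swaps of two
    adjacent characters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "teh" -> "the".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(hamming("karolin", "kathrin"))
print(osa_distance("teh", "the"))
print(jaccard("night", "nacht"))
```

A fuzzy match is then a threshold decision: two strings "match" when the distance is small enough or the similarity is high enough for the use case.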
Dataiku integrates this type of algorithm to visually assist users in understanding which variables most significantly impact their predictive model's outcomes.
What are Feature Importance Charts or Shapley Values?
The original proposer of AI
Who is Alan Turing?
This is the process of searching a store of embedded vectors for the ones closest to the embedding of the input text.
What is similarity search?
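A minimal sketch of the lookup, assuming cosine similarity as the closeness measure and a brute-force scan over a tiny in-memory index. The document IDs and 3-dimensional vectors are made up for illustration; real embeddings come from an embedding model and have hundreds of dimensions or more.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def top_k(query_vector, index, k=2):
    """Return the IDs of the k stored vectors most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in index.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical embeddings for three documents.
index = {
    "doc_billing": [0.9, 0.1, 0.0],
    "doc_scheduling": [0.1, 0.9, 0.1],
    "doc_pharmacy": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for the embedded input text

results = top_k(query, index, k=2)
print(results)
```

Vector stores typically replace the brute-force scan with an approximate nearest-neighbor index, but the ranking idea is the same.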
This pipeline picks up your bundle and publishes it to higher environments
What is the Octopus pipeline?
This Dataiku functionality allows users to reduce the dimensionality of their dataset, improving model training times and possibly enhancing model performance.
What is PCA (Principal Component Analysis)?
This type of Hyperparameter search strategy makes "smart guesses" based on the result of the previous run.
What is Bayesian search?
The original creator of the Linux kernel and Git
Linus Torvalds
This advanced technique in LLM training involves periodically saving the state of the model during training so that the model can be reverted to a previous state if the training starts to produce worse results.
What is Checkpointing?
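The save-and-revert loop can be sketched with a toy "model" that is just a dictionary holding one weight. Everything here (train_with_checkpoints, the update rule, the loss) is a hypothetical stand-in; real LLM training frameworks write checkpoints to disk and usually leave the decision to roll back to the operator.

```python
import copy


def train_with_checkpoints(state, update, loss_fn, num_steps, checkpoint_every):
    """Run a toy training loop that reverts to the best saved state.

    Every `checkpoint_every` steps the loss is measured. An improvement
    is saved as the new checkpoint; otherwise training resumes from the
    last saved checkpoint.
    """
    best_loss = loss_fn(state)
    best_state = copy.deepcopy(state)
    for step in range(1, num_steps + 1):
        state = update(state, step)
        if step % checkpoint_every == 0:
            current_loss = loss_fn(state)
            if current_loss < best_loss:
                # Results improved: save a new checkpoint.
                best_loss = current_loss
                best_state = copy.deepcopy(state)
            else:
                # Results got worse: revert to the saved checkpoint.
                state = copy.deepcopy(best_state)
    return best_state, best_loss


def loss(state):
    """Toy loss with its minimum at w = 3."""
    return (state["w"] - 3.0) ** 2


def overshooting_update(state, step):
    """Toy update that keeps increasing w, so it eventually overshoots."""
    return {"w": state["w"] + 1.0}


best_state, best_loss = train_with_checkpoints(
    {"w": 0.0}, overshooting_update, loss, num_steps=6, checkpoint_every=1
)
print(best_state, best_loss)
```

Because the update overshoots the minimum after the third step, every later step is rolled back and the checkpoint at w = 3 is what survives.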
This spot within the version number denotes a minor number and refers to a new feature.
What is the second position in a version number (1.X.0.0)?
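A minimal sketch of a minor bump on a four-part version string. The bump_minor helper is hypothetical, and resetting the positions to the right of the minor number to zero is an assumed convention, not something stated above.

```python
def bump_minor(version):
    """Increment the second position of a four-part version string.

    Assumption: the positions after the minor number reset to zero.
    """
    parts = [int(part) for part in version.split(".")]
    if len(parts) != 4:
        raise ValueError("expected a four-part version such as 1.4.0.0")
    major, minor = parts[0], parts[1]
    return f"{major}.{minor + 1}.0.0"


print(bump_minor("1.4.2.7"))
```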
This advanced data prep step in Dataiku uses a specific function to transform skewed data into a more normal distribution, which can significantly improve the performance of machine learning models.
What is Data Normalization or Log Transformation?
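To see the effect on skew, the sketch below compares the skewness of a right-skewed sample before and after a log transform. The sample values are made up, log1p (the log of 1 + x) is one common choice that tolerates zeros, and the moment-based skewness formula is a plain-Python stand-in for what a statistics library would provide.

```python
import math


def skewness(values):
    """Moment-based sample skewness: 0 for symmetric data, > 0 for a long right tail."""
    n = len(values)
    mean = sum(values) / n
    second_moment = sum((v - mean) ** 2 for v in values) / n
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / second_moment ** 1.5


# Hypothetical right-skewed feature: mostly small values and a few large ones.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]

# log1p computes log(1 + x), which stays defined when x is 0.
logged = [math.log1p(v) for v in raw]

print("skewness before:", skewness(raw))
print("skewness after: ", skewness(logged))
```

The transformed values are still right-skewed, but noticeably less so, because the log compresses the large values far more than the small ones.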
This advanced technique in Dataiku's visual machine learning toolkit helps users address class imbalance problems in their datasets, improving model performance on minority classes.
What is SMOTE (Synthetic Minority Over-sampling Technique)?