This hyperparameter controls how deep the tree can grow, which caps the number of successive splits between the root and any leaf.
Smaller values create simpler trees that focus on broad patterns in the data, while larger values allow the tree to capture more complex patterns and interactions.
What is max_depth (maximum depth of tree)
This hypothesis test is used to compare the means of two independent groups.
What is a t-test.
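As a rough illustration of the statistic behind the test, here is a minimal sketch that computes Welch's t statistic (the variant that does not assume equal variances) for two independent samples. The function name and the sample values are made up for the example, and a real analysis would also need the degrees of freedom and a p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(group_a, group_b):
    """Return Welch's t statistic for two independent samples."""
    n_a, n_b = len(group_a), len(group_b)
    # Standard error of the difference in means, using sample variances.
    standard_error = sqrt(variance(group_a) / n_a + variance(group_b) / n_b)
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical measurements for two independent groups.
control = [10.0, 12.0, 11.0, 13.0, 14.0]
treatment = [12.0, 14.0, 13.0, 15.0, 16.0]
t_stat = welch_t_statistic(control, treatment)
print(f"t = {t_stat:.2f}")  # means differ by 2, standard error is 1, so t = -2.00
```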
Can you give me a specific example showcasing that personalization works?
Any successful campaign results!
This metric is used to assess the tradeoff between precision and recall, providing a single score that balances both. It is the harmonic mean of the two.
What is the F1 Score
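To make the harmonic mean concrete, here is a small sketch that computes the score from raw counts. The function name and the counts are invented for illustration.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, computed from raw counts."""
    if true_positives == 0:
        return 0.0  # no correct positive predictions, so the score is zero
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision = 0.8, recall = 0.5.
score = f1_score(true_positives=8, false_positives=2, false_negatives=8)
print(f"F1 = {score:.3f}")  # 0.615, lower than the arithmetic mean of 0.65
```

The harmonic mean is pulled toward the smaller of the two values, so a model cannot score well by maximizing only precision or only recall.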
SELECT COUNT(*) AS num_members,
       home_bedroom_count
FROM model_run_20250306.std_members_enhanced
GROUP BY 2;
This is the most commonly missed check for developers.
This hyperparameter controls the minimum amount of data / weight that each child node must contain for a split to be made in a tree.
If the value is too low, the tree can make splits even on small subsets of data. If it is higher, the tree only makes splits where there is a significant amount of data supporting them.
What is min_child_weight
This is the term for an error that occurs when a true null hypothesis is rejected.
The probability of making this error is denoted by alpha (significance level).
What is a Type I error (false positive).
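One way to see that alpha is the false positive rate is to simulate many experiments in which the null hypothesis is true and count how often it gets rejected. The sketch below assumes a two-sided z-test with a known standard deviation of 1; the function name and defaults are made up for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def false_positive_rate(alpha=0.05, sample_size=30, trials=2000, seed=0):
    """Fraction of simulated experiments that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cutoff
    rejections = 0
    for _ in range(trials):
        # The null hypothesis (mean = 0) really is true for every sample drawn here.
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        z = mean(sample) * sqrt(sample_size)  # known sigma = 1
        if abs(z) > critical_z:
            rejections += 1  # a Type I error
    return rejections / trials


rate = false_positive_rate()
print(f"Observed false positive rate: {rate:.3f}")  # should land near alpha
```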
Your client asks if you can build them a model for data you just got and haven't gotten a chance to analyze. What is the correct response?
A. Of course! We'll build you an amazing model right away that has perfect accuracy.
B. Let us take a look at a baseline model to see what insights we can gather and assess if it's feasible.
C. Let us take a look at the data and see what trends and relationships we can see between data points that will be insightful in predicting XYZ. Once we establish some insights, we can build a baseline model and assess if it has predictive power.
What is C
C. Let us take a look at the data and see what trends and relationships we can see between data points that will be insightful in predicting XYZ. Once we establish some insights, we can build a baseline model and assess if it has predictive power.
This supervised algorithm predicts a data point's class or value based on the classes or values of the data points closest to it. It is a simple yet effective method for low-dimensional datasets.
What is k-nearest neighbors (KNN)
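A minimal classification sketch of the idea, using Euclidean distance and a majority vote. The function name, the points, and the labels are invented for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(training_points, training_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Rank training points by Euclidean distance to the query and keep the closest k.
    nearest = sorted(
        range(len(training_points)),
        key=lambda i: dist(training_points[i], query),
    )[:k]
    votes = Counter(training_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional data with two well-separated classes.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels = ["low", "low", "low", "high", "high", "high"]
prediction = knn_predict(points, labels, (2.0, 2.0))
print(prediction)  # low
```

For regression, the vote would be replaced by an average of the neighbors' values.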
for order in orders:
    if order.orderStatus == "Paid":
        order.orderStatus = "Shipped"
        print(f"Order {order.order_id} is now Shipped.")
    else:
        print(f"Order {order.order_id} is now Completed.")
What is variable naming.
order_id vs orderStatus (one is snake case while the other is camel case)
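A sketch of the same loop with a single naming convention (snake case, which is the usual Python style). The Order class and the ship_paid_orders function are hypothetical, since the original snippet does not show how orders are defined.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical container; both fields use snake case.
    order_id: int
    order_status: str


def ship_paid_orders(orders):
    """Mark paid orders as shipped and return one status message per order."""
    messages = []
    for order in orders:
        if order.order_status == "Paid":
            order.order_status = "Shipped"
            messages.append(f"Order {order.order_id} is now Shipped.")
        else:
            messages.append(f"Order {order.order_id} is now Completed.")
    return messages


orders = [Order(1, "Paid"), Order(2, "Delivered")]
for message in ship_paid_orders(orders):
    print(message)
```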
This hyperparameter controls the fraction of training data used to grow each tree.
It helps to increase the robustness of the model.
This statistical test is used to compare the means of more than two independent groups.
What is ANOVA (analysis of variance)
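A minimal sketch of the one-way F statistic, which compares the variation between group means to the variation within groups. The function name and the data are made up, and turning F into a p-value would additionally require the F distribution.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more independent groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]
    num_groups, num_values = len(groups), len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (group_mean - grand_mean) ** 2
        for group, group_mean in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (value - group_mean) ** 2
        for group, group_mean in zip(groups, group_means)
        for value in group
    )
    mean_square_between = ss_between / (num_groups - 1)
    mean_square_within = ss_within / (num_values - num_groups)
    return mean_square_between / mean_square_within


# Hypothetical data: three groups with means 2, 3, and 4.
f_stat = one_way_anova_f([1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0])
print(f"F = {f_stat:.2f}")  # 3.00
```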
Your client has received your Faraday client export but is sus on the accuracy of the fields. They ask how you validated Faraday information.
Check means, medians, and outliers
Compare against a similar client population
Compare against census data
This supervised learning algorithm is widely used for classification and regression tasks by finding the best decision boundary that maximizes the margin between classes.
What is Support Vector Machine (SVM)
public double CalculateTax(double income)
{
    if (income <= 50000)
        return income * 0.10; // 10% tax for income <= 50,000
    if (income <= 100000)
        return income * 0.15; // 15% tax for income between 50,001 and 100,000
    if (income <= 200000)
        return income * 0.20; // 20% tax for income between 100,001 and 200,000
    return income * 0.30; // 30% tax for income > 200,000
}
What is magic number code.
Magic numbers make future updates cumbersome and increase the risk of errors. Remove them from the logic and centralize these values in a configuration file, database, or any other central place, so they can be updated without modifying the code.
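One possible shape for the fix, sketched in Python rather than C#: the thresholds and rates from the example above move into a single named table. The names TAX_BRACKETS and TOP_RATE are hypothetical, and in practice the table could be loaded from a configuration file or database instead of being defined in the module.

```python
# Hypothetical bracket table: (upper income limit, flat rate applied to the whole income).
# The values mirror the example above; only this table changes when the rules change.
TAX_BRACKETS = [
    (50_000, 0.10),
    (100_000, 0.15),
    (200_000, 0.20),
]
TOP_RATE = 0.30  # rate for income above the last limit


def calculate_tax(income, brackets=TAX_BRACKETS, top_rate=TOP_RATE):
    """Apply the flat rate of the first bracket whose limit covers the income."""
    for upper_limit, rate in brackets:
        if income <= upper_limit:
            return income * rate
    return income * top_rate


print(calculate_tax(80_000))  # uses the 15% bracket
```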
This hyperparameter performs a type of regularization that adds a penalty proportional to the square of the leaf weights.
This discourages overly large weights, smoothing the values, which helps to reduce overfitting and makes the model more balanced.
What is lambda (L2 Regularization)
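A small sketch of the shrinking effect, assuming the XGBoost-style formulation in which the optimal leaf weight is the negative gradient sum divided by the hessian sum plus lambda. The function name and the numbers are illustrative only.

```python
def leaf_weight(gradient_sum, hessian_sum, reg_lambda):
    """Optimal leaf weight under an L2 penalty (assumed XGBoost-style formula)."""
    return -gradient_sum / (hessian_sum + reg_lambda)


# Hypothetical gradient statistics for one leaf.
for reg_lambda in (0.0, 5.0, 15.0):
    weight = leaf_weight(gradient_sum=-10.0, hessian_sum=5.0, reg_lambda=reg_lambda)
    print(f"lambda={reg_lambda}: weight={weight}")  # 2.0, then 1.0, then 0.5
```

Larger values of lambda pull the weight toward zero without ever making it exactly zero.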
This statistical test is used to check whether the variances of multiple groups are equal. Because equal variances are an assumption of ANOVA, it is commonly run as a prerequisite for an ANOVA test.
What is Bartlett's test
FREEBIE YOU JUST GOT SOME POINTS
YAY
This type of neural network layer is used to automatically learn spatial hierarchies in image data by sliding a kernel (a small matrix of weights) over the input data.
What is a convolutional layer
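The core operation can be sketched in a few lines. This version assumes stride 1 and no padding, and, like most deep learning frameworks, it does not flip the kernel (strictly speaking a cross-correlation). The function name, image, and kernel are invented for the example.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kernel_height, kernel_width = len(kernel), len(kernel[0])
    output_height = len(image) - kernel_height + 1
    output_width = len(image[0]) - kernel_width + 1
    feature_map = []
    for row in range(output_height):
        output_row = []
        for col in range(output_width):
            # Multiply the kernel with the patch under it and sum the products.
            total = sum(
                image[row + i][col + j] * kernel[i][j]
                for i in range(kernel_height)
                for j in range(kernel_width)
            )
            output_row.append(total)
        feature_map.append(output_row)
    return feature_map


# Hypothetical 3x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge_kernel = [[-1, 1], [-1, 1]]
feature_map = convolve2d(image, vertical_edge_kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0]]
```

The kernel responds only where the pixel values change from left to right. In a trained layer the kernel weights are learned rather than written by hand.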
public class DiscountCalculator
{
    public double CalculateDiscount(double amount, double discountPercentage)
    {
        double discount = amount * discountPercentage;
        double discountedPrice = amount - discount;
        return discountedPrice;
    }

    public double ApplyDiscount(double amount, double discountPercentage)
    {
        double discount = amount * discountPercentage;
        double discountedPrice = amount - discount;
        return discountedPrice;
    }
}
This does not follow DRY (Don't Repeat Yourself).
The same logic is being used in multiple places. It means if there is a change in logic we have to manually change it everywhere.
Move the logic into its own function, which can then be used everywhere without redundant code.
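A minimal sketch of the refactor, written in Python for brevity: the calculation lives in one function and the second name simply reuses it. Keeping calculate_discount as an alias is a hypothetical choice for callers that still use the old name.

```python
def apply_discount(amount, discount_percentage):
    """Return the price after removing `discount_percentage` (a fraction such as 0.2)."""
    discount = amount * discount_percentage
    return amount - discount


# Hypothetical alias so existing callers keep working while the logic stays in one place.
calculate_discount = apply_discount

print(apply_discount(100.0, 0.2))
print(calculate_discount(100.0, 0.2))  # same result, same code path
```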
This hyperparameter performs a type of regularization that adds a penalty to the loss function based on the absolute magnitude of the leaf weights, encouraging some of the weights to become 0.
This leads to a sparser model where fewer features are effectively used in the splits.
What is alpha (L1 Regularization)
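The zeroing effect can be sketched with soft-thresholding, assuming the XGBoost-style formulation in which the gradient sum is shrunk toward zero by alpha before the leaf weight is computed. The function name, the default for reg_lambda, and the numbers are illustrative assumptions.

```python
def leaf_weight_l1(gradient_sum, hessian_sum, alpha, reg_lambda=1.0):
    """Leaf weight with an L1 penalty applied by soft-thresholding (assumed formula)."""
    if gradient_sum > alpha:
        shrunk_gradient = gradient_sum - alpha
    elif gradient_sum < -alpha:
        shrunk_gradient = gradient_sum + alpha
    else:
        return 0.0  # the gradient signal is weaker than the penalty, so the weight is zero
    return -shrunk_gradient / (hessian_sum + reg_lambda)


# Hypothetical leaves: one with a weak gradient signal, one with a strong signal.
print(leaf_weight_l1(gradient_sum=0.5, hessian_sum=4.0, alpha=1.0))   # 0.0
print(leaf_weight_l1(gradient_sum=-3.0, hessian_sum=4.0, alpha=1.0))  # 0.4
```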
This principle states that as the sample size increases, the sampling distribution of the sample mean approaches a normal distribution, regardless of the shape of the original population distribution.
What is the Central Limit Theorem
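A quick simulation can illustrate the idea. The sketch below draws samples from a strongly right-skewed exponential population and compares the distribution of sample means for a small and a larger sample size. The function names, sample sizes, and seed are arbitrary choices for the example.

```python
import random
from statistics import mean, pstdev


def sample_means(sample_size, num_samples=2000, seed=0):
    """Means of repeated samples drawn from an Exponential(1) population."""
    rng = random.Random(seed)
    return [
        mean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]


def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric, bell-shaped distribution."""
    center, spread = mean(values), pstdev(values)
    return mean([((value - center) / spread) ** 3 for value in values])


means_n2 = sample_means(sample_size=2)
means_n50 = sample_means(sample_size=50)
print(f"skewness with n=2:  {skewness(means_n2):.2f}")
print(f"skewness with n=50: {skewness(means_n50):.2f}")  # noticeably closer to 0
```

The sample means stay centered on the population mean of 1, while their spread shrinks roughly in proportion to one over the square root of the sample size.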
You just presented personas to your client one of which was the Busy Household Persona and provided recommendations for benefits.
One benefit you recommended was a family wide wellness program where parents and children could schedule appointments at the same time.
Your client asks "Why is this benefit beneficial for this persona and what analyses have you done in the past to prove this?"
The caretaker's number of office visits and allowed amounts decrease as the child's office visits and allowed amounts increase.
ED visits for parents increase after having a child.
This model can be used to generate new data, such as images or text, by learning the distribution of the training data, and it consists of two networks: a generator and a discriminator.
What is a Generative Adversarial Network (GAN)
def get_user_info():
    name = input("Enter your name: ")
    email = input("Enter your email: ")
    return name, email

def send_welcome_email():
    name, email = get_user_info()  # Directly calling get_user_info, tightly coupling them
    print(f"Sending welcome email to {name} at {email}")

# Calling the tightly coupled function
send_welcome_email()
What is tightly coupled code.
def get_user_info():
    name = input("Enter your name: ")
    email = input("Enter your email: ")
    return name, email

def send_welcome_email(name, email):
    print(f"Sending welcome email to {name} at {email}")

# Now, calling the functions in a decoupled way
name, email = get_user_info()  # Get user info separately
send_welcome_email(name, email)  # Pass the data to send_welcome_email