ML
Optimization
Time Series
Misc. Topics
Dr. Polsley, Aditya, and Bethany Trivia
100

What is word embedding/vectorization, and what is its core benefit?

  • The idea is simple: we want to quantify the meaning of words.

  • Core benefit: the resulting vectors capture semantics, i.e. the meaning of words and the relationships between them.

100

What type of optimization (linear or non-linear) problems can genetic algorithms be used for?

Both! Simplex may be more efficient for linear problems and gradient descent for non-linear ones, but a genetic algorithm can still be used as long as there is an objective function to evaluate.

100

What are lag features and how are they used?

Lag features represent past values of a time series used as input features for modeling future values. 

They help capture temporal dependencies in the data.
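As a rough illustration of the answer above, here is a minimal sketch that turns a series into lag features. The helper name `make_lag_features` and the sales numbers are made up for the example.

```python
def make_lag_features(series, n_lags):
    """Return (X, y): each row of X holds the n_lags values that precede y."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        # Most recent value first: [series[t-1], series[t-2], ...]
        X.append([series[t - k] for k in range(1, n_lags + 1)])
        y.append(series[t])
    return X, y


# Hypothetical daily sales figures, invented for illustration.
sales = [10, 12, 13, 15, 18, 21]
X, y = make_lag_features(sales, n_lags=2)
for features, target in zip(X, y):
    print(features, "->", target)
```

Each row of `X` can then be fed to any regression model to predict the matching entry of `y`.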

100

What type of error is telling a pregnant woman she isn’t carrying a baby?


Type II error - a false negative.

100

What sport did Dr. Polsley play in high school?

Golf

200

                      PREDICTED LABEL
             ┌─────────────┬─────────────┐
             │  NEGATIVE   │  POSITIVE   │
┌────────────┼─────────────┼─────────────┤
│ TRUE       │     55      │      5      │
│ NEGATIVE   │             │             │
├────────────┼─────────────┼─────────────┤
│ TRUE       │     10      │     30      │
│ POSITIVE   │             │             │
└────────────┴─────────────┴─────────────┘

Calculate the Accuracy, Precision, and Recall.

Accuracy = (TP + TN) / (TP + TN + FP + FN) = (30 + 55) / (30 + 55 + 5 + 10) = 0.85

Precision = TP / (TP + FP) = (30) / (30 + 5) = 0.857

Recall = TP / (TP + FN) = (30) / (30 + 10) = 0.75
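The same arithmetic can be checked with a few lines of Python. The counts are read from the confusion matrix above (rows are true labels, columns are predicted labels); the variable names are just for this sketch.

```python
# Counts from the confusion matrix in the question.
tn, fp = 55, 5    # true label NEGATIVE: predicted negative / predicted positive
fn, tp = 10, 30   # true label POSITIVE: predicted negative / predicted positive

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```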

200

Explain what the feasible region of an LP problem represents. What is special about the corner-point solutions of a feasible region?

The feasible region is the set of all points that satisfy every constraint of the problem.

Corner points are the vertices of the feasible region, where constraint boundaries intersect.

If the LP has an optimal solution, at least one optimal solution lies at a corner point, so it is enough to check the corners.

200

Explain the difference between trend lines and seasonality in time series data.

Trend lines

  • Overall "smoothed" behavior of the data

  • Can be linear or nonlinear


Seasonality

  • A recurring pattern at regular, fixed intervals (e.g., daily, weekly, monthly, yearly).

  • Caused by calendar-related factors like weather or holidays.


200

Identify the limitations of gradient descent.

- Can get stuck at local minima

- Function needs to be differentiable

- Sensitive to step size: too large can overshoot or diverge, too small converges slowly
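To make the step-size limitation concrete, here is a small sketch on a made-up function, f(θ) = θ², whose gradient is 2θ. The function, the starting point, and the two learning rates are assumptions chosen for the demonstration.

```python
def gradient_descent(grad, theta0, learning_rate, steps):
    """Plain gradient descent: repeatedly step against the gradient."""
    theta = theta0
    for _ in range(steps):
        theta = theta - learning_rate * grad(theta)
    return theta


def grad_f(theta):
    # Gradient of the illustrative function f(theta) = theta**2.
    return 2 * theta


# Each update multiplies theta by (1 - 2 * learning_rate).
small_step = gradient_descent(grad_f, theta0=5.0, learning_rate=0.1, steps=50)
large_step = gradient_descent(grad_f, theta0=5.0, learning_rate=1.1, steps=50)

print("learning_rate=0.1 ->", small_step)   # shrinks toward the minimum at 0
print("learning_rate=1.1 ->", large_step)   # grows in magnitude: divergence
```

With a rate of 0.1 the factor is 0.8, so θ shrinks each step; with 1.1 the factor is -1.2, so θ overshoots and grows.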

200

What was Aditya's weekly screen time last week?

4 hours 7 min

300

What concept is measured with this equation:

(SEE BOARD)

Explain this concept's significance in the algorithm in which it is primarily used.

Entropy.  

It measures the impurity or uncertainty of a dataset. Decision trees use it to determine the most effective splits, aiming to maximize information gain and make the data more homogeneous with each split. The tree starts with the whole, impure dataset, and each split produces subsets with lower entropy.
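The board equation is not reproduced here, so the sketch below assumes the usual Shannon form, H = -Σ p_i log2(p_i), and shows how information gain compares a parent node with its children. The labels and the split are invented for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node: 5 "yes" and 5 "no", split into two mostly pure children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print("parent entropy:", entropy(parent))
print("information gain:", round(information_gain(parent, [left, right]), 3))
```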

300

Consider the error function for a parameter θ, which guides the optimal step selection during gradient descent.

(SEE BOARD)

What is the gradient in terms of θ for this 1D case?

((2θ - y)^2)*2 

300

Why does changing the unit of measurement (e.g., converting meters to centimeters) affect covariance but not correlation?

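Correlation is the covariance divided by the product of the two standard deviations, so any change of units cancels out. That makes it a scale-free measure of linear dependence, which is easier to interpret and compare.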

Covariance is not scale-free because its value depends on the units of the two variables involved. If you change the scale (e.g., measuring weight in grams instead of kilograms), the covariance also changes proportionally.
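A quick numeric check of this, using made-up height and weight values: converting heights from meters to centimeters multiplies the covariance by 100 but leaves the correlation unchanged.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical measurements, invented for illustration.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weights_kg = [52.0, 58.0, 66.0, 71.0, 80.0]
heights_cm = [h * 100 for h in heights_m]

print("cov (m): ", covariance(heights_m, weights_kg))
print("cov (cm):", covariance(heights_cm, weights_kg))
print("corr (m): ", correlation(heights_m, weights_kg))
print("corr (cm):", correlation(heights_cm, weights_kg))
```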

300

Explain the 2 approaches to implement Word2Vec.

  • Continuous Bag of Words (CBOW): takes the surrounding context window as input and is trained to predict the center word.

  • Skip-gram: takes the center word as input and is trained to predict the words in the surrounding window.
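The difference between the two approaches shows up in how training examples are built. The sketch below only generates the (input, target) pairs; it does not train a network, and the function names are made up for this example.

```python
def context_window(tokens, i, window):
    """Words within `window` positions of index i, excluding tokens[i] itself."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right


def cbow_examples(tokens, window):
    """CBOW: input is the context words, target is the center word."""
    return [(context_window(tokens, i, window), tokens[i]) for i in range(len(tokens))]


def skipgram_examples(tokens, window):
    """Skip-gram: input is the center word, target is one context word."""
    return [
        (tokens[i], context_word)
        for i in range(len(tokens))
        for context_word in context_window(tokens, i, window)
    ]


tokens = "the quick brown fox".split()
print(cbow_examples(tokens, window=1))
print(skipgram_examples(tokens, window=1))
```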

300

What track events did Bethany play in high school?

Javelin and Discus

400

How can boosting be used to improve a model?

Boosting Idea:

  • Let’s train the first model and see how it does.

  • Then, let’s train another model on the data it got wrong!

  • Repeat

  • Essentially, we train model after model, each one focusing on the examples the previous models got wrong, and combine them. This helps reduce high bias.
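One way to sketch this idea in code is residual fitting, the gradient-boosting flavor of boosting: each new weak model is trained on the errors left by the models so far. This is not the re-weighting scheme used by AdaBoost, and the stump learner, learning rate, and data below are assumptions made for the example.

```python
def fit_stump(xs, residuals):
    """Pick the threshold on x that best splits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds, learning_rate=0.5):
    """Train `rounds` stumps, each on the residuals of the ensemble so far."""
    models = []
    predictions = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        models.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
    return lambda x: sum(learning_rate * model(x) for model in models)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical 1D regression data, invented for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8, 5.2, 5.0]

print("training MSE after 1 round:  ", mse(boost(xs, ys, 1), xs, ys))
print("training MSE after 20 rounds:", mse(boost(xs, ys, 20), xs, ys))
```

A single stump underfits (high bias); stacking many of them on the remaining error brings the training error down.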

400

A transport company has two types of trucks, Type A and Type B. Type A has a refrigerated capacity of 20 m³ and a non-refrigerated capacity of 40 m³. Type B has a refrigerated capacity of 30 m³ and a non-refrigerated capacity of 30 m³. A grocer must hire trucks to transport 3000 m³ of refrigerated stock and 4000 m³ of non-refrigerated stock. The cost per mile of Type A is $30, and $40 for Type B. How many trucks of each type should the grocer rent to achieve the minimum total cost?

What are 

(1) decision variables

(2) objective function

(3) all constraints

(1)

x = # Type A trucks

y = # Type B trucks

(2)

MINIMIZE f(x,y) = 30x + 40y

(3)

20x + 30y ≥ 3000

40x + 30y ≥ 4000

x ≥ 0

y ≥ 0
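The formulation above can be sanity-checked by brute force. This sketch adds one assumption that the formulation does not state, namely that truck counts must be whole numbers, and simply tries every combination in a range large enough to contain the optimum.

```python
best = None
for x in range(0, 151):       # 150 Type A trucks alone already meet both constraints
    for y in range(0, 135):   # 134 Type B trucks alone already meet both constraints
        refrigerated_ok = 20 * x + 30 * y >= 3000
        non_refrigerated_ok = 40 * x + 30 * y >= 4000
        if refrigerated_ok and non_refrigerated_ok:
            cost = 30 * x + 40 * y
            if best is None or cost < best[0]:
                best = (cost, x, y)

cost, type_a, type_b = best
print(f"Type A: {type_a}, Type B: {type_b}, cost per mile: ${cost}")
```

Without the whole-number assumption, the two capacity constraints intersect at the corner point x = 50, y = 66⅔, which is why the integer answer sits right next to it.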

400

How can you improve a standard moving average model (without using a different ML model or adding autoregressive components)?

- Use weighted moving averages, which give more weight to recent observations than to older ones
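A minimal sketch of the difference, with made-up numbers: equal weights reproduce the standard moving average, while increasing weights pull the average toward the most recent values.

```python
def weighted_moving_average(series, weights):
    """Weighted average over a sliding window; weights[-1] is for the newest value."""
    n = len(weights)
    total = sum(weights)
    return [
        sum(w * v for w, v in zip(weights, series[i - n + 1:i + 1])) / total
        for i in range(n - 1, len(series))
    ]


series = [10, 20, 30, 40]  # hypothetical upward-trending series
simple = weighted_moving_average(series, [1, 1, 1])    # standard moving average
weighted = weighted_moving_average(series, [1, 2, 3])  # newest value counts most

print("simple:  ", simple)
print("weighted:", weighted)
```

On a trending series the weighted version lags less, because the newest point dominates each window.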

400

Come up with a scenario in which you would modify the LLM parameters top p, top k, and temperature.

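  • Top-P restricts random sampling to the smallest set of top tokens whose cumulative probability reaches p.

  • Top-K restricts random sampling to the k most probable tokens.

  • Temperature scales the logits before the softmax: lower values sharpen the distribution, higher values flatten it.

  • Example scenario: lower all three for factual answers or code generation, where consistent output matters; raise them for brainstorming or creative writing, where variety matters.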
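The three settings can be sketched on a toy next-token distribution. This is an illustrative implementation, not how any particular LLM library does it; the function names and logits are made up.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by the temperature."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(probs):
    total = sum(probs)
    return [p / total for p in probs]


def filter_top_k(probs, k):
    """Keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    return renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])


def filter_top_p(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize([q if i in keep else 0.0 for i, q in enumerate(probs)])


def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample a token index after applying temperature, then top-k, then top-p."""
    probs = softmax(logits, temperature)
    if top_k is not None:
        probs = filter_top_k(probs, top_k)
    if top_p is not None:
        probs = filter_top_p(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for four candidate tokens
print("T=0.5:", [round(p, 3) for p in softmax(logits, 0.5)])
print("T=2.0:", [round(p, 3) for p in softmax(logits, 2.0)])
print("top_k=2:", [round(p, 3) for p in filter_top_k(softmax(logits), 2)])
print("top_p=0.8:", [round(p, 3) for p in filter_top_p(softmax(logits), 0.8)])
```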

400

What is Aditya's usual sleeping position?

Back Sleeper

500

How is Gini impurity (Gini index) used in decision trees?

The Gini index helps identify the most effective way to split a node so that each child node is as pure as possible. The candidate split with the lowest weighted Gini index across its child nodes is chosen.
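A small sketch of that comparison, using the standard definition Gini = 1 - Σ p_i² and two made-up candidate splits of the same ten samples:

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def weighted_gini(children):
    """Size-weighted Gini impurity of the child nodes produced by a split."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * gini(child) for child in children)


# Two hypothetical candidate splits, invented for illustration.
split_a = (["yes"] * 4 + ["no"] * 1, ["yes"] * 1 + ["no"] * 4)
split_b = (["yes"] * 3 + ["no"] * 2, ["yes"] * 2 + ["no"] * 3)

print("split A:", round(weighted_gini(split_a), 3))
print("split B:", round(weighted_gini(split_b), 3))
```

Split A produces purer children, so it has the lower weighted Gini index and would be preferred.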

500

Argue the most essential functions or subcomponents of a genetic algorithm and their significance.

Can make valid arguments for:

  • Reproduction (selection and crossover)

  • Mutate

  • Calculate fitness

  • Rank and prune
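To see how these pieces fit together, here is a toy genetic algorithm that tries to evolve a bit string of all 1s. The problem, the parameter values, and the function names are all assumptions made for the sketch; real uses would swap in their own fitness function.

```python
import random


def fitness(individual):
    """Calculate fitness: here, simply the number of 1 bits."""
    return sum(individual)


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(individual, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]


def evolve(pop_size=20, length=16, generations=30, keep=10, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Rank by fitness and prune to the fittest `keep` individuals.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[:keep]
        # Reproduction: refill the population with mutated children of survivors.
        children = []
        while len(survivors) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), mutation_rate, rng))
        population = survivors + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


best, history = evolve()
print("best fitness per generation:", history)
```

Because survivors are carried over unchanged, the best fitness never gets worse from one generation to the next; mutation and crossover are what let it improve.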

500

You want to know your friend's activity, but you can only observe the weather outside, not your friend.

(1) what are the hidden states

(2) what are the observed states

(3) what are the missing values in the HMM transition probability diagram?

(SEE BOARD)

(4) what is the term for the probability of an observation given a state?

(1) hidden: badminton, football, video games

(2) observed: sunny, rainy, windy

(3)

A: 0.3

B: 0.2

C: 0.6

(4) emission probability

500

What is a situation in which you would use a SARIMA model over an ARIMA model?

What is a situation in which you would use an ARIMA model over a SARIMA model?

When to use ARIMA?

  • When the time series does not have seasonality but has trends or irregular fluctuations.

  • Example: Stock prices, GDP, sales trends over time (without seasonality).


When to use SARIMA?

  • When the time series has seasonal patterns (e.g., sales peaking every holiday season or electricity usage fluctuating by time of year).

  • Example: Retail sales, temperature trends, airline passenger counts.


500

What was the class that Dr. Polsley lost his 4.0 GPA in?

Stats
