The process of replacing missing data with substituted values is called:
imputation
What is a common metric to evaluate linear regression models (describes how much variance is captured by the input variables)?
R^2
What are the three main components of optimization formulations?
Describe one reason why integer programs are harder to solve than linear programs.
Discrete feasible region, complete enumeration
In Mohammad's work with DHS and Immigration Policy, what did he notice about some of the detainer IDs in the dataset?
There were many non-matching IDs in the dataset, and found during data cleaning that he had to multiply them by 2 and add three.
Fill in the blanks: In analytics, we use _____ to build _____ to make _______.
Describe one reasonable method of modeling text data.
One-hot encoding, Bag-of-Words, TF-IDF, Trained Embeddings (e.g., GloVe)
Consider the following optimization formulation. The decision variables are x1 and x2. Is it linear?
max 1/3*x1 + pi*x2
s.t. sin(y1)*x1 + cos(y2)*x2 <= cos(y1)*x1 + sin(y2)*x2
(y1/y2)*x1 = x2
x1, x2 >= 0
Yes
A class of non-linear programs where we can guarantee global optimality is known as:
convex optimization
Other than "linear programming", the World Food Programme application for humanitarian food aid discussed in lecture is an example of what class of optimization problems?
Network Optimization
Describe the difference between training sets and testing sets.
Training set is used to build the model and fit parameters.
Testing set is used to evaluate the model on unseen data.
Name three metrics we can use to evaluate the predictive performance of models.
MAD, MSE, MAPE
What is the term for the set of possible solutions that abide by all the constraints in an optimization formulation?
Feasible region
Describe one reason why non-linear problems are difficult to solve.
Optimum does not necessarily lie in a corner of the feasible region
No certificate of optimality (local vs. global)
TNG and the Airline Industry were examples of Revenue Management. In Revenue Management, what are some of the main ideas to consider (hint: there are three)?
Selling the right product to the right customers at the right price and the right time
- Customer Segmentation
- Pricing
- Holding Sales for High-Demand Periods
In practice, the data we work with is imperfect and requires cleaning. Name three methods for data cleaning.
KNN, mean, median, arbitrary value, etc.
What are the three components of ARIMA models?
AR - autoregressive (P), MA - moving average (Q), I - integrated (difference, D)
Define shadow price.
For a given constraint, change to the objective function if I increase the corresponding RHS by 1 unit; in other words, the "marginal value" of that constraint.
What are two approaches to multi-objective programming we discussed in lecture? Briefly describe both.
Weight-based Approach (assign weights to each objective)
Goal Programming (single objective, other objectives as constraints)
In the MAMD case, we wanted to write constraints that modeled non-linear scenarios such as "If have enough raw milk available then I must produce a minimum amount of butter; otherwise, I can not produce any butter at all."
What linearization technique did we employ to model such constraints?
Big-M
Name the three major classes of analytics.
Descriptive, Predictive, and Prescriptive
What is one key assumption we make when conducting time series analysis?
The data is stationary
What is the algorithm used to efficiently solve linear programs called? Briefly describe how it works.
Simplex Method; traverse edges of the feasible region and evaluate objective at each extreme point
What is the term to describe candidate solutions in multi-objective programming where both objectives cannot be improved simultaneously?
Pareto-Optimal
In the ICBC China Bank case, what was the biggest challenge in estimating market potential, and how did they address that challenge?
Market potential can not be concretely quantified, so they didn't have data they could train a typical regression model with. Instead, they created an optimization problem to estimate the weights by minimizing the difference in predicted market potential value between similar cells (with regularization) subject to the fact that the calculated market potentials must satisfy the relative rankings that the experts suggested.