Complexity
Vectors, Matrices, and Systems of Linear Equations
Curve Fitting
Interpolation
What could go wrong?
100

Complexity analysis looks at the ____ scenario.

a.) best-case

b.) worst-case

b.) worst-case

100

True or False. Any two matrices can be multiplied together.

False.

100

How does curve fitting differ from interpolation?

Curve fitting finds a single global solution. It finds the 'best fit' to all of our data. Also, we need to specify the mathematical model to use.

Interpolation uses a pre-defined mathematical model and works locally. It creates many piece-wise solutions.

100

Unlike a linear fit, this type of interpolation ensures the path between points is smooth and allows for curvature.  

Cubic spline.

100

Your manager says your code is too slow. You know that Python allows you to change the precision used to store variables. 

To speed up your code, you switch from the most precise to the least precise. Your O(n2) algorithm now runs much faster, and you're ready to deploy it in a bridge stress application.

What could go wrong? 

We're using less storage space for our variables, which means the range of numbers we can reliably store in the computer is limited.

We will likely encounter compounding numerical errors that will impact the accuracy of our bridge model.

200

We have timed how long programs take to run and used this as a proxy for complexity. Ideally, one would want to count the number of operations needed as a function of the input data size.

Why? What's problematic about timing that makes it less reliable?

Timing is hardware dependent. We can get very different results when the test is done on different computers.

Number of operations is the same regardless of where it's conducted.

200

A system of linear equations can be written in matrix-vector form as Mx = v. What is the Python library and function that solves an equation of this form?

Numpy. Specifically, numpy.linalg.solve()

200

Describe what overfitting and underfitting are in the context of curve fitting.

Underfitting - we chose a mathematical model that is too simplistic for our data. It doesn't fit the trend very well.

Overfitting - we chose a mathematical model that is too complicated for our data. It can be forced to fit our training data, but fails when applied to new data.

200

import scipy

f = scipy.interpolate.interp1d(x, y)

True or False. No additional steps are needed. f contains linearly interpolated results for our input x values.

False. This is the first of two steps. This sets up the equations using our known data points. This step does not do any interpolation. We have not yet specified where we want the interpolation to happen.

200

a = [1,2,3]

b = [4,5,6]

You'd like to find the dot product of these two vectors so you do: 

import numpy as np

np.dot(a,b)

What could go wrong?

We didn't specify a and b as np.array(). As currently written, they are Python lists, and can't be treated as mathematical vectors.

300

I need a sorting algorithm. Alice creates an algorithm that's O(n) and Bob creates an algorithm that's O(n2). 

What does this notation mean? How do I use this notation to compare Alice and Bob's approaches?

This notation relates complexity to known mathematical functions.

Alice's algorithm is linear. Bob's is quadratic.

300
There are three possible outcomes for a system of linear equations. What are they?

1.) A single unique solution. 2.) no solution. 3.) an infinite number of solutions.

300

Describe how curve fitting works in practice. Give an overview what we need to do and how we'd do it in Python.

1.) The human analyst chooses a mathematical function to fit the data to. 2.) We write the mathematical function as a Python function 3.) We call curve_fit with our x and y data. 4.) We use the output of curve_fit to get the model parameters.

300

Hook's Law, F = -kx, says that the stiffness of spring (k) is related to the force on the spring and the displacement.

We have several measurements of F and x from our physics lab. We'd like to find the value of 'x' that best fits our data. Why is interpolation a bad choice?

Interpolation finds multiple piece-wise local solutions. It does not care about the best-fit 'global' solution.

We'd want to use curve fitting here.

300

You achieve an R2 of 0.99 by fitting a 10th-degree polynomial to noisy data points. The best possible R2 value is 1.0 so this model is nearly perfect and ready to deploy in a real-life scenario. 

What could have gone wrong?

A 10th-degree polynomial may have been too complicated for these data. We likely overfit.

400

The complexity of two nested loops is on the oder of...

O(n2)

400

You are running a concession stand at a basketball game. You are selling hot dogs and sodas. Each hot dog costs $1.50 and each soda costs $0.50. At the end of the night you made a total of $78.50. You sold a total of 87 hot dogs and sodas combined. 

What is the matrix-vector form of this system of linear equations?

h + s = 87

1.5h + 0.5s = 78.50

400

The curve_fit function returns two results. Describe what those two results are and how we'd use them.

1.) the parameters of our mathematical model. For example, if we fit the data to a straight line, we'd get the parameters of a line (slope and y-intercept)

2.) The uncertainty of each parameter. We get back the variance. Standard deviation is commonly used in practice so we'd need to take the square root.

400

True or False. Linear and cublic spline are the only two options when conducting interpolation.

False. They are the two most popular, but they are not the only options.

400

The matrix in our system of linear equations has a determinant of 10-18. Therefore, there's a single unique solution to our system of linear equations.

What could have gone wrong?

A value of zero indicates no solution or an infinite number of solutions. The small deviation from zero in this application is likely due to round-off errors in the computer. Had we solved the problem by hand, we likely would have gotten exactly zero.

500

An engineer is modeling heat transfer using a 2D grid. Doubling the resolution (adding more grid cells) increases the total number of points (N) by 4x. 

If this new grid is O(N3), the computation time for the new, higher-resolution model will increase by this specific factor.

43 = 64x increase

500

A superconductor is a real system that can be built in a lab. However, in our circuit lab, we saw that superconductors may lead to misleading results in a computer simulation. Why?

Superconductors have extremely small, but not zero, resistance. These very small numbers cannot be stored reliably in a computer and may be misinterpreted as zero.

500

Our curve fitting produced an R2 value of 0.999. The best Rvalue we can get is 1. Why might this still result in a failed engineering model when applied to data not used in the curve fitting?

The mathematical model used was too complicated. This is called overfitting. 

500

Using self-driving cars as an example, discuss 1.) how interpolation could be used to get from waypoint A to waypoint B and 2.) what issues might we run into if the waypoints are very close together

1.) Cubic spline interpolation can be used to generate a smooth driving path from point A to point B.

2.) A similar issue as the superconductor in the circuit lab. Cubic spline uses system of linear equations to do the interpolation. Very small numbers may be incorrectly interpreted as zero leading to errors in our path planning.

500

n

import numpy as np

a = np.array( [ [1,2,3],

                       [4,5,6] ] )

b = np.array( [ [4,5,6 ],

                      [7,8,9]] )

np.matmul(a,b)

What could go wrong?

The dimensions do not align. Matrix multiplication is not possible.