What is the norm of a vector?
The norm (or magnitude) of a vector is the length of the vector. If you picture a vector as the hypotenuse of a right triangle, you can use the Pythagorean theorem to find the equation for the norm: it is essentially the hypotenuse formula from the Pythagorean theorem generalized to n-dimensional space.
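A minimal sketch of the idea in NumPy, using a made-up 3-D vector whose components were picked so the norm comes out to a whole number:

```python
import numpy as np

# A hypothetical 3-D vector.
v = np.array([3.0, 4.0, 12.0])

# Pythagorean-style computation: square the components, sum them, take the root.
norm_manual = np.sqrt(np.sum(v ** 2))

# NumPy's built-in norm gives the same answer.
norm_builtin = np.linalg.norm(v)
```

Both computations give 13, since 3² + 4² + 12² = 169.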
What is the cross product?
The cross product is a way of multiplying two vectors in three-dimensional space. The result is a third vector that is perpendicular to both of the original vectors.
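A quick check of the perpendicularity claim in NumPy, with two made-up 3-D vectors:

```python
import numpy as np

# Two hypothetical 3-D vectors (the cross product is defined in three dimensions).
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])

c = np.cross(a, b)  # a third vector perpendicular to both a and b

# Perpendicularity check: the dot product with each input is zero.
perp_a = np.dot(c, a)
perp_b = np.dot(c, b)
```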
What is the dot product?
The dot product of two vectors a and b is a scalar quantity equal to the sum of the pair-wise products of the components of a and b. An example will make this much clearer:
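With two made-up vectors, the pair-wise products summed by hand match NumPy's built-in:

```python
import numpy as np

# Two hypothetical vectors.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Sum of pair-wise products: 1*4 + 2*5 + 3*6 = 32
dot_manual = sum(x * y for x, y in zip(a, b))

dot_builtin = np.dot(a, b)
```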
When are two matrices equal?
In order for two matrices to be equal, the following conditions must be true:
1) They must have the same dimensions.
2) Corresponding elements must be equal.
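Both conditions can be checked at once in NumPy, shown here with made-up matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[1, 2], [3, 4]])        # same dimensions, same elements
C = np.array([[1, 2, 0], [3, 4, 0]])  # different dimensions

equal_ab = np.array_equal(A, B)  # True: both conditions hold
equal_ac = np.array_equal(A, C)  # False: the shapes differ
```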
What is a transposed matrix in terms of an original matrix's columns and rows?
A transposed matrix is one whose rows are the columns of the original and whose columns are the rows of the original.
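A small NumPy sketch with a made-up 2x3 matrix, showing the rows becoming columns:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])  # 2x3

At = A.T  # 3x2: the rows of A become the columns of At

# The first row of A is now the first column of At.
first_row_becomes_first_column = np.array_equal(A[0], At[:, 0])
```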
What is an identity matrix?
A diagonal matrix with ones on the main diagonal and zeroes everywhere else. The product of any square matrix and the identity matrix is the original square matrix: AI = A. Also, any matrix multiplied by its inverse gives the identity matrix as the product: AA⁻¹ = I.
What is an inverse matrix?
The inverse is like the reciprocal of the matrix that was used to generate it. Just like 1/8 is the reciprocal of 8, A⁻¹ acts like the reciprocal of A.
What happens if we multiply a matrix by its inverse?
The product of a matrix multiplied by its inverse is the identity matrix of the same dimensions as the original matrix. There is no concept of "matrix division" in linear algebra, but multiplying a matrix by its inverse is very similar, since 8 × 1/8 = 1.
A⁻¹A = I
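Both identities can be verified numerically with a made-up invertible matrix:

```python
import numpy as np

# A hypothetical invertible 2x2 matrix.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

I = np.eye(2)                 # the 2x2 identity matrix
A_inv = np.linalg.inv(A)

same_as_A = A @ I             # multiplying by I returns A unchanged
identity_product = A_inv @ A  # approximately I (up to floating-point error)
```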
What is the variance?
Variance is a measure of the spread of numbers in a dataset. Variance is the average of the squared differences from the mean.
x̄ is the mean of the dataset.
n is the total number of observations.
The variance is sometimes denoted by a lowercase v or σ².
v = Σᵢ (xᵢ − x̄)² / n
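The formula worked through on a small made-up dataset:

```python
import numpy as np

# A small hypothetical dataset.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = x.mean()  # 5.0

# Average of the squared differences from the mean.
v_manual = np.sum((x - mean) ** 2) / len(x)

# np.var also divides by n (population variance) by default.
v_builtin = np.var(x)
```

The squared differences sum to 32, so the variance is 32 / 8 = 4.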
What is the covariance?
Covariance is a measure of how changes in one variable are associated with changes in a second variable.
What is a variance-covariance matrix?
A matrix that compares each variable with every other variable in a dataset and returns the variance values along the main diagonal, and covariance values everywhere else.
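A sketch in NumPy with a made-up two-variable dataset where the second variable is exactly twice the first (note np.cov divides by n − 1 by default):

```python
import numpy as np

# Hypothetical dataset: each row is an observation, each column a variable.
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])

C = np.cov(X, rowvar=False)  # 2x2 variance-covariance matrix
# C[0, 0] and C[1, 1] are the variances;
# C[0, 1] and C[1, 0] are the covariances.
```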
What is the correlation coefficient?
The measure of spread of a variable is the Standard Deviation. If we divide our covariance values by the product of the standard deviations of the two variables, we'll end up with what's called the Correlation Coefficient.
Correlation Coefficients have a fixed range from -1 to +1 with 0 representing no linear relationship between the data.
In most use cases the correlation coefficient is an improvement over covariance because its standardized, unitless scale (always between -1 and +1) makes it directly comparable across different pairs of variables.
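The division described above, sketched with two made-up, perfectly linearly related variables (so r comes out to 1):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])  # perfectly linearly related to x

# Covariance divided by the product of the standard deviations
# (ddof=1 matches np.cov's default n-1 denominator).
cov_xy = np.cov(x, y)[0, 1]
r_manual = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# NumPy's built-in correlation matrix; the off-diagonal entry is r.
r_builtin = np.corrcoef(x, y)[0, 1]
```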
What is orthogonality?
Orthogonality is another word for "perpendicularity". Two vectors or matrices that are perpendicular to one another are orthogonal.
How to tell if two vectors are orthogonal
Two vectors are orthogonal to each other if their dot product is zero.
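A quick check with two made-up perpendicular vectors:

```python
import numpy as np

u = np.array([2.0, 1.0])
w = np.array([-1.0, 2.0])

is_orthogonal = np.dot(u, w) == 0  # 2*(-1) + 1*2 = 0
```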
What is a unit vector?
Unit Vectors
A unit vector is any vector of "unit length" (1). You can turn any non-zero vector into a unit vector by dividing it by its norm (length/magnitude). Unit vectors are denoted by the hat symbol "^" above the variable.
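Turning a made-up vector into a unit vector by dividing by its norm:

```python
import numpy as np

v = np.array([3.0, 4.0])       # norm is 5
v_hat = v / np.linalg.norm(v)  # [0.6, 0.8]

unit_length = np.linalg.norm(v_hat)  # 1.0
```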
What is the span?
The span is the set of all possible vectors that can be created with a linear combination of vectors.
What is a linearly dependent vector?
Two or more vectors that live on the same line are linearly dependent. This means that there is no linear combination that will create a vector that lies outside of that line. In this case, the span of these vectors is the line that they lie on.
What is a linearly independent vector?
Linearly independent vectors are vectors that don't lie on the same line as each other. If two vectors are linearly independent, then there ought to be some linear combination of them that could represent any vector in the space.
What is a basis?
The basis of a vector space π is a set of vectors that are linearly independent and that span the vector space π. A set of vectors spans a space if their linear combinations fill the space.
Orthogonal Basis
An orthogonal basis is a set of vectors that are linearly independent, span the vector space, and are orthogonal to each other.
Orthonormal Basis
An orthonormal basis is a set of vectors that are linearly independent, span the vector space, are orthogonal to each other, and each have unit length.
The unit vectors form an orthonormal basis for whatever vector space that they are spanning.
What is a rank?
The rank of a matrix is the dimension of the vector space spanned by its columns. Just because a matrix has a certain number of rows or columns (dimensionality) doesn't necessarily mean that it will span that dimensional space. Sometimes there exists a sort of redundancy within the rows/columns of a matrix (linear dependence) that becomes apparent when we reduce a matrix to row-echelon form via Gaussian elimination.
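A sketch of such a redundancy with a made-up 3x3 matrix whose third row is the sum of the first two:

```python
import numpy as np

# 3x3 matrix with a redundancy: row 3 = row 1 + row 2.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])

rank = np.linalg.matrix_rank(A)  # 2, not 3, because of the linear dependence
```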
What is Gaussian elimination?
Gaussian Elimination is a process that seeks to take any given matrix and reduce it down to what is called "row-echelon form." A matrix is in row-echelon form when it has a 1 as its leading entry (furthest left) in each row, and zeroes at every position below that leading entry. These matrices will usually wind up as a sort of upper-triangular matrix (not necessarily square) with ones on the main diagonal.
Gaussian Elimination takes a matrix and converts it to row-echelon form by doing combinations of three different row operations:
1) You can swap any two rows
2) You can multiply entire rows by scalars
3) You can add/subtract rows from each other
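The three row operations above can be sketched as a minimal NumPy routine. This is an illustrative sketch only; real implementations add pivoting refinements for numerical stability:

```python
import numpy as np

def row_echelon(A):
    """Reduce A to row-echelon form using the three row operations."""
    A = A.astype(float).copy()
    rows, cols = A.shape
    r = 0
    for c in range(cols):
        if r >= rows:
            break
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, rows) if abs(A[i, c]) > 1e-12), None)
        if pivot is None:
            continue
        A[[r, pivot]] = A[[pivot, r]]     # 1) swap two rows
        A[r] = A[r] / A[r, c]             # 2) scale a row (leading entry -> 1)
        for i in range(r + 1, rows):
            A[i] = A[i] - A[i, c] * A[r]  # 3) subtract a multiple of one row
        r += 1
    return A
```

For example, row_echelon applied to [[2, 4], [1, 3]] gives [[1, 2], [0, 1]].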
What is an eigenvector?
Any nonzero vector whose direction doesn't change during a given transformation. An eigenvector may still be scaled by a scalar.
What is an eigenvalue?
The scalar that represents how a corresponding eigenvector was scaled during a transformation. Eigenvectors and eigenvalues always come in pairs.
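The pairing can be checked numerically with a made-up transformation matrix: for each pair, A v = λ v.

```python
import numpy as np

# A hypothetical transformation matrix.
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

# Eigenvalues in vals are paired with the eigenvector columns of vecs.
vals, vecs = np.linalg.eig(A)

# The transformation only scales an eigenvector: A v = lambda v.
v0 = vecs[:, 0]
lhs = A @ v0
rhs = vals[0] * v0
```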
What is PCA?
A feature extraction technique that transforms a high dimensional dataset into a new lower dimensional dataset while preserving the maximum amount of information from the original data.
PCA Process
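The steps are not spelled out above; one common sequence (center the data, compute the covariance matrix, eigendecompose it, project onto the top components) can be sketched in NumPy with made-up random data:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))  # hypothetical dataset: 100 observations, 3 features

# 1) Center each feature at zero.
Xc = X - X.mean(axis=0)

# 2) Compute the variance-covariance matrix of the features.
C = np.cov(Xc, rowvar=False)

# 3) Eigendecomposition: the eigenvectors are the principal components,
#    the eigenvalues measure how much variance each one captures.
vals, vecs = np.linalg.eigh(C)
order = np.argsort(vals)[::-1]  # sort components by variance captured
vals, vecs = vals[order], vecs[:, order]

# 4) Project the data onto the top k components.
k = 2
X_reduced = Xc @ vecs[:, :k]
```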
What is the curse of dimensionality?
A term that is used to refer to some of the challenges and limitations that arise from trying to process or model datasets with a large number of features (often hundreds or thousands). When the dimensionality increases, the volume of the space increases so fast that the available data become sparse, requiring more data to determine statistical significance or find relationships, while increasing the computational load.
What is clustering?
The assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning.
K-Means Clustering - An algorithm to find groups in the data, with the number of groups represented by the variable K and the center of each group being the mean of its members. The number of groups can be chosen visually from an elbow graph by looking for the point where the slope decreases the most.
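A minimal NumPy sketch of the k-means loop, using made-up well-separated data and hand-picked starting centers (a real implementation would use a smarter initialization such as k-means++):

```python
import numpy as np

def kmeans(X, init_centers, n_iter=50):
    """Minimal k-means sketch: repeatedly assign each point to its nearest
    center, then move each center to the mean of its assigned points."""
    centers = np.asarray(init_centers, dtype=float)
    k = len(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Distance from every point to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update each center; keep the old one if its cluster went empty.
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

# Two obvious groups: points near (0, 0) and points near (10, 10).
X = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
              [10.0, 10.0], [10.5, 10.0], [10.0, 10.5]])
labels, centers = kmeans(X, init_centers=X[[0, 3]])
```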