This part of a neural network receives the initial data.
What is the input layer?
In a neural network, we use these as our parameters.
What are weights and biases?
In word embeddings, a word is mapped to a coordinate in space, which in other words is a ...
What is a vector?
The output of a neural network is often called this.
What is a prediction?
This function returns the maximum of 0 and the input.
What is:
def relu(x):
    return max(0, x)
This function takes an input and clamps the negative inputs to zero while leaving the other inputs the same.
What is the ReLU activation function?
In this phase, parameters are learned.
What is training?
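A minimal sketch of what training means in practice, assuming a made-up one-parameter model y = w * x and a single toy data pair:

# Learn w so that pred = w * x matches y (toy data, illustrative only)
x, y = 2.0, 6.0          # target relationship: y = 3 * x
w = 0.0                  # parameter to be learned
for step in range(50):
    pred = w * x
    grad = 2 * (pred - y) * x   # gradient of squared error w.r.t. w
    w -= 0.05 * grad            # update the parameter
print(w)                        # approaches 3.0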
Word embeddings help to capture this kind of relationship between two words...
What is similarity?
In classification problems, the network converts scores into probabilities between 0 and 1 using this function.
What is softmax?
Calculate the cosine similarity between vector A and each of vectors B and C, then calculate the probabilities that vector B or C comes after vector A. Then, return the most probable word.
A = "Amira" = [2, 7]
B = "Paige" = [1, 7]
C = "Aisha" = [4, 8]
Dot products:
Amira^T Paige = (2)(1) + (7)(7) = 2 + 49 = 51
Amira^T Aisha = (2)(4) + (7)(8) = 8 + 56 = 64
Magnitude:
|Amira| = sqrt(53)
|Paige| = sqrt(50)
|Aisha| = sqrt(80)
Cosine Similarity
Amira to Paige: (51)/(sqrt(53) * sqrt(50)) = 0.991
Amira to Aisha: (64)/(sqrt(53) * sqrt(80)) = 0.983
Softmax Probabilities
Paige Next = e^0.991 / (e^0.991 + e^0.983) = 0.502
Aisha Next = e^0.983 / (e^0.991 + e^0.983) = 0.498
So, we should predict Paige as our next word.
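As a check, this worked example can be reproduced in a few lines of NumPy (vector names taken from the clue above):

import numpy as np

A = np.array([2, 7])  # "Amira"
B = np.array([1, 7])  # "Paige"
C = np.array([4, 8])  # "Aisha"

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

sims = np.array([cosine_similarity(A, B), cosine_similarity(A, C)])
probs = np.exp(sims) / np.sum(np.exp(sims))  # softmax over the two scores
print(sims)   # ~[0.991, 0.983]
print(probs)  # ~[0.502, 0.498] -> predict "Paige"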
This function helps introduce non-linearity to our neural network.
What is an activation function?
This value adjusts how much ATTENTION we pay to the gradient.
What is a learning rate?
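A minimal sketch of how the learning rate scales a single gradient-descent step (the toy loss and starting point are made up for illustration):

# One gradient-descent step on the toy loss L(w) = (w - 3)**2
w = 0.0
learning_rate = 0.1
grad = 2 * (w - 3)            # dL/dw at the current w
w = w - learning_rate * grad  # the learning rate scales the step size
print(w)                      # 0.6, a small step toward the minimum at w = 3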
In word embeddings, words that are similar in meaning are usually ____ in space?
What is close/nearby/nearest?
The difference between a network's prediction and the actual value is called this.
What is error/loss?
This function returns a linear function of the input, using a weight and a bias.
What is:
def linear_1D(x, w, b):
    return w*x + b
A model that performs well on training data, but poorly on new data is doing this.
What is overfitting?
More neurons and parameters usually means a more powerful model, but increases the risk of what?
What is overfitting the training data?
Word embeddings are trained using this.
What is a corpus/vocabulary/data?
This is the opposite of error.
What is accuracy?
What value does the following function return?
def weighted_sum():
    inputs = [2, 1]
    weights = [0.5, -0.5]
    bias = 0.1
    output = 0
    for i in range(len(inputs)):
        x, w = inputs[i], weights[i]
        output += x * w
    output += bias
    return output
What is 0.6?
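As a check, the same weighted sum can be computed with a dot product (a NumPy sketch of the loop above):

import numpy as np

inputs = np.array([2, 1])
weights = np.array([0.5, -0.5])
bias = 0.1
print(np.dot(inputs, weights) + bias)  # 0.6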
These are the four basic components of a simple neural network, in order.
What is an input layer, linear function, activation function, and output layer?
In a fully connected neural net layer with 4 inputs and 4 neurons, how many parameters are there?
What is 20?
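The count comes from one weight per input/neuron pair plus one bias per neuron; a quick sketch (the function name layer_param_count is illustrative):

def layer_param_count(n_inputs, n_neurons):
    # one weight per input-neuron pair, plus one bias per neuron
    return n_inputs * n_neurons + n_neurons

print(layer_param_count(4, 4))  # 20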
This measures how similar two vectors are, using the angle between them.
What is cosine similarity?
This is a method we use to test how good our prediction is.
What is MSE, RMSE, Cross-Entropy loss, or any other kind of error we discussed in class?
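As one example from that list, a minimal MSE sketch (the predictions and targets are made up):

import numpy as np

def mse(predictions, targets):
    # mean of the squared differences between predictions and actual values
    return np.mean((predictions - targets) ** 2)

print(mse(np.array([0.9, 0.2]), np.array([1.0, 0.0])))  # ~0.025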
This function returns the closest word in the vocabulary to the input word.
What is:
import numpy as np

def nearest_neighbor(input_point, U, vocab):
    # U holds one embedding vector per word in vocab
    best_dist = float('inf')
    best_point = None
    best_index = None
    for i in range(len(vocab)):
        next_point = U[i]
        diff = (input_point - next_point)
        diffSq = diff**2  # (diff[0]**2, diff[1]**2)
        sumDiffSq = np.sum(diffSq)  # diffSq[0] + diffSq[1]
        dist = np.sqrt(sumDiffSq)  # Euclidean distance
        if dist < best_dist:
            best_dist = dist
            best_point = next_point
            best_index = i
    best_token = vocab[best_index]
    return best_token
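A hypothetical usage example (the embedding matrix U, vocab, and query point below are made up for illustration):

U = np.array([[2, 7], [1, 7], [4, 8]])  # one embedding row per word
vocab = ["Amira", "Paige", "Aisha"]
print(nearest_neighbor(np.array([1, 6]), U, vocab))  # "Paige"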