Representation
Similarity and Distance
KNN Prediction
Exact Search and Cost
Scalable Search and Trade-offs
100

What is a query point?

A new data point whose label or value we want to predict.

100

In KNN, what does “near” mean?

Small distance to the query.

100

What does KNN use to make a prediction?

Nearby stored examples.

100

What does brute-force search check?

Every stored item.

100

What is the goal of approximate search?

To search faster by checking fewer candidates.

200

Why turn data into features or vectors?

So that we can compare them computationally.

200

What does a distance metric measure?

How close or different two data points are.

200

In KNN classification, how is the class usually chosen?

By majority vote among the nearest neighbours.

200

What is the naïve exact search cost per query?

O(nd), where n is stored examples and d is features.

200

What is a candidate set?

A smaller group of promising items checked in detail.

300

What is a feature?

A measurable property used to describe a data point.

300

Why is B closest to q = (2, 2)?

B = (2, 1), so d(q, B) = 1.

300

In KNN regression, how is the value often predicted?

By averaging the values of the nearest neighbours.

300

Why is brute-force search exact?

Because it checks all stored examples before choosing neighbours.

300

What is the intuition behind locality-sensitive hashing?

Similar items are more likely to fall into the same bucket.

400

Why does representation affect KNN?

Because distance, and therefore “near”, depends on the representation.

400

Why did we “look first, calculate second”?

To build visual intuition before applying the distance formula.

400

What does (k) control?

The number of neighbours used for prediction.

400

Why can KNN prediction be expensive at scale?

Because neighbour search happens when the query arrives.

400

What trade-off appears in approximate search?

Faster search may reduce exactness or recall.

500

What can go wrong with a poor representation?

Similar items may look far apart, or dissimilar items may look close together.

500

Why might Euclidean distance fail?

The representation or feature scales may not reflect meaningful similarity.

500

Why can changing (k) change the prediction?

Because it changes which neighbours influence the vote or average.

500

Why is exact search costly for 10⁶ items and 512 features?

One query may examine over 500 million feature dimensions.

500

Why evaluate recall or search quality?

Because the true nearest neighbours may be missed, which can affect the final prediction.

M
e
n
u