What is a query point?
A new data point whose label or value we want to predict.
In KNN, what does “near” mean?
Small distance to the query.
What does KNN use to make a prediction?
Nearby stored examples.
What does brute-force search check?
Every stored item.
What is the goal of approximate search?
To search faster by checking fewer candidates.
Why turn data into features or vectors?
So that we can compare them computationally.
What does a distance metric measure?
How close or different two data points are.
In KNN classification, how is the class usually chosen?
By majority vote among the nearest neighbours.
What is the naïve exact search cost per query?
O(nd), where n is stored examples and d is features.
What is a candidate set?
A smaller group of promising items checked in detail.
What is a feature?
A measurable property used to describe a data point.
Why is B closest to q = (2, 2)?
B = (2, 1), so d(q, B) = 1.
In KNN regression, how is the value often predicted?
By averaging the values of the nearest neighbours.
Why is brute-force search exact?
Because it checks all stored examples before choosing neighbours.
What is the intuition behind locality-sensitive hashing?
Similar items are more likely to fall into the same bucket.
Why does representation affect KNN?
Because distance, and therefore “near”, depends on the representation.
Why did we “look first, calculate second”?
To build visual intuition before applying the distance formula.
What does (k) control?
The number of neighbours used for prediction.
Why can KNN prediction be expensive at scale?
Because neighbour search happens when the query arrives.
What trade-off appears in approximate search?
Faster search may reduce exactness or recall.
What can go wrong with a poor representation?
Similar items may look far apart, or dissimilar items may look close together.
Why might Euclidean distance fail?
The representation or feature scales may not reflect meaningful similarity.
Why can changing (k) change the prediction?
Because it changes which neighbours influence the vote or average.
Why is exact search costly for 10⁶ items and 512 features?
One query may examine over 500 million feature dimensions.
Why evaluate recall or search quality?
Because the true nearest neighbours may be missed, which can affect the final prediction.