According to the study guide’s description of distance-based outliers, why does a point with a large k-NN distance often indicate an outlier?
Because it resides far from dense regions of the data, showing sparse local structure.
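A minimal sketch of the idea, assuming Euclidean distance and a brute-force neighbor search. The function name and the toy points are illustrative and do not come from the study guide. Each point is scored by the distance to its k-th nearest neighbor, so an isolated point receives a much larger score than points inside a dense group.

```python
import math

def knn_distance_scores(points, k):
    """Return, for each point, the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical toy data: a tight group of four points plus one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
scores = knn_distance_scores(points, k=2)
most_outlying = scores.index(max(scores))  # index of the isolated point
```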
Apriori generates candidates using what principle?
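Level-wise search combined with anti-monotonicity: every subset of a frequent itemset must itself be frequent, so size-k candidates are built only from frequent (k-1)-itemsets.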
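One way to sketch a single level of candidate generation, assuming itemsets are stored as frozensets. The helper name and the sample itemsets are hypothetical. Candidates of size k are formed by joining frequent (k-1)-itemsets, and any candidate with an infrequent (k-1)-subset is pruned.

```python
from itertools import combinations

def generate_candidates(frequent_prev, k):
    """Build size-k candidates from the frequent (k-1)-itemsets."""
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue
            # Anti-monotone pruning: every (k-1)-subset must be frequent.
            if all(frozenset(s) in prev for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates

# Hypothetical frequent 2-itemsets.
frequent_2 = [frozenset(p) for p in (("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"))]
candidates_3 = generate_candidates(frequent_2, 3)
```

Here {a, b, d} and {b, c, d} are pruned because {a, d} and {c, d} are not frequent, so only {a, b, c} would still need its support counted.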
What does this scatter plot suggest about the strength and direction of correlation, and what visual feature supports your conclusion?
A moderate positive correlation; supported by points trending upward with visible but not tight alignment.
According to the study guide, what quantity does k-Means minimize?
Within-cluster sum of squared distances to the centroid.
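As a rough illustration of that objective, the sketch below sums squared Euclidean distances from each point to its assigned centroid. The function name, points, labels, and centroids are all made up for the example.

```python
def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    total = 0.0
    for p, label in zip(points, labels):
        c = centroids[label]
        total += sum((pi - ci) ** 2 for pi, ci in zip(p, c))
    return total

# Hypothetical assignment: two clusters on a line, centroids at the cluster means.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
labels = [0, 0, 1, 1]
centroids = [(1.0, 0.0), (11.0, 0.0)]
objective = within_cluster_ss(points, labels, centroids)
```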
What is an outlier, using the study guide’s definition?
A point that deviates significantly from the expected pattern or distribution.
How does increasing the value of k affect the sensitivity of k-NN outlier detection?
Larger k smooths local variation and makes outliers harder to detect because neighborhoods become broader.
A frequent itemset must meet which requirement?
Support ≥ minimum support
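A small sketch of the check, assuming relative support (the fraction of transactions that contain the itemset). The transactions and the threshold value are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

# Hypothetical transaction database and threshold.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
min_support = 0.5
is_frequent = support({"bread", "milk"}, transactions) >= min_support
```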
According to box-plot interpretation rules in the study guide, what does the longer upper whisker indicate about the distribution?
Right (positive) skew; the upper tail contains more dispersed values.
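To make the whisker comparison concrete, here is a sketch that computes box-plot statistics for a small right-skewed sample. It assumes the common 1.5 × IQR whisker convention and the "inclusive" quartile method, neither of which is confirmed by the study guide; the data are invented.

```python
import statistics

def box_plot_summary(data):
    """Quartile-based summary using the 1.5 * IQR fence convention (an assumption)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers extend to the most extreme values that lie inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "median": median,
        "lower_whisker": q1 - min(inside),
        "upper_whisker": max(inside) - q3,
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

# Hypothetical right-skewed sample.
data = [1, 2, 3, 4, 5, 6, 7, 12, 30]
summary = box_plot_summary(data)
```

In this sample the upper whisker is longer than the lower one, and the value 30 falls beyond the upper fence.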
Why does k-Means perform poorly on datasets with clusters of different densities or non-spherical shapes?
It assigns each point to the nearest centroid under Euclidean distance, which implicitly assumes convex, roughly spherical clusters of similar size and density.
In DBSCAN, why is ε important for outlier detection?
It defines the neighborhood radius used to determine if a point has enough neighbors to be a core point; points that are neither core points nor within ε of a core point are labeled noise, i.e. outliers.
What major challenge does k-NN face when detecting outliers in datasets with uneven density?
Points in naturally sparse regions may be incorrectly flagged as outliers even when they belong to legitimate low-density clusters.
Define a closed itemset. Why is it valuable for analysis?
A closed itemset has no proper superset with the same support; it preserves all support information without redundancy.
How would the presence of this outlier affect interpretation of the overall correlation, based solely on study-guide concepts?
It weakens the perceived trend by pulling the correlation away from the main pattern.
Why might a border point be considered "non-noise" even if it does not satisfy the core condition?
Because it lies within the ε-neighborhood of a core point.
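The core, border, and noise roles can be sketched as below. This is not a full DBSCAN implementation (it does not build clusters), and it assumes the common convention that a point counts as its own neighbor. The names `eps` and `min_pts` and the sample points are illustrative.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' in the DBSCAN sense."""
    # Each eps-neighborhood includes the point itself (assumed convention).
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(n) >= min_pts for n in neighbors]
    labels = []
    for i in range(len(points)):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            labels.append("border")  # not dense itself, but within eps of a core point
        else:
            labels.append("noise")   # treated as an outlier
    return labels

# Hypothetical data: a dense square, one point on its edge, one isolated point.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (1.2, 0.5), (5.0, 5.0)]
labels = classify_points(points, eps=0.8, min_pts=4)
```

With these settings the point at (1.2, 0.5) has too few neighbors to be a core point but lies within eps of one, so it is a border point; the point at (5.0, 5.0) is noise.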
Why is k-NN sensitive to feature scaling when used for outlier detection?
Because distance calculations dominate the method; features with larger numeric ranges disproportionately affect distance.
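A quick sketch of the effect, assuming min-max scaling as the normalization (other schemes such as z-scores would serve the same purpose). The feature meanings and values are invented. The nearest neighbor of the first row changes once both features are brought to the same range.

```python
import math

def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) for v, l, h in zip(row, lo, hi)) for row in rows]

def nearest_to_first(rows):
    """Index of the row closest to rows[0] under Euclidean distance."""
    return min(range(1, len(rows)), key=lambda j: math.dist(rows[0], rows[j]))

# Hypothetical features: (age in years, income in dollars).
rows = [(25.0, 50_000.0), (65.0, 50_500.0), (26.0, 52_000.0), (45.0, 90_000.0)]
raw_nn = nearest_to_first(rows)                    # driven almost entirely by income
scaled_nn = nearest_to_first(min_max_scale(rows))  # age now contributes comparably
```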
What distinguishes a maximal frequent itemset from a merely closed itemset?
A maximal frequent itemset has no frequent proper superset; a closed itemset may still have frequent supersets, but each of them has strictly lower support.
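The distinction can be checked by brute force on a tiny example. The sketch below assumes absolute support counts and enumerates every itemset, which is only feasible for toy data; the transactions and threshold are hypothetical.

```python
from itertools import combinations

# Hypothetical transaction database and minimum support count.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]
min_count = 2

def count(itemset):
    """Number of transactions containing every item of itemset."""
    return sum(itemset <= t for t in transactions)

# Enumerate all frequent itemsets with their support counts.
items = sorted(set().union(*transactions))
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        s = frozenset(combo)
        c = count(s)
        if c >= min_count:
            frequent[s] = c

# Closed: no proper superset has the same support.
closed = {s for s, c in frequent.items()
          if not any(s < t and frequent[t] == c for t in frequent)}
# Maximal: no proper superset is frequent at all.
maximal = {s for s in frequent if not any(s < t for t in frequent)}
```

In this example {a} and {a, b} are closed but not maximal, because {a, b, c} is a frequent superset with lower support; {a, b, c} is both closed and maximal.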
Using the study guide’s definitions of central tendency and dispersion, what can be concluded about the relationship between these two groups?
They differ in central tendency (different medians) but show similar variability (similar IQRs).
What outlier-detection challenge is highlighted by the study guide when using box-plots?
Outliers may appear in varying degrees, and box-plots do not reveal whether outliers arise from meaningful structure or noise.
What is a fundamental difference between DBSCAN and k-NN for outlier detection, based on the study guide?
DBSCAN uses density thresholds, while k-NN uses relative distances; thus DBSCAN can adapt to shape, while k-NN cannot distinguish natural sparse structure from anomalies.
Two itemsets have identical support, but one contains more items. What does this imply about possible closure or generator relationships?
If the smaller itemset is a subset of the larger, the smaller one is not closed; the larger itemset may be closed, and the smaller may be a generator with the same support but fewer items.
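A sketch of that relationship, assuming the usual definition of closure as the intersection of all transactions that contain the itemset. The helper names and the transactions are hypothetical.

```python
# Hypothetical transaction database.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]

def count(itemset):
    """Number of transactions containing every item of itemset."""
    return sum(itemset <= t for t in transactions)

def closure(itemset):
    """Largest itemset contained in exactly the same transactions as itemset."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return frozenset(itemset)
    return frozenset(set.intersection(*covering))

small = frozenset({"b"})
large = closure(small)  # the closed itemset that {b} generates
same_support = count(small) == count(large)
```

Here {b} and {a, b} occur in the same three transactions, so {a, b} is the closed itemset and {b} is a smaller itemset with identical support.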
The study guide lists “cluster tendency” as a key evaluative concept. What does this scatter plot suggest about cluster tendency?
It demonstrates strong cluster tendency because the points naturally form two separable groups.
Why does DBSCAN outperform distance-based methods in detecting outliers within datasets containing clusters of varying shapes?
Because it identifies outliers based on density, not global distance thresholds.