Classification with K-Nearest Neighbors
K-Nearest Neighbors (KNN) is one of the simplest machine learning algorithms for classification.
It classifies a data point based on the majority class of its nearest neighbors in the training set.
How KNN Works
- Store the entire training dataset.
- For a new data point:
- Calculate the distance to all training samples (commonly Euclidean distance).
- Select the
k
closest neighbors. - Assign the most common class among those neighbors.
Example: KNN on the Iris Dataset
KNN Classification Example
# Install scikit-learn in Jupyter Lite import piplite await piplite.install('scikit-learn') from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split from sklearn.neighbors import KNeighborsClassifier from sklearn.metrics import classification_report, accuracy_score # Load dataset iris = load_iris() X, y = iris.data, iris.target # Split data X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state=42 ) # Create KNN model knn = KNeighborsClassifier(n_neighbors=3) # Train the model knn.fit(X_train, y_train) # Predict y_pred = knn.predict(X_test) # Evaluate print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}") print("\nClassification Report:\n", classification_report(y_test, y_pred))
Choosing the Value of k
- Small
k
→ More flexible, but sensitive to noise. - Large
k
→ Smoother decision boundaries, but may underfit.
A good approach is to try different k-values and choose the one that gives the best validation accuracy.
Key Takeaways
- KNN is non-parametric — no explicit model training beyond storing the data.
- Works well for small datasets, but can be slow for large datasets.
- Scaling features is important for distance-based algorithms.
What’s Next?
In the next lesson, we’ll explore Regression with Linear Models.
Quiz
0 / 1
What does the K-Nearest Neighbors algorithm use to classify a new data point?
The sum of distances from all training samples.
The average value of the data points.
The majority class of its nearest neighbors.
The largest class in the entire dataset.
Lecture
AI Tutor
Design
Upload
Notes
Favorites
Help