Introduction to Clustering (K-Means)
Clustering is an unsupervised learning method where the goal is to group similar data points into clusters without using labels.
One of the most popular algorithms for clustering is K-Means.
How K-Means Works
1. Choose k, the number of clusters.
2. Initialize k cluster centers randomly.
3. Assign each point to its nearest center.
4. Update each center to be the mean of the points assigned to it.
5. Repeat steps 3–4 until the cluster assignments stop changing.
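The loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not scikit-learn's implementation: the function name `kmeans`, its parameters `max_iter` and `seed`, and the synthetic two-blob data are assumptions made for this example, and the initial centers are simply drawn from the data points themselves.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Illustrative K-Means loop; returns (centers, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialization: use k distinct data points as the starting centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Assignment: each point goes to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Stopping rule: quit once the assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update: move each center to the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers, labels

# Hypothetical data: two well-separated blobs of 50 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centers, labels = kmeans(X, k=2)
print(np.round(centers, 1))
```

Starting from actual data points is only one possible initialization; it is used here because it keeps the sketch short.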
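K-Means tries to minimize the total squared distance between each point and the center of the cluster it is assigned to (the within-cluster sum of squares, which scikit-learn calls inertia). The algorithm converges to a local minimum, so the result can depend on the initial centers.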
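Written as a formula, the quantity being minimized looks as follows. The notation is an assumption introduced here for illustration: C_j is the set of points assigned to cluster j, and mu_j is that cluster's center.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

The assignment step lowers J by choosing the nearest center for each point, and the update step lowers it by setting each center to the mean of its cluster, which is why the loop can only decrease J or leave it unchanged.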
When to Use K-Means
- You want to group data by similarity without predefined labels.
- Your dataset has numerical features and a moderate number of dimensions.
- You suspect there are clear groups in the data.
Example: Clustering Iris Data
K-Means Example
```python
# Install scikit-learn in Jupyter Lite
import piplite
await piplite.install('scikit-learn')

from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Load data (only first two features for visualization)
iris = load_iris()
X = iris.data[:, :2]

# Apply K-Means
kmeans = KMeans(n_clusters=3, random_state=42)
labels = kmeans.fit_predict(X)

# Plot clusters
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', edgecolor='k')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=200, c='red', marker='X', label='Centers')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title("K-Means Clustering (Iris)")
plt.legend()
plt.show()
```
Key Takeaways
- Unsupervised learning means no labels are provided.
- K-Means groups data into k clusters by minimizing within-cluster distances.
- Choosing the right value of k is critical; a common heuristic is the elbow method.
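As a rough sketch of the elbow method mentioned above, one can fit K-Means for several values of k and watch how the inertia (within-cluster sum of squares) falls. The range of k values and the `n_init=10` setting are assumptions chosen for illustration, and the snippet assumes scikit-learn is already installed.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Same two Iris features as in the clustering example.
X = load_iris().data[:, :2]

inertias = []
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(model.inertia_)  # within-cluster sum of squares for this k
    print(f"k={k}: inertia={model.inertia_:.2f}")
```

Inertia keeps shrinking as k grows, so the usual reading is to look for the value of k after which the improvement levels off rather than for the smallest number.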
What’s Next?
In the next lesson, we’ll look at Model Selection and Cross-Validation to ensure our models generalize well to unseen data.
Quiz
K-Means clustering requires labeled data to group similar data points.
True
False