# Introduction to Clustering (K-Means)

`Clustering` is an `unsupervised learning` method where the goal is to group similar data points into clusters `without using labels`.

One of the most popular algorithms for clustering is `K-Means`.

<br/>

## How `K-Means` Works

`K-Means` proceeds in five steps:

1. `Choose k`: pick the number of clusters.
2. `Initialize`: place k cluster centers randomly, for example at randomly chosen data points.
3. `Assign points`: assign each point to its nearest center.
4. `Update centers`: move each center to the mean of its assigned points.
5. `Repeat`: repeat steps 3–4 until the cluster assignments stop changing.
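
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not how `scikit-learn` implements `K-Means`: the function name `kmeans_sketch`, its parameters, the choice to initialize centers from randomly picked data points, and the tiny dataset are all assumptions made for this example.

```python
import numpy as np


def kmeans_sketch(X, k, max_iter=100, seed=0):
    """Plain K-Means following the five steps above (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Step 2: initialize centers by picking k distinct data points at random
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest center (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step 5: stop once the assignments no longer change
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each center to the mean of its assigned points
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels, centers


# Hypothetical data: two well-separated groups of three points each
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centers = kmeans_sketch(X, k=2)
print(labels)   # the first three points share one label, the last three the other
print(centers)  # one center near each group
```

Which group receives label `0` depends on the random initialization, so only the grouping itself is meaningful, not the label numbers.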

> `K-Means` tries to **minimize the total squared distance** between each point and the center of the cluster it is assigned to.
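
To make that quantity concrete, the sketch below computes the sum of squared distances from each point to its assigned center. `scikit-learn` reports this value as the `inertia_` attribute of a fitted `KMeans` model. The helper name `within_cluster_ss` and the four-point dataset are assumptions chosen for illustration.

```python
import numpy as np


def within_cluster_ss(X, labels, centers):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    diffs = X - centers[labels]  # each point minus its own cluster center
    return float(np.sum(diffs ** 2))


# Hypothetical data: two clusters of two points each
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 1.0], [10.0, 1.0]])  # mean of each cluster

print(within_cluster_ss(X, labels, centers))  # 4.0: each point is 1 unit from its center
```

A worse assignment of the same points, such as `[0, 1, 0, 1]`, gives a much larger value, which is why the assign and update steps both move toward lower totals.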

<br/>

## When to Use K-Means

`K-Means` is a reasonable choice when:

- You want to group data by similarity without predefined labels.
- Your dataset has numerical features and a moderate number of dimensions.
- You suspect there are clear groups in the data.

<br/>

## Example: Clustering Iris Data

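The following example uses `K-Means` to group the Iris dataset into three clusters. Only the first two features (sepal length and sepal width) are used so that the result can be plotted in 2D.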

```python title="K-Means Example"
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Load data (only first two features for visualization)
iris = load_iris()
X = iris.data[:, :2]

# Apply K-Means (n_init is set explicitly because its default differs across scikit-learn versions)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Plot clusters
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', edgecolor='k')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=200, c='red', marker='X', label='Centers')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title("K-Means Clustering (Iris)")
plt.legend()
plt.show()
```

<br/>

## Key Takeaways

* `Unsupervised learning` means no labels are provided during training.
* `K-Means` groups data into `k` clusters by minimizing the squared distance between each point and its cluster center.
* Choosing the right value of `k` is essential; a common heuristic is the `elbow method`.
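
One way to apply the `elbow method` is to fit `K-Means` for several values of `k` and compare the resulting inertia (the sum of squared distances from each point to its cluster center). The sketch below does this for the same two Iris features used in the example above. The range of `k` values and the `n_init=10` setting are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

X = load_iris().data[:, :2]  # same two features as the example above

# Fit K-Means for several values of k and record the inertia of each fit
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"k={k}: inertia={value:.2f}")
```

Inertia keeps falling as `k` grows, so the lowest value is not the target. Plot inertia against `k` and look for the bend where adding another cluster stops reducing it by much.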

### True or false: `K-Means` clustering requires labeled data to group similar data points.

False. `K-Means` clustering is an unsupervised learning method, so it does not rely on labeled data. Instead, it groups similar data points based on their features, minimizing the distance between each point and its cluster center. This lets it identify natural groupings in the data without any predefined labels.
