# Confusion Matrix and Classification Report

 

When working with **classification models**, accuracy alone isn’t always enough to judge performance, especially if your dataset is **imbalanced** (e.g., predicting rare diseases). On such data, a model can reach high accuracy simply by always predicting the majority class.
Two useful tools for deeper analysis are:

1. **Confusion Matrix** – A table showing correct and incorrect predictions for each class. 
2. **Classification Report** – Provides **precision**, **recall**, **F1-score**, and **support** for each class.
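
To make the imbalance problem concrete, here is a minimal sketch in plain Python. The dataset (95 healthy patients, 5 diseased) and the always-predict-majority "model" are made up for illustration; they are not taken from the lesson's example.

```python
# Hypothetical imbalanced dataset: 95 healthy (0) and 5 diseased (1) patients.
y_true = [0] * 95 + [1] * 5

# A trivial "model" that always predicts the majority class (healthy).
y_pred = [0] * 100

# Accuracy: share of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for the positive class: share of actual positives the model found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_positives = sum(t == 1 for t in y_true)
recall = true_positives / actual_positives

print(f"Accuracy: {accuracy:.2f}")                # 0.95
print(f"Recall (positive class): {recall:.2f}")   # 0.00
```

The model looks strong by accuracy yet misses every diseased patient, which is exactly the kind of failure a confusion matrix and per-class metrics expose.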

 

## Why Use Them?

- **Confusion Matrix** reveals *where* your model is making mistakes, i.e., which classes get confused with which.
- **Precision** tells you what fraction of the samples predicted as positive were actually positive.
- **Recall** tells you what fraction of the actual positives the model correctly identified.
- **F1-score** is the harmonic mean of precision and recall, combining them into a single number.
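
The three metrics above can be sketched from raw counts for a single class. The helper name `precision_recall_f1` and the counts are hypothetical, and returning `0.0` when a metric is undefined (zero denominator) is a convention chosen for this sketch.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 for one class from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Returns 0.0 for any metric whose denominator is zero (assumed convention).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.80 recall=0.67 f1=0.73
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, so a model cannot score well by maximizing only one of them.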

 

## Example

```python title="Confusion Matrix and Report Example"
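# JupyterLite only: install scikit-learn into the in-browser kernel.
# In a regular Python environment, skip these two lines and
# install the package with `pip install scikit-learn` instead.
import piplite
await piplite.install('scikit-learn')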

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Train a KNN model
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)

# Classification Report
print("\nClassification Report:\n", classification_report(y_test, y_pred))
```
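
To read the matrix that `confusion_matrix` returns: each row is a true class and each column is a predicted class, so the diagonal holds the correct predictions. The sketch below derives per-class precision and recall by hand from a matrix. The numbers are illustrative only and are not the actual output of the example above.

```python
# Hypothetical 3-class confusion matrix (NOT the real output of the example).
# Rows = true class, columns = predicted class.
cm = [
    [15, 0, 0],   # true setosa
    [0, 14, 1],   # true versicolor: 1 sample predicted as virginica
    [0, 2, 13],   # true virginica: 2 samples predicted as versicolor
]
class_names = ["setosa", "versicolor", "virginica"]

n = len(cm)
per_class = {}
for i, name in enumerate(class_names):
    tp = cm[i][i]                                # diagonal: correct predictions
    row_total = sum(cm[i])                       # all samples truly in class i
    col_total = sum(cm[r][i] for r in range(n))  # all samples predicted as class i
    recall = tp / row_total if row_total else 0.0
    precision = tp / col_total if col_total else 0.0
    per_class[name] = (precision, recall)
    print(f"{name:<12} precision={precision:.2f} recall={recall:.2f}")

# Overall accuracy: diagonal sum divided by the total number of samples.
accuracy = sum(cm[i][i] for i in range(n)) / sum(sum(row) for row in cm)
print(f"accuracy={accuracy:.2f}")
```

The off-diagonal cells show which pairs of classes are being confused; here all of the errors fall between versicolor and virginica.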

 

## Key Takeaways

* Use **confusion matrices** to see which classes your model confuses with each other.
* **Precision** and **recall** describe performance per class, which overall accuracy cannot show.
* The **F1-score** is especially useful for imbalanced datasets, where accuracy can be misleading.

 

## What’s Next?

In the next lesson, we’ll introduce **K-Means clustering** as our first unsupervised learning algorithm.

A confusion matrix provides a detailed breakdown of correct and incorrect predictions across different classes, allowing you to see where errors occur. This is especially useful for understanding model performance beyond overall accuracy, particularly in imbalanced datasets where accuracy can be misleading.

What is the primary advantage of using a confusion matrix in evaluating a classification model?