Confusion Matrix and Classification Report
When working with classification models, accuracy alone isn't always enough to judge performance, especially if your dataset is imbalanced (e.g., predicting rare diseases).
Two useful tools for deeper analysis are:

- Confusion Matrix: A table showing correct and incorrect predictions for each class.
- Classification Report: Provides precision, recall, F1-score, and support for each class.
What are they?

The following are the key metrics provided by the Confusion Matrix and Classification Report:

- Confusion Matrix reveals where your model is making mistakes.
- Precision tells you how many predicted positives were correct.
- Recall tells you how many actual positives were correctly identified.
- F1-score balances precision and recall into a single number.
Example: Confusion Matrix and Classification Report

The following example shows how to use the Confusion Matrix and Classification Report to evaluate a classification model.
```python
# Install scikit-learn in Jupyter Lite
import piplite
await piplite.install('scikit-learn')

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Train a KNN model
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)

# Classification Report
print("\nClassification Report:\n", classification_report(y_test, y_pred))
```
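Beyond the printed string, `classification_report` can also return the same numbers as a nested dictionary via its `output_dict` parameter, which is handy when you want to check or log metrics programmatically. A minimal sketch reusing the same iris/KNN setup (outside Jupyter Lite, the piplite install step is not needed):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
y_pred = model.predict(X_test)

# output_dict=True gives per-class entries keyed by label, plus
# "accuracy", "macro avg", and "weighted avg" summary entries
report = classification_report(y_test, y_pred, output_dict=True)
print(report["0"]["precision"])          # precision for class 0
print(report["macro avg"]["f1-score"])   # unweighted mean F1 across classes
```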
Key Takeaways

- Use confusion matrices to visualize misclassifications.
- Precision and recall help understand performance beyond accuracy.
- The F1-score is especially useful for imbalanced datasets.
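To see why accuracy alone can mislead on imbalanced data, consider a hypothetical extreme: a "model" that always predicts the majority class. Its accuracy looks excellent, but its F1-score exposes that it never finds a single positive:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced labels: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # always predict the majority class

print(accuracy_score(y_true, y_pred))            # 0.95 — looks great
print(f1_score(y_true, y_pred, zero_division=0)) # 0.0 — no positives found
```

`zero_division=0` just silences the warning scikit-learn emits when precision is undefined (no positive predictions at all).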
What is the primary advantage of using a confusion matrix in evaluating a classification model?

- It provides the accuracy of the model.
- It predicts the future performance of the model.
- It reveals where the model is making mistakes for each class.
- It generates new datasets for training.