Lecture

Confusion Matrix and Classification Report


When working with classification models, accuracy alone is often not enough to judge performance, especially if your dataset is imbalanced (e.g., predicting rare diseases): a model can reach high accuracy simply by always predicting the majority class.
Two useful tools for deeper analysis are:

  1. Confusion Matrix – A table showing correct and incorrect predictions for each class.
  2. Classification Report – Provides precision, recall, F1-score, and support for each class.
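
To make the imbalance problem concrete, here is a minimal sketch in plain Python. The labels are made up for illustration (5 sick patients out of 100), and the "model" is a hypothetical one that always predicts the majority class.

```python
# Hypothetical labels: 1 = has the disease, 0 = healthy (made-up data).
y_true = [1] * 5 + [0] * 95

# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)

# Accuracy: share of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for the disease class: share of actual positives that were found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_positives = sum(t == 1 for t in y_true)
recall = true_positives / actual_positives

print(f"Accuracy: {accuracy:.2f}")               # Accuracy: 0.95
print(f"Recall (disease class): {recall:.2f}")   # Recall (disease class): 0.00
```

The accuracy looks strong even though the model never detects a single sick patient, which is exactly the kind of failure the two tools above are meant to expose.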

Why Use Them?

  • Confusion Matrix reveals where your model is making mistakes.
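  • Precision tells you what fraction of predicted positives were actually positive: TP / (TP + FP), where TP is the number of true positives and FP the number of false positives.
  • Recall tells you what fraction of actual positives were correctly identified: TP / (TP + FN), where FN is the number of false negatives.
  • F1-score combines precision and recall into a single number, their harmonic mean: 2 × precision × recall / (precision + recall).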
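
The three metrics above can be computed by hand from the counts of true positives (TP), false positives (FP), and false negatives (FN) for one class. The sketch below uses a hypothetical helper, `precision_recall_f1`, and made-up counts. Returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) for one class from raw counts."""
    # Precision: of everything predicted positive, how much was correct?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up counts: 8 correct positives, 2 false alarms, 4 missed positives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"Precision: {precision:.2f}")  # Precision: 0.80
print(f"Recall:    {recall:.2f}")     # Recall:    0.67
print(f"F1-score:  {f1:.2f}")         # F1-score:  0.73
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the F1-score here sits closer to the recall of 0.67 than a simple average would.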

Example

Confusion Matrix and Report Example
# Install scikit-learn in JupyterLite
import piplite
await piplite.install('scikit-learn')

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Train a KNN model
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)

# Classification Report
print("\nClassification Report:\n", classification_report(y_test, y_pred))
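
To see how the numbers in a classification report relate to a confusion matrix, the sketch below derives per-class precision and recall from a matrix by hand. The counts are made up for illustration and are not the output of the example above. The layout follows scikit-learn's convention for `confusion_matrix`: rows are actual classes and columns are predicted classes.

```python
# Hypothetical 3-class confusion matrix (made-up counts).
# Rows = actual class, columns = predicted class.
cm = [
    [15, 0, 0],
    [0, 14, 1],
    [0, 2, 13],
]

n_classes = len(cm)
per_class = []  # (precision, recall, support) for each class

for k in range(n_classes):
    tp = cm[k][k]                            # diagonal: correct predictions
    support = sum(cm[k])                     # row sum: actual members of class k
    predicted_k = sum(row[k] for row in cm)  # column sum: predicted as class k
    precision = tp / predicted_k
    recall = tp / support
    per_class.append((precision, recall, support))
    print(f"class {k}: precision={precision:.3f} recall={recall:.3f} support={support}")

# Overall accuracy: correct predictions (the diagonal) over all predictions.
accuracy = sum(cm[k][k] for k in range(n_classes)) / sum(sum(row) for row in cm)
print(f"accuracy={accuracy:.3f}")
```

Off-diagonal entries show which classes get confused with each other. In this made-up matrix, classes 1 and 2 are occasionally mistaken for one another, while class 0 is always predicted correctly.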

Key Takeaways

  • Use confusion matrices to visualize misclassifications.
  • Precision and recall help understand performance beyond accuracy.
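  • The F1-score is especially useful for imbalanced datasets because, unlike accuracy, it is not inflated by a large number of correctly predicted majority-class examples.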

What’s Next?

In the next lesson, we’ll introduce K-Means clustering as our first unsupervised learning algorithm.

Quiz

What is the primary advantage of using a confusion matrix in evaluating a classification model?

It provides the accuracy of the model.

It predicts the future performance of the model.

It reveals where the model is making mistakes for each class.

It generates new datasets for training.
