Lecture

Evaluating Regression Models

Regression models are machine learning models used to predict continuous numerical values. The most commonly used metrics for evaluating their performance are:

  • Mean Squared Error (MSE): The average of the squared differences between predicted and actual values (the closer to 0, the better)
  • Coefficient of Determination (R²): Measures how well the model explains the variance in the target variable (the closer to 1.0, the better)

Formula for Mean Squared Error

Mean Squared Error (MSE) is calculated as the average of the squared differences between predicted and actual values:

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
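As a quick check of this definition, the sketch below computes MSE by hand with NumPy and compares it with scikit-learn's `mean_squared_error`. The four-point arrays `y_true` and `y_pred` are made-up illustration values, not data from the lecture.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted values, chosen only for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])

# Errors are 1, 0, -1, -2; their squares are 1, 0, 1, 4; the mean is 6 / 4
mse_manual = np.mean((y_true - y_pred) ** 2)

# The library function should give the same number
mse_sklearn = mean_squared_error(y_true, y_pred)

print(f"Manual MSE:  {mse_manual:.3f}")
print(f"sklearn MSE: {mse_sklearn:.3f}")
```

Because the errors are squared, the single error of 2 contributes 4 of the 6 total, which shows how MSE weights large errors more heavily than small ones.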

Formula for Coefficient of Determination (R²)

The Coefficient of Determination (R²) represents how well the model explains the variance of the target variable:

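R^2 = 1 - \frac{MSE}{Var(y)}

where Var(y) is the variance of the actual values, computed with a 1/n divisor over the same samples used for the MSE.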
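The relationship between R², MSE, and the variance of the target can be verified numerically. The sketch below assumes the variance is the population variance (NumPy's default, `ddof=0`) taken over the same samples as the MSE, and uses made-up values for `y_true` and `y_pred`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted values, chosen only for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])

mse = mean_squared_error(y_true, y_pred)  # 1.5 for these values
var_y = np.var(y_true)                    # population variance (ddof=0): 5.0

# R² from the formula, and from scikit-learn for comparison
r2_manual = 1 - mse / var_y
r2_sklearn = r2_score(y_true, y_pred)

print(f"Manual R²:  {r2_manual:.3f}")
print(f"sklearn R²: {r2_sklearn:.3f}")
```

Using the sample variance (`ddof=1`) instead would give a slightly different number, so the two values only agree when the variance uses the same 1/n divisor as the MSE.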

Regression Example: R² Score

The following example demonstrates how to evaluate a regression model using the R² score:

R² Score Example
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Generate synthetic regression data
rng = np.random.RandomState(0)
X_reg = 2 * rng.rand(50, 1)
y_reg = 4 + 3 * X_reg.ravel() + rng.randn(50)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)

# Train the model
reg = LinearRegression()
reg.fit(X_train, y_train)

# Make predictions
y_pred = reg.predict(X_test)

# Evaluate the model
r2 = r2_score(y_test, y_pred)
print(f"R² score: {r2:.3f}")

Possible values of R² are:

  • 1.0: Perfect prediction
  • 0: No improvement over predicting the mean
  • Negative: Worse than predicting the mean
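The three cases above can be reproduced with a small sketch. The target values and the three sets of predictions are made up for illustration: one copies the targets, one always predicts the mean, and one predicts the targets in reverse order.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical target values, chosen only for illustration
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Case 1: predictions match the targets exactly
perfect_pred = y_true.copy()

# Case 2: always predict the mean of the targets
mean_pred = np.full_like(y_true, y_true.mean())

# Case 3: predictions run in the opposite direction to the targets
reversed_pred = y_true[::-1]

r2_perfect = r2_score(y_true, perfect_pred)
r2_mean = r2_score(y_true, mean_pred)
r2_reversed = r2_score(y_true, reversed_pred)

print(f"Perfect prediction: {r2_perfect:.1f}")
print(f"Mean prediction:    {r2_mean:.1f}")
print(f"Reversed prediction: {r2_reversed:.1f}")
```

The reversed predictions have a squared error four times the spread of the targets around their mean, so R² comes out well below zero. R² has no lower bound, which is why a negative score signals a model that does worse than the constant mean predictor.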

Key Takeaways

  • Use classification metrics for categorical outputs and regression metrics for continuous outputs.
  • Regression models are typically evaluated using Mean Squared Error (MSE) and Coefficient of Determination (R²).
Quiz

What is the primary advantage of using a confusion matrix in evaluating a classification model?

It provides the accuracy of the model.

It predicts the future performance of the model.

It reveals where the model is making mistakes for each class.

It generates new datasets for training.
