# What is `Cross-Validation`?

`Cross-validation` is a model evaluation technique that tests how well a model generalizes to unseen data.
Instead of using a single train-test split, cross-validation divides the dataset into multiple **folds**, training and testing the model several times on different subsets.

In `k-fold cross-validation`:

1. The data is divided into *k* folds.
2. For each fold:
   - Train the model on *k-1* folds.
   - Test it on the remaining fold.
3. Average the results to get a more reliable performance estimate.
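The steps above can be sketched by hand with scikit-learn's `KFold` splitter. This is a minimal illustration, not the only way to do it: the iris dataset, the logistic regression model, and the choices `n_splits=5`, `shuffle=True`, and `random_state=0` are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)

# Step 1: divide the data into k folds.
# Shuffling matters here because the iris rows are ordered by class.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
test_sizes = []
for train_idx, test_idx in kf.split(X):
    # Step 2a: train a fresh model on the k-1 training folds
    model = LogisticRegression(max_iter=200)
    model.fit(X[train_idx], y[train_idx])

    # Step 2b: test it on the remaining fold (accuracy for classifiers)
    fold_scores.append(model.score(X[test_idx], y[test_idx]))
    test_sizes.append(len(test_idx))

# Step 3: average the per-fold results
mean_score = float(np.mean(fold_scores))
print(f"Per-fold accuracy: {np.round(fold_scores, 3)}")
print(f"Mean accuracy: {mean_score:.3f}")
```

Each of the 150 samples lands in exactly one test fold, so every observation is used for testing once and for training *k-1* times.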

<br/>

## Common Cross-Validation Types

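- `K-Fold Cross-Validation`: The most common type; splits the data into *k* folds of (approximately) equal size.
- `Stratified K-Fold`: Maintains class proportions in each fold (important for classification, especially with imbalanced classes).
- `Leave-One-Out (LOO)`: Each observation serves as the test set exactly once, which is equivalent to k-fold with *k* equal to the number of samples.
- `ShuffleSplit`: Repeated random train/test splits. Unlike k-fold, the test sets of different iterations may overlap.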
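To see how these strategies differ in practice, the sketch below runs each scikit-learn splitter on a small toy dataset and reports how many splits it produces and the class counts in its first test fold. The toy data (12 samples, two balanced classes ordered by class) and the splitter parameters are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, ShuffleSplit, StratifiedKFold

# Toy data: 12 samples, two balanced classes, rows ordered by class
X = np.arange(24).reshape(12, 2)
y = np.array([0] * 6 + [1] * 6)

splitters = {
    "KFold": KFold(n_splits=3),
    "StratifiedKFold": StratifiedKFold(n_splits=3),
    "LeaveOneOut": LeaveOneOut(),
    "ShuffleSplit": ShuffleSplit(n_splits=4, test_size=0.25, random_state=0),
}

n_splits = {}
first_test_counts = {}
for name, splitter in splitters.items():
    n_splits[name] = splitter.get_n_splits(X, y)

    # Class counts (class 0, class 1) in the first test fold
    _, test_idx = next(iter(splitter.split(X, y)))
    first_test_counts[name] = np.bincount(y[test_idx], minlength=2).tolist()

    print(f"{name}: {n_splits[name]} splits, "
          f"first test fold class counts: {first_test_counts[name]}")
```

Without shuffling, plain `KFold` puts only class 0 in its first test fold because the rows are ordered by class, while `StratifiedKFold` keeps both classes represented in every fold.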

<br/>

## Example: Comparing Models with Cross-Validation

In this example, both models are evaluated using `5-fold cross-validation`, and the one with the higher average accuracy is considered better.

```python title="Cross-Validation Example"
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Define models
log_reg = LogisticRegression(max_iter=200)
knn = KNeighborsClassifier(n_neighbors=5)

# Cross-validation
log_scores = cross_val_score(log_reg, X, y, cv=5)
knn_scores = cross_val_score(knn, X, y, cv=5)

print(f"Logistic Regression mean score: {log_scores.mean():.3f}")
print(f"KNN mean score: {knn_scores.mean():.3f}")
```

> When `cv` is an integer and the estimator is a classifier, `cross_val_score` uses stratified folds, and the default score for classifiers is accuracy.

<br/>

## Key Takeaways

* Model selection aims to pick the model that best balances accuracy and efficiency for the task.
* Cross-validation gives a *more robust estimate* of real-world performance.
* Always use the *same cross-validation strategy* when comparing models to ensure fairness.
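One way to apply the last point is to create a single splitter object and pass it to every model, so that all models are scored on identical folds. The sketch below assumes the iris dataset and the two classifiers from the earlier example; the choices `shuffle=True` and `random_state=42` are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# One shared splitter: the fixed random_state makes every call to
# cv.split() produce the same folds, so both models see identical splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=200),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = scores
    # Report the spread as well as the mean: a small gap between means
    # may not be meaningful if the per-fold scores vary widely.
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Looking at the standard deviation alongside the mean helps judge whether one model is actually ahead or whether the difference is within fold-to-fold variation.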

Cross-validation is primarily used to estimate how a model will perform on unseen data. By splitting the dataset into multiple subsets and evaluating the model on each held-out subset in turn, it gives a more reliable estimate of the model's effectiveness than a single train-test split. It does not prevent overfitting by itself, but it helps reveal when a model fits the training data well yet fails to generalize.

### What is the primary purpose of using cross-validation in model selection?
