Predicting with Multiple Decision Trees Using Random Forest
Random Forest is a machine learning algorithm that combines multiple decision trees to make more accurate and reliable predictions. A single decision tree can overfit the training data, but Random Forest improves generalization by combining the outputs of many trees.
For instance, a weather prediction model can train multiple decision trees, each learning different patterns, and combine their predictions into the final forecast:
- Decision Tree 1: probability of rain tomorrow 60%
- Decision Tree 2: probability of rain tomorrow 70%
- Decision Tree 3: probability of rain tomorrow 65%
- Final prediction (Random Forest): the average, a 65% probability of rain
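The averaging step itself is simple arithmetic. Below is a minimal sketch in Python; the three probabilities are the hypothetical values from the example above.

```python
# Hypothetical rain probabilities predicted by three individual trees
tree_predictions = [0.60, 0.70, 0.65]

# The forest's final prediction is the mean of the trees' predictions
final_prediction = sum(tree_predictions) / len(tree_predictions)
print(f"Probability of rain tomorrow: {final_prediction:.0%}")  # 65%
```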
In this way, Random Forest is a type of Ensemble Learning technique, which improves prediction performance by combining several models. Ensemble Learning refers to machine learning techniques in which multiple models are combined to achieve better performance than any single model alone.
Learning Method of Random Forest
Random Forest is trained through the following process.
1. Data Sampling
Create multiple training datasets (subsets) by randomly sampling, with replacement, from the original data (a technique known as bootstrap sampling).
This ensures that each decision tree learns different data, enhancing the diversity of the model.
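As a minimal sketch of this step, the NumPy snippet below draws bootstrap samples from an illustrative dataset; the array shapes and the seed are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Illustrative original dataset: 10 samples, 3 features each
X = rng.normal(size=(10, 3))
y = rng.integers(0, 2, size=10)

n_trees = 3
for i in range(n_trees):
    # Sample indices with replacement so each tree sees a different subset
    idx = rng.integers(0, len(X), size=len(X))
    X_subset, y_subset = X[idx], y[idx]
    print(f"Tree {i + 1} trains on sample indices: {idx}")
```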
2. Training Multiple Decision Trees
Each sampled dataset is used to independently train its own decision tree. During this process, each tree learns from only part of the original data and chooses its split criteria from a randomly selected subset of features.
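The sketch below combines the two ideas, bootstrap sampling and random feature subsets, using scikit-learn's DecisionTreeClassifier on the Iris dataset; in practice, RandomForestClassifier performs both steps internally, so this is only an illustration of the mechanism.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(seed=0)

trees = []
for i in range(5):
    # Bootstrap sample: draw row indices with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt" makes each split consider a random feature subset
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)
```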
3. Combining Predictions
When new data is input, all trees individually make predictions, and the final prediction is determined as follows:
- For classification problems: determine the final class by Majority Voting.
- For regression problems: average the predictions of the individual trees to obtain the final value.
This results in more stable and generalized predictions compared to a single decision tree.
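Here is a minimal sketch of both combination rules, using hypothetical per-tree predictions for a single new sample:

```python
import numpy as np

# Classification: hypothetical class votes from five trees
tree_votes = np.array([1, 0, 1, 1, 0])
values, counts = np.unique(tree_votes, return_counts=True)
final_class = values[np.argmax(counts)]
print(final_class)  # 1 (majority vote)

# Regression: hypothetical numeric predictions from the same five trees
tree_outputs = np.array([3.2, 2.9, 3.5, 3.1, 3.0])
print(tree_outputs.mean())  # 3.14 (average)
```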
Advantages and Limitations of Random Forest
Random Forest is a robust machine learning algorithm that prevents overfitting while providing powerful predictive performance.
It handles not only numerical data but also categorical data effectively, and it reduces the impact of outliers. Additionally, Random Forest automatically calculates the importance of each variable during training, making it easy to understand which features drive the predictions.
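For example, scikit-learn's RandomForestClassifier exposes these scores through its feature_importances_ attribute; the Iris dataset here is only an illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(data.data, data.target)

# Each score reflects how much a feature contributed to the trees' splits
for name, importance in zip(data.feature_names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```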
However, training and predicting with many decision trees increases the computational cost, and the resulting model is harder to interpret intuitively than a single tree. In real-time prediction scenarios, evaluating many trees at once can also be a disadvantage in terms of speed.
So far, we have explored the basic algorithms in machine learning, including linear regression, logistic regression, decision trees, and random forest.
In the next lesson, we'll try a simple quiz based on what we've learned so far.