Dataset Structure: Features and Labels
In machine learning, a dataset is typically organized into:
- Features (X) – The input variables used by the model to make predictions. Examples: age, height, number of purchases.
- Labels (y) – The target variable the model is trying to predict. Examples: whether an email is spam, the price of a house.
A model learns the relationship between features and labels in supervised learning.
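As a minimal sketch (the columns and values below are invented for illustration), a small set of features and labels can be represented as NumPy arrays:

```python
import numpy as np

# Hypothetical features: each row is one customer, each column is one
# feature (age, height in cm, number of purchases).
X = np.array([
    [25, 170, 3],
    [40, 165, 12],
    [31, 180, 7],
])

# Hypothetical labels: the target to predict for each row
# (1 = made a repeat purchase, 0 = did not).
y = np.array([0, 1, 1])

print("X shape:", X.shape)  # (3, 3) -> 3 samples, 3 features
print("y shape:", y.shape)  # (3,)   -> one label per sample
```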
Loading a Dataset in Scikit-learn
Scikit-learn provides built-in datasets for practice. One of the most famous is the Iris dataset.
Loading the Iris Dataset
```python
from sklearn.datasets import load_iris

iris = load_iris()

# Features (X) - shape: (samples, features)
X = iris.data
print("Feature shape:", X.shape)
print("First row of features:", X[0])

# Labels (y) - shape: (samples,)
y = iris.target
print("Label shape:", y.shape)
print("First label:", y[0])
```
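Running this prints a feature shape of (150, 4) and a label shape of (150,): the Iris dataset contains 150 flower samples, each described by four measurements, with one class label per sample.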
Inspecting Feature and Label Names
print("Feature names:", iris.feature_names) print("Target names:", iris.target_names)
Why This Matters
- Features are the information your model uses to make predictions.
- Labels define the correct answers during training.
- Organizing data correctly into X and y is essential for Scikit-learn functions like train_test_split() and .fit() (a brief sketch follows this list).
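Here is a brief sketch of how X and y flow into those functions, using the Iris data loaded above (splitting is covered in detail in the next lesson, and KNeighborsClassifier is just one example estimator):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Both functions take the features (X) and labels (y) as separate arguments.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = KNeighborsClassifier()
model.fit(X_train, y_train)         # learn the feature-label relationship
print(model.score(X_test, y_test))  # accuracy on unseen samples
```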
Key Takeaways
- X → input features, a 2D array of shape (n_samples, n_features) (a common shape pitfall is noted after this list).
- y → target labels, a 1D array of shape (n_samples,).
- Proper separation of features and labels is the first step in preparing data for training.
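One related pitfall, noted here as an extra aside: selecting a single feature column from X gives a 1D array, which Scikit-learn estimators will not accept as features; reshape it back to 2D first:

```python
from sklearn.datasets import load_iris

X = load_iris().data

one_feature = X[:, 0]                        # shape (150,)   -- 1D, not valid as X
one_feature_2d = one_feature.reshape(-1, 1)  # shape (150, 1) -- valid 2D X

print(one_feature.shape, one_feature_2d.shape)
```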
What’s Next?
In the next lesson, we’ll learn how to split data into training and testing sets to evaluate model performance.
Quiz
Understanding Dataset Structure
In a dataset used for machine learning, the input variables are referred to as ____.
- Features
- Labels
- Targets
- Outputs