
Dataset Structure: Features and Labels

In machine learning, a dataset is typically organized into:

  • Features (X): The input variables used by the model to make predictions. For example, age, height, or number of purchases.
  • Labels (y): The target variable that the model is trying to predict. For example, whether an email is spam or the price of a house.

A model learns the relationship between features and labels in supervised learning.
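As a quick illustration (a minimal sketch; the column meanings and values here are made up for demonstration), a tiny dataset can be written as a feature matrix X and a label vector y:

A Tiny Example of X and y
import numpy as np

# Hypothetical features: each row is one customer, columns are
# (age in years, height in cm, number of purchases)
X = np.array([
    [25, 170, 3],
    [42, 165, 7],
    [31, 180, 1],
])

# Hypothetical labels: 1 = repeat buyer, 0 = not (illustrative only)
y = np.array([1, 1, 0])

print("X shape:", X.shape)  # (3, 3) -> 3 samples, 3 features
print("y shape:", y.shape)  # (3,)   -> one label per sample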


Loading a Dataset in Scikit-learn

Scikit-learn provides built-in datasets for practice. One of the most famous is the Iris dataset.

Loading the Iris Dataset
from sklearn.datasets import load_iris

iris = load_iris()

# Features (X) - shape: (samples, features)
X = iris.data
print("Feature shape:", X.shape)
print("First row of features:", X[0])

# Labels (y) - shape: (samples,)
y = iris.target
print("Label shape:", y.shape)
print("First label:", y[0])
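Running this prints a feature shape of (150, 4), meaning the Iris dataset contains 150 samples described by 4 features each, and a label shape of (150,), one label per sample.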

Inspecting Feature and Label Names

You can inspect the feature and label names of the Iris dataset using the following code:

Feature and Label Names
print("Feature names:", iris.feature_names) print("Target names:", iris.target_names)

The following are some key points about features and labels:

  • Features are the information your model uses to make predictions.

  • Labels define the correct answers during training.

  • X: input features, 2D array shape (n_samples, n_features).

  • y: target labels, 1D array shape (n_samples,).

  • Organizing data correctly into X and y is essential for Scikit-learn functions like train_test_split() and .fit() (see the sketch after this list).

  • Proper separation of features and labels is the first step in preparing data for training.
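As a brief sketch of how a properly separated X and y plug into these functions (using the Iris data loaded above; LogisticRegression is chosen here purely as an illustrative model):

Using X and y with train_test_split() and .fit()
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# Split the features and labels into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a model on the training features and labels
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

print("Test accuracy:", model.score(X_test, y_test))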

Quiz

Understanding Dataset Structure

In a dataset used for machine learning, the input variables are referred to as ____.
Features
Labels
Targets
Outputs
