Categorical Data Encoding

Most AI and machine learning models can only operate on numerical inputs.

However, much of the data we work with is text-based.

This kind of data, which can be grouped into certain categories without numerical meaning, is called categorical data.

Example of Categorical Data
| ID | Color  | Region      | Occupation |
|----|--------|-------------|------------|
| 1  | Red    | New York    | Student    |
| 2  | Blue   | Chicago     | Employee   |
| 3  | Green  | Los Angeles | Student    |
| 4  | Yellow | New York    | Doctor     |

In the data above, color, region, and occupation are categorical data.

These values cannot be used directly in arithmetic, and comparing their magnitude or order is not meaningful.

Categorical data can be divided into two main types.


Nominal Data

This is categorical data without any order. Colors (red, blue, green) and regions (New York, Chicago, Los Angeles) are examples of nominal data.


Ordinal Data

This is categorical data with an order. Education level (elementary, middle, high school) and customer satisfaction (low, medium, high) are examples of ordinal data.

Categorical data needs to be converted into numerical form for machine learning, a process known as encoding.
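The difference between the two types matters when choosing numbers. As a minimal sketch, the customer satisfaction example can be mapped to integers that preserve its order. The function name `encode_ordinal` and the sample responses are hypothetical, chosen only for illustration.

```python
# Ordinal categories listed from lowest to highest.
# The order is supplied explicitly; it is not inferred from the data.
SATISFACTION_ORDER = ["low", "medium", "high"]


def encode_ordinal(values, order):
    """Map each value to its rank in `order` (0 = lowest)."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]


# Hypothetical survey responses, made up for this example.
responses = ["medium", "low", "high", "medium"]
encoded = encode_ordinal(responses, SATISFACTION_ORDER)
print(encoded)  # [1, 0, 2, 1]
```

Because the order is passed in explicitly, a larger number always means higher satisfaction. For nominal data such as color there is no such order, so the integers assigned would be arbitrary labels.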


What is Data Encoding?

Categorical data must be transformed into numbers so that machine learning models can comprehend it. This transformation process is known as data encoding.

For example, let's convert the color data above into numbers.

Color Data Encoding
| ID | Color  | Color (Encoded) |
|----|--------|-----------------|
| 1  | Red    | 0               |
| 2  | Blue   | 1               |
| 3  | Green  | 2               |
| 4  | Yellow | 3               |

With this conversion, the model can process the color data as numbers.
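The table above can be reproduced with a few lines of plain Python. This is only a sketch, assuming codes are assigned in order of first appearance; `label_encode` is a hypothetical helper, not a library function.

```python
def label_encode(values):
    """Assign each distinct value an integer code in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            # The next unused integer becomes this category's code.
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


colors = ["Red", "Blue", "Green", "Yellow"]
encoded_colors, color_codes = label_encode(colors)
print(encoded_colors)  # [0, 1, 2, 3]
print(color_codes)     # {'Red': 0, 'Blue': 1, 'Green': 2, 'Yellow': 3}
```

Library encoders may assign codes differently. For example, scikit-learn's `LabelEncoder` sorts the categories first, so the same colors would receive different integers there.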

Common methods for this transformation include Label Encoding and One-Hot Encoding.

We will discuss each method in more detail in the following lessons.
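As a brief preview, the sketch below shows One-Hot Encoding applied to the Region column from the first table. Each category gets its own 0/1 column instead of a single integer. The helper `one_hot_encode` is hypothetical, and sorting the categories alphabetically is an assumption made for this example.

```python
def one_hot_encode(values):
    """Return the sorted categories and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == c else 0 for c in categories] for value in values]
    return categories, rows


regions = ["New York", "Chicago", "Los Angeles", "New York"]
categories, one_hot_rows = one_hot_encode(regions)
print(categories)  # ['Chicago', 'Los Angeles', 'New York']
for region, row in zip(regions, one_hot_rows):
    print(region, row)
```

Each row contains exactly one 1, marking the column that matches the row's category.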

Mission

What is the process of converting categorical data into numbers called?

Standardization

Normalization

Encoding

Clustering
