Lecture

How Generative AI Works in 4 Steps: AI as a Function

At its core, AI operates as a function with multiple inputs and a wide range of possible outputs.

However, unlike a simple mathematical function such as f(x) = x + 2, AI is not a straightforward equation.

AI is an incredibly complex function that requires vast amounts of data and computational power to develop.
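
To make the idea of "AI as a function" concrete, here is a rough sketch in Python (the names and numbers are invented purely for illustration): a simple function has a fixed, hand-written rule, while an AI-style function produces its output from parameters whose values are learned from data.

# A simple mathematical function: the rule is fixed and written by hand.
def f(x):
    return x + 2

# A toy "AI-style" function: the output depends on parameters (w, b)
# whose values are not written by hand but learned from data.
def predict(x, w, b):
    return w * x + b

print(f(3))                   # 5
print(predict(3, 2.0, 1.0))   # 7.0 -- depends on the learned parameter values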

For example, GPT-3, released in 2020, was trained on approximately 570GB of text data.

To put that into perspective, a 300-page book is roughly 1MB in size. This means that 570GB is equivalent to around 580,000 books' worth of text.

If it takes about 6 hours to read one book, it would take roughly 397 years to read through the training data of GPT-3.
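
For readers who want to check the arithmetic, here is a quick back-of-the-envelope calculation, assuming roughly 1MB per book and 6 hours of reading per book as above:

# Rough check of the figures above (1 book ≈ 1MB, 6 hours of reading per book).
training_data_mb = 570 * 1024      # 570GB expressed in MB
books = training_data_mb           # ≈ 583,680 books at ~1MB each
hours = books * 6                  # ≈ 3.5 million hours of reading
years = hours / 24 / 365           # ≈ 400 years (≈ 397 if you round to 580,000 books first)

print(f"{books:,} books, about {years:,.0f} years of reading")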


What Does Training AI Mean?

Training an AI means optimizing its parameters: the numerical values inside the model that are adjusted so its predictions become more accurate.

Through training, AI learns patterns from its input data and uses these learned parameters to generate predictions for new inputs.

AI makes numerous predictions and selects the most suitable response based on probability.
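
As a minimal sketch of what "optimizing parameters" means (a deliberately tiny example, nothing like the scale of a real model), the snippet below adjusts two parameters until the model's predictions match some example data:

# Tiny example of training: adjust the parameters w and b so that the
# prediction w*x + b matches the example data (which follows y = 2x + 1).
data = [(1, 3), (2, 5), (3, 7), (4, 9)]
w, b = 0.0, 0.0              # parameters start at arbitrary values
learning_rate = 0.01

for step in range(2000):
    for x, y in data:
        error = (w * x + b) - y
        # Nudge each parameter in the direction that reduces the error.
        w -= learning_rate * error * x
        b -= learning_rate * error

print(round(w, 2), round(b, 2))   # ≈ 2.0 and 1.0 after training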

A system that processes natural language (human speech and text) and generates meaningful responses is called Generative AI.

Generative AI can create various forms of content, including text, images, and audio, based on what it has learned.


How Does Generative AI Work?

The process of generative AI can be broken down into four key stages:


1. Data Training

AI is first trained using a dataset, which is prepared through a process called data preprocessing.

Key Terms:

  • Data Preprocessing: Formatting data so that it can be effectively used for AI training.
  • Dataset: A collection of data used to train or evaluate machine learning models.
  • AI Model: A trained program that learns from data to recognize patterns and make predictions.

For example:

  • Text-based AI is trained using books, articles, and web pages.
  • Image-based AI is trained using thousands of photographs and illustrations.

Once training is complete, the AI system is referred to as a model, which can generate new content based on user inputs.
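
A simplified sketch of what data preprocessing might look like for a text dataset is shown below. The example texts and steps are purely illustrative; real pipelines involve far more cleaning and filtering.

# Simplified data preprocessing: clean raw text, then split it into a
# training set and an evaluation set.
raw_texts = [
    "  The weather is NICE today!  ",
    "Generative AI creates text, images, and audio.",
    "Training data must be cleaned before use.",
    "",                               # empty entries are dropped
]

# 1. Clean: strip whitespace, lowercase, remove empty entries.
cleaned = [t.strip().lower() for t in raw_texts if t.strip()]

# 2. Split: most of the data is used for training, the rest for evaluation.
split = int(len(cleaned) * 0.8)
train_set, eval_set = cleaned[:split], cleaned[split:]

print(train_set)   # 2 texts for training
print(eval_set)    # 1 text for evaluation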


2. Pattern Recognition

AI recognizes patterns and extracts relevant features from input data.

  • Text Input: The AI tokenizes the text, breaking it into smaller units such as words and punctuation marks. This helps it understand sentence structure and vocabulary patterns (see the code sketch after these examples).

Example of Text Tokenization
Input: "The weather is nice today."
Tokenized Output: ['The', 'weather', 'is', 'nice', 'today', '.']

  • Image Input: The image's shapes, colors, and key elements are analyzed to extract features, which are then converted into vectors. Vectors represent data such as words, sentences, or image features in numerical form.

Example of Image Analysis
Input: An apple image
Feature Extraction: Color (red), Shape (round), Object (apple)
Vectorization: [0.9, 0.1, 0.0, ...]

3. Context Understanding

Once AI has identified patterns, it analyzes context to determine how different elements relate to each other. For text, this means understanding the relationships between words in a sentence. For images, it means recognizing how various visual elements interact.
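
As a very rough analogy (real models learn these relationships through attention mechanisms, not simple counting), the snippet below looks at which words tend to appear next to each other in a text:

from collections import Counter

# Rough analogy for context: count which word pairs appear next to each other.
words = "the weather is nice today and the view is nice".split()
pairs = Counter(zip(words, words[1:]))

print(pairs.most_common(2))
# [(('is', 'nice'), 2), ...] -- in this text, "nice" tends to follow "is"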


4. Content Generation

The trained AI model generates new data. For text, it repeatedly predicts the most probable next word and builds up the sentence word by word. For images, it creates a new image that matches the given description.
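
A toy sketch of probabilistic next-word prediction is shown below. The probability table is completely made up for illustration; a real model computes these probabilities from billions of learned parameters.

import random

# Toy next-word prediction: for each word, a (made-up) table of how likely
# each following word is. The model picks the next word probabilistically.
next_word_probs = {
    "the":     {"weather": 0.6, "view": 0.4},
    "weather": {"is": 1.0},
    "view":    {"is": 1.0},
    "is":      {"nice": 0.7, "cloudy": 0.3},
}

word, sentence = "the", ["the"]
while word in next_word_probs:
    options = next_word_probs[word]
    word = random.choices(list(options), weights=list(options.values()))[0]
    sentence.append(word)

print(" ".join(sentence))   # e.g. "the weather is nice"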


More detailed information on how generative AI processes text input can be found in the course How Generative AI Understands Prompts.

Mission

Which of the following statements about data preprocessing is correct?

  • A collection of data gathered for a specific purpose
  • A computer program that analyzes given data, learns patterns and rules, and makes predictions or decisions based on them
  • The process of preparing data to be suitable for AI training
  • The process of evaluating the performance of trained data
