Lecture

Augmenting Data with AI

According to the official OpenAI documentation, a JSONL training file must contain at least 10 examples (one JSON object per line), and fine-tuning on just 50 to 100 high-quality examples can already yield clear improvements.
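As a quick illustration of the 10-object minimum, here is a minimal sketch that counts the JSON objects in JSONL text before you upload it. The function names `count_valid_jsonl_lines` and `has_enough_examples` are hypothetical helpers written for this lesson, not part of any OpenAI or Codefriends tool.

```python
import json

MIN_EXAMPLES = 10  # minimum number of JSON objects mentioned in the lesson


def count_valid_jsonl_lines(text: str) -> int:
    """Count non-empty lines that parse as a JSON object; raise on a bad line."""
    count = 0
    for line_number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)  # raises json.JSONDecodeError on malformed JSON
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number} is not a JSON object")
        count += 1
    return count


def has_enough_examples(text: str, minimum: int = MIN_EXAMPLES) -> bool:
    """Return True when the JSONL text holds at least `minimum` JSON objects."""
    return count_valid_jsonl_lines(text) >= minimum
```

Running this check first is cheaper than discovering a malformed line after the upload has been rejected.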

However, creating a dataset by hand is time-consuming and costly, so it is usually more efficient to expand a small dataset with data augmentation and then refine the augmented data.

Data augmentation is a technique for generating new data from existing data. It increases the size of the dataset, which helps prevent overfitting and improves the generalization performance of the AI model.

In the past, data augmentation required writing complex code or using a dedicated program. These days, it can be done simply by instructing a text-generating AI to create new data based on existing data.
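The idea of instructing a text-generating AI can be sketched as follows. This is an illustration under stated assumptions, not how Codefriends implements it: `generate` is a hypothetical callable that sends a prompt to whichever model you use and returns its text reply, and the prompt wording is only an example.

```python
import json
from typing import Callable


def build_augmentation_prompt(examples: list[dict], n_new: int) -> str:
    """Turn existing examples into an instruction asking for n_new similar ones."""
    lines = "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)
    return (
        f"Here are {len(examples)} training examples in JSONL format:\n"
        f"{lines}\n\n"
        f"Write {n_new} new examples in the same format and style, "
        "one JSON object per line, with no extra commentary."
    )


def augment(examples: list[dict], n_new: int,
            generate: Callable[[str], str]) -> list[dict]:
    """Ask the model for new examples and keep only usable, non-duplicate lines.

    `generate` is a hypothetical stand-in for a call to a text-generating AI.
    """
    reply = generate(build_augmentation_prompt(examples, n_new))
    new_examples: list[dict] = []
    for line in reply.split("\n"):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop lines the model did not format as JSON
        is_new = record not in examples and record not in new_examples
        if isinstance(record, dict) and is_new:
            new_examples.append(record)
    return new_examples[:n_new]
```

Filtering out malformed and duplicate lines matters because generated data still needs refining before it is used for fine-tuning.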

Codefriends offers a feature that allows you to perform complex data augmentation with just one click.


Augmenting Data with One Click

You can add 10 lines of augmented JSON data in the Codefriends fine-tuning practice environment by following the three steps below.


1. Select Data



2. Create a New File



3. Automatically Add 10 Lines



When augmenting data, generative AI is used to create new training data based on the JSON data that has been generated so far.
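Outside the practice environment, the same flow (keep the existing data, create a new file, add the generated lines) could look roughly like the sketch below. The function name `write_augmented_file` and the file layout are assumptions made for illustration only and do not describe what the one-click feature does internally.

```python
import json
from pathlib import Path


def write_augmented_file(original: list[dict], new_examples: list[dict],
                         destination: Path) -> int:
    """Write the original examples followed by the new ones to a new JSONL file.

    The source data is left untouched; returns the number of lines written.
    """
    combined = list(original) + list(new_examples)
    with destination.open("w", encoding="utf-8") as f:
        for record in combined:
            # One JSON object per line is what makes the file JSONL.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(combined)
```

Writing to a separate file keeps the hand-written examples intact, so a poor batch of generated data can simply be discarded.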

Mission

What is the most appropriate word to fill in the blank?

Data augmentation is a technique that generates new data based on existing data, increasing the dataset size to prevent AI model ____, thereby improving the model's generalization performance.
overfitting
underfitting
bias
variance
