Data Formats Used for Training AI
To train AI models, data must first be converted into a format that the training software can read.
In this lesson, we'll explore the key data formats used to train AI, including CSV, JSON, and XML.
CSV
CSV, which stands for Comma-Separated Values, is used to store and transmit data in a table format.
Each row represents an individual data entry, while each column represents a specific attribute of that data. Within a row, the values are separated by commas.
For example, a CSV file that stores students' math and English grades by name could be represented as follows:
Name,Math,English
John Doe,85,90
Jane Smith,88,80
CSV files are saved with the .csv file extension and can be easily opened and edited with various data management programs like Microsoft Excel, Google Sheets, or database software.
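As a minimal sketch of how such a file might be read before training, the snippet below parses the student grades example with Python's standard csv module. The data is held in a string rather than a file so the example is self-contained, and the variable names (csv_text, grades) are illustrative assumptions, not part of the lesson.

```python
import csv
import io

# Sample data copied from the lesson's CSV example.
csv_text = "Name,Math,English\nJohn Doe,85,90\nJane Smith,88,80\n"

# DictReader uses the header row as the keys for each data row.
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

# CSV values are always read as strings, so convert the grades to integers.
grades = [
    {"Name": row["Name"], "Math": int(row["Math"]), "English": int(row["English"])}
    for row in rows
]

for entry in grades:
    print(entry)
```

Note that CSV itself carries no type information, which is why the grades need an explicit conversion to numbers.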
JSON
JSON (JavaScript Object Notation) is primarily used for storing and exchanging data in web and mobile applications.
JSON is composed of objects and arrays; objects are enclosed in curly braces { }, and arrays are enclosed in square brackets [ ].
For example, the student grades above can be written as an array (enclosed in square brackets) of objects (each enclosed in curly braces):
[
  {
    "Name": "John Doe",
    "Math": 85,
    "English": 90
  },
  {
    "Name": "Jane Smith",
    "Math": 88,
    "English": 80
  }
]
For more details, see the next lesson.
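The following sketch, which assumes Python and its standard json module, shows how the student data could be loaded from JSON text. The array becomes a Python list and each object becomes a dict; the names json_text, students, and math_average are illustrative only.

```python
import json

# The lesson's student data as JSON text (standard JSON does not allow comments).
json_text = """
[
  {"Name": "John Doe", "Math": 85, "English": 90},
  {"Name": "Jane Smith", "Math": 88, "English": 80}
]
"""

# json.loads turns a JSON array into a list and each JSON object into a dict.
students = json.loads(json_text)

# Unlike CSV, JSON keeps numbers as numbers, so no conversion is needed.
math_average = sum(student["Math"] for student in students) / len(students)

print(type(students).__name__, type(students[0]).__name__)  # list dict
print(math_average)  # 86.5
```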
XML
XML (eXtensible Markup Language) is mainly used to represent the hierarchical structure of data.
The key elements of XML are as follows:
- Tags: Names enclosed within < > that mark up data and express its hierarchical structure.
  - Tags are divided into start tags and end tags.
  - A start tag is denoted by <tagname>, and an end tag by </tagname>.
- Attributes: Used to provide additional information within a tag.
  - To add an attribute to a tag, use the format <tagname attributename="attributevalue">.
  - Example: <Student gender="male"> adds a gender attribute to a Student tag.
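To make the distinction between tags and attributes concrete, here is a small sketch using Python's standard xml.etree.ElementTree module. The element text ("John Doe") and the closing tag are assumptions added so that the lesson's Student fragment forms a complete element.

```python
import xml.etree.ElementTree as ET

# A single element: start tag with one attribute, text content, end tag.
# The text "John Doe" is a hypothetical value added for illustration.
element = ET.fromstring('<Student gender="male">John Doe</Student>')

print(element.tag)            # Student
print(element.attrib)         # {'gender': 'male'}
print(element.get("gender"))  # male
print(element.text)           # John Doe
```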
Below is the JSON example represented in XML.
<StudentList>
  <Student>
    <Name>John Doe</Name>
    <Math>85</Math>
    <English>90</English>
  </Student>
  <Student>
    <Name>Jane Smith</Name>
    <Math>88</Math>
    <English>80</English>
  </Student>
</StudentList>
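As a rough sketch of how this XML could be turned into records for training, the snippet below walks the hierarchy with Python's standard xml.etree.ElementTree module. The names xml_text and records are illustrative assumptions; a real pipeline would typically read the XML from a file instead of a string.

```python
import xml.etree.ElementTree as ET

# The lesson's XML example, held in a string so the sketch is self-contained.
xml_text = """
<StudentList>
  <Student>
    <Name>John Doe</Name>
    <Math>85</Math>
    <English>90</English>
  </Student>
  <Student>
    <Name>Jane Smith</Name>
    <Math>88</Math>
    <English>80</English>
  </Student>
</StudentList>
"""

root = ET.fromstring(xml_text.strip())

# Each Student element is a child of the root; its own children hold the values.
records = []
for student in root.findall("Student"):
    records.append({
        "Name": student.findtext("Name"),
        # Element text is always a string, so grades are converted to integers.
        "Math": int(student.findtext("Math")),
        "English": int(student.findtext("English")),
    })

print(records)
```

The resulting list of dictionaries has the same shape as the data loaded from the CSV and JSON versions, which shows that the three formats carry the same information.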
In addition, when training image-related AI models, images are used as training data, and text files (.txt) are often used for training natural language processing models.