Data Formats Used for Training AI
To train AI models, data must be converted into formats that AI systems can process.
In this lesson, we'll explore the main data file formats used for training AI: CSV, JSON, and XML.
CSV
CSV stands for Comma-Separated Values and is used to store and transfer table-like data.
Each row represents a data entry, and each column corresponds to a specific attribute.
Within each row, the values are separated by commas (,).
For example, a CSV file storing math and English scores of students by name can be represented as follows:
Name,Math,English
John Doe,85,90
Jane Smith,88,80
CSV files are stored as text files with the .csv file extension and can be easily opened and edited in various data management programs like Microsoft Excel, Google Sheets, and database programs.
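As a minimal sketch of how such a file might be read in Python, the snippet below parses the same student rows with the standard library's csv module. The data is held in a string (an assumption made here so the example runs without a file on disk), and the variable names are illustrative only.

```python
import csv
import io

# The student rows from the lesson, held in a string so the sketch
# runs without a file on disk (an assumption for this example).
csv_text = "Name,Math,English\nJohn Doe,85,90\nJane Smith,88,80\n"

# DictReader treats the first row as the column names.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# CSV values are always read as text, so convert the scores to integers.
scores = [
    {"Name": row["Name"], "Math": int(row["Math"]), "English": int(row["English"])}
    for row in rows
]

print(scores)
```

Because every CSV value arrives as a string, numeric columns usually need an explicit conversion like the one above before they can be used for training.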
JSON
JSON (JavaScript Object Notation) is commonly used for data storage and exchange in web and mobile applications.
JSON consists of objects and arrays, with objects wrapped in curly braces { } and arrays in square brackets [ ].
// Array in square brackets
[
  // Object in curly braces
  {
    "Name": "John Doe",
    "Math": 85,
    "English": 90
  },
  {
    "Name": "Jane Smith",
    "Math": 88,
    "English": 80
  }
]
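The following sketch shows one way the array of student objects could be loaded in Python with the standard json module. Standard JSON does not allow comments, so the // annotations in the example are left out of the text being parsed. The variable names and the average calculation are illustrative assumptions, not part of the lesson.

```python
import json

# The student array from the lesson, without the explanatory comments
# (standard JSON parsers reject // comments).
json_text = """
[
  {"Name": "John Doe", "Math": 85, "English": 90},
  {"Name": "Jane Smith", "Math": 88, "English": 80}
]
"""

# A JSON array becomes a Python list; each JSON object becomes a dict.
students = json.loads(json_text)

# Numbers keep their type, so they can be used directly in calculations.
math_average = sum(student["Math"] for student in students) / len(students)

print(students[0]["Name"], math_average)
```

Unlike CSV, JSON preserves the difference between numbers and strings, so no manual type conversion is needed here.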
A data file format where multiple JSON objects are listed, one per line, is called JSONL (JSON Lines).
{"Name": "John Doe", "Math": 85, "English": 90}
{"Name": "Jane Smith", "Math": 88, "English": 80}
JSONL files are often used as training data, for example when fine-tuning OpenAI's models or training general-purpose machine learning models.
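To illustrate the one-object-per-line idea, here is a small sketch that reads JSONL text line by line and writes it back out. It assumes the data is already in a string rather than a file, and the variable names are hypothetical.

```python
import json

# Two JSON objects, one per line, as in the JSONL example.
jsonl_text = (
    '{"Name": "John Doe", "Math": 85, "English": 90}\n'
    '{"Name": "Jane Smith", "Math": 88, "English": 80}\n'
)

# Each non-empty line is parsed on its own as a complete JSON object.
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

# Writing JSONL is the reverse: serialize each record and end it with a newline.
jsonl_out = "".join(json.dumps(record) + "\n" for record in records)

print(len(records), "records")
```

Because every line stands alone, a large JSONL file can be processed one record at a time instead of being loaded into memory all at once.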
XML
XML (eXtensible Markup Language) is primarily used to represent hierarchical data structures.
The key elements of XML include:
- Tags: Names enclosed in angle brackets < > that mark up the data's structure.
  - Each element consists of an opening and a closing tag.
  - An opening tag is <tagname>, and a closing tag is </tagname>.
- Attributes: Used to provide additional information within a tag.
  - To add attributes to a tag, use <tagname attributename="attributevalue">.
  - Example: <Student gender="Male"> adds a gender attribute to the Student tag.
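The tag and attribute rules can be sketched in Python with the standard xml.etree.ElementTree module. The one-element document below combines the Student tag and gender attribute from the example with a Name child. The combination is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# A single element with one attribute and one child element
# (this small document is a hypothetical example).
element = ET.fromstring('<Student gender="Male"><Name>John Doe</Name></Student>')

tag_name = element.tag            # the name between the angle brackets
gender = element.get("gender")    # the value of the gender attribute
name = element.find("Name").text  # the text between <Name> and </Name>

print(tag_name, gender, name)
```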
Below is how the JSON example is represented in XML.
<StudentList>
  <Student>
    <Name>John Doe</Name>
    <Math>85</Math>
    <English>90</English>
  </Student>
  <Student>
    <Name>Jane Smith</Name>
    <Math>88</Math>
    <English>80</English>
  </Student>
</StudentList>
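As a rough sketch, the student list can be turned into Python dictionaries with the standard xml.etree.ElementTree module. The XML is held in a string here, and the variable names are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# The StudentList document from the lesson, held in a string.
xml_text = """
<StudentList>
  <Student>
    <Name>John Doe</Name>
    <Math>85</Math>
    <English>90</English>
  </Student>
  <Student>
    <Name>Jane Smith</Name>
    <Math>88</Math>
    <English>80</English>
  </Student>
</StudentList>
""".strip()

root = ET.fromstring(xml_text)

# Walk the hierarchy: every <Student> child of the root becomes one dict.
# Element text is always a string, so the scores are converted to integers.
students = [
    {
        "Name": student.findtext("Name"),
        "Math": int(student.findtext("Math")),
        "English": int(student.findtext("English")),
    }
    for student in root.findall("Student")
]

print(students)
```

As with CSV, XML element text carries no type information, which is why the scores are converted explicitly.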
When training image-related AI models, image file formats like .jpg and .png are used.
Image files are made up of pixel values, and AI models interpret these pixel values to recognize and classify images.
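To make the idea of pixel values concrete, the sketch below uses a tiny hand-written grayscale image instead of a real file. Decoding an actual .jpg or .png needs an image library, which is not shown here. The pixel numbers are made up for illustration, and scaling to the 0 to 1 range is one common preprocessing choice rather than a requirement.

```python
# A hypothetical 2x3 grayscale image: each number is one pixel's
# brightness, from 0 (black) to 255 (white).
pixels = [
    [0, 128, 255],
    [64, 192, 32],
]

# Scale every pixel to the range 0.0 to 1.0, a common step before training.
normalized = [[value / 255 for value in row] for row in pixels]

# Flatten the rows into a single list of numbers, one simple way to
# present an image to a model as input values.
flat = [value for row in normalized for value in row]

print(len(flat), "input values")
```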
Data formats for training AI models vary, and the appropriate format should be selected based on the model’s design and purpose.
Which of the following is not a commonly used data file format for training AI models?
CSV
JSON
HTML
XML