Python Library for Data Processing, Pandas
When data is organized along axes, such as sales by item or customer inflow over time, it is typically represented in a tabular format consisting of rows and columns.
Pandas is one of the most widely used libraries in Python for handling tabular data.
With Pandas, you can systematically perform tasks ranging from basic loading and saving of data to filtering, sorting, and statistical analysis.
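The tasks listed above can be sketched in a few lines. This is a minimal illustration, not a complete workflow: the sales figures are made up, and the CSV is written to an in-memory buffer (`io.StringIO`) instead of a file so that the example runs anywhere.

```python
import io

import pandas as pd

# Hypothetical sales data, made up for illustration
sales = pd.DataFrame({
    "Item": ["Apple", "Banana", "Strawberry", "Grapes"],
    "Sales": [1000, 2000, 1500, 3000],
})

# Saving and loading: write CSV text to an in-memory buffer, then read it back
buffer = io.StringIO()
sales.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

# Filtering: keep only the rows whose Sales value is at least 1500
high_sales = loaded[loaded["Sales"] >= 1500]

# Sorting: order the rows by Sales, largest first
sorted_sales = loaded.sort_values("Sales", ascending=False)

# Statistical analysis: mean of the Sales column
average_sales = loaded["Sales"].mean()

print(high_sales)
print(sorted_sales)
print(average_sales)
```

With a real file, the same calls would typically take a file path in place of the buffer.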
The Two Key Data Structures in Pandas
The core data structures in Pandas are Series and DataFrame.
1. Series
A Series is a one-dimensional data structure similar to a column in an Excel spreadsheet.
Data is sequentially ordered, similar to a Python list (array).
Each data point has an index label that identifies it (by default, its integer position starting from 0), and you can access the data using this index.
    import pandas as pd

    # Creating a series
    data_series = pd.Series([10, 20, 30, 40])
    print(data_series)

    # Output
    # 0    10
    # 1    20
    # 2    30
    # 3    40
    # dtype: int64
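Accessing values through the index can be sketched as follows. The custom labels "a" to "d" are hypothetical and only illustrate that an index does not have to be numeric.

```python
import pandas as pd

data_series = pd.Series([10, 20, 30, 40])

# With the default index, the labels are the integers 0, 1, 2, 3
first = data_series.loc[0]    # access by label
third = data_series.iloc[2]   # access by position

# A Series with custom (hypothetical) index labels
labeled = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])
b_value = labeled.loc["b"]    # access by label "b"

print(first, third, b_value)
```

Here `.loc` looks a value up by its index label, while `.iloc` looks it up by its position, which keeps the two kinds of access unambiguous.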
2. DataFrame
A DataFrame is a two-dimensional data structure consisting of multiple Series.
It has both rows and columns, and each column can have different data types.
Its structure is similar to that of an Excel sheet (spreadsheet).
    import pandas as pd

    # Creating a DataFrame of sales by item
    data_frame = pd.DataFrame({
        'Item': ['Apple', 'Banana', 'Strawberry', 'Grapes'],
        'Sales': [1000, 2000, 1500, 3000]
    })
    print(data_frame)

    # Output
    #          Item  Sales
    # 0       Apple   1000
    # 1      Banana   2000
    # 2  Strawberry   1500
    # 3      Grapes   3000
In the code example above, a DataFrame is created with the columns Item and Sales.
For instance, the entry 'Item': ['Apple', 'Banana', 'Strawberry', 'Grapes'] becomes a Series, similar to a column in an Excel spreadsheet, and these Series are combined to form the DataFrame.
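The relationship between a DataFrame and its Series can be checked directly. The sketch below reuses the sales data from the example, then rebuilds the same table from two explicit Series. The variable names `items`, `sales`, and `rebuilt` are chosen here only for illustration.

```python
import pandas as pd

data_frame = pd.DataFrame({
    "Item": ["Apple", "Banana", "Strawberry", "Grapes"],
    "Sales": [1000, 2000, 1500, 3000],
})

# Selecting a single column returns a Series
item_column = data_frame["Item"]
print(type(item_column))

# Each column keeps its own data type (text for Item, integers for Sales)
print(data_frame.dtypes)

# Building the same table from two explicit Series
items = pd.Series(["Apple", "Banana", "Strawberry", "Grapes"])
sales = pd.Series([1000, 2000, 1500, 3000])
rebuilt = pd.DataFrame({"Item": items, "Sales": sales})

# Compare the rebuilt table with the original one
print(rebuilt.equals(data_frame))
```

Passing a dictionary of Series gives one column per key, with rows aligned on the shared index.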
Which Pandas data structure combines multiple Series into a two-dimensional table?
List
Tuple
DataFrame
Dictionary