GroupBy and Aggregation Functions
One of the most powerful features in pandas is the ability to group data and perform calculations on each group.
This is useful when analyzing patterns across categories like sales per region, average scores per class, or revenue by product.
The groupby() method splits your data into groups based on the values in one or more columns.
Once grouped, you can apply aggregation functions such as:
- sum(): total value per group
- mean(): average value per group
- count(): number of non-missing values per group
- max(): highest value per group
- min(): lowest value per group
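As a rough illustration of the functions listed above, the sketch below applies each one to the same grouped column. The DataFrame, its "Class" and "Score" columns, and the values are made up for this example.

```python
import pandas as pd

# Hypothetical sample data: two classes with a few scores each.
scores = pd.DataFrame({
    "Class": ["X", "X", "X", "Y", "Y"],
    "Score": [70, 80, 90, 60, 100],
})

# Group once, then reuse the grouped column for each aggregation.
grouped = scores.groupby("Class")["Score"]

total = grouped.sum()      # X: 240, Y: 160
average = grouped.mean()   # X: 80.0, Y: 80.0
rows = grouped.count()     # X: 3, Y: 2 (non-missing values)
highest = grouped.max()    # X: 90, Y: 100
lowest = grouped.min()     # X: 70, Y: 60
```

Each result is a Series indexed by the group key, so a single group's value can be read with, for example, total["X"].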
GroupBy example
Imagine you have a dataset of sales transactions from different cities. You might want to:
- Calculate total sales for each city
- Find the average transaction amount per store
- Count how many transactions happened in each region
Pandas makes this easy. For example, to calculate total sales per city, you can write:
    import pandas as pd

    df = pd.DataFrame({
        "City": ["New York", "New York", "Los Angeles", "Los Angeles", "Chicago", "Chicago"],
        "Sales": [100000, 150000, 200000, 250000, 300000, 350000]
    })

    # Sum the Sales column within each city
    df.groupby("City")["Sales"].sum()

    # Output (groupby sorts the group keys by default):
    # City
    # Chicago        650000
    # Los Angeles    450000
    # New York       250000
    # Name: Sales, dtype: int64
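The other two tasks from the list, the average transaction amount per store and the number of transactions per region, might look like the following sketch. The "Store", "Region", and "Amount" columns and their values are assumptions made up for illustration; they are not part of the dataset above.

```python
import pandas as pd

# Hypothetical transactions; column names and values are illustrative only.
transactions = pd.DataFrame({
    "Store": ["S1", "S1", "S2", "S2", "S2"],
    "Region": ["East", "East", "West", "West", "West"],
    "Amount": [100, 300, 50, 150, 250],
})

# Average transaction amount per store: S1 -> 200.0, S2 -> 150.0
avg_per_store = transactions.groupby("Store")["Amount"].mean()

# Number of transactions per region: East -> 2, West -> 3
count_per_region = transactions.groupby("Region")["Amount"].count()
```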
Syntax Overview
Here's a simple pattern:
df.groupby("ColumnName")["TargetColumn"].agg("aggregation_function")
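To make the pattern concrete, the sketch below checks that passing a function name to .agg() gives the same result as calling the matching method directly, and shows that groupby() also accepts a list of columns. The "Year" column and the values are assumptions added for this example.

```python
import pandas as pd

# Hypothetical data; "Year" is added only to show grouping by two columns.
df = pd.DataFrame({
    "City": ["New York", "New York", "Chicago"],
    "Year": [2023, 2024, 2024],
    "Sales": [100, 150, 300],
})

# The string form and the method form compute the same thing.
via_agg = df.groupby("City")["Sales"].agg("sum")
via_method = df.groupby("City")["Sales"].sum()
same = via_agg.equals(via_method)

# Passing a list of columns groups by each unique combination of their values.
by_city_year = df.groupby(["City", "Year"])["Sales"].sum()
chicago_2024 = by_city_year.loc[("Chicago", 2024)]
```

When grouping by several columns, the result is indexed by one level per grouping column, which is why the lookup uses a tuple.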
You can also pass a list of function names to .agg() to apply multiple functions at once.
For example, to calculate the sum, mean, and count of the amounts for each category, you can write:
    df = pd.DataFrame({
        "Category": ["A", "A", "B", "B", "C", "C"],
        "Amount": [100, 200, 300, 400, 500, 600]
    })

    # Apply three aggregations to Amount within each category
    df.groupby("Category")["Amount"].agg(["sum", "mean", "count"])

    # Output:
    #            sum   mean  count
    # Category
    # A          300  150.0      2
    # B          700  350.0      2
    # C         1100  550.0      2
In the output, sum is the total of the values in each group, mean is their average, and count is the number of non-missing values (here, the number of rows).
Each row is labeled with its group key from the Category column: A, B, or C.
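One way to read individual values from the aggregated result is sketched below. It rebuilds the same Category and Amount data, looks up a group by its label, and uses reset_index() to turn the group labels into a regular column with positions 0, 1, 2. The variable names are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "C", "C"],
    "Amount": [100, 200, 300, 400, 500, 600],
})

summary = df.groupby("Category")["Amount"].agg(["sum", "mean", "count"])

# The index of the result holds the group labels themselves.
labels = list(summary.index)        # ['A', 'B', 'C']

# Look up one value by group label and column name.
b_sum = summary.loc["B", "sum"]     # 700

# reset_index() moves the labels into a "Category" column
# and replaces the index with positions 0, 1, 2.
flat = summary.reset_index()
first_category = flat.loc[0, "Category"]   # 'A'
```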
Using the groupby() method in pandas, you can apply aggregation functions like sum, mean, and count to grouped data.