Lecture

GroupBy and Aggregation Functions

One of the most powerful features in pandas is the ability to group data and perform calculations on each group.

This is useful when analyzing patterns across categories, such as sales per region, average scores per class, or revenue by product.

The groupby() method splits your data into groups based on the values in one or more columns.
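As a minimal sketch of grouping by one column versus several, the example below uses a small made-up table. The Region, Product, and Revenue column names and all values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
sales = pd.DataFrame({
    "Region": ["East", "East", "East", "West", "West"],
    "Product": ["Pen", "Pen", "Ink", "Pen", "Ink"],
    "Revenue": [10, 20, 5, 30, 15],
})

# One grouping column: one group per distinct Region value.
by_region = sales.groupby("Region")["Revenue"].sum()

# Two grouping columns: one group per distinct (Region, Product) pair.
by_region_product = sales.groupby(["Region", "Product"])["Revenue"].sum()

print(by_region)
print(by_region_product)
```

Passing a list of column names produces a result indexed by each combination of values, so the second result has four rows here instead of two.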

Once grouped, you can apply aggregation functions such as:

  • sum(): total value per group
  • mean(): average value per group
  • count(): number of non-missing values per group
  • max(): highest value per group
  • min(): lowest value per group
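The functions listed above can be sketched on a single grouped column. The Class and Score columns and their values are hypothetical, chosen only so the results are easy to check by hand.

```python
import pandas as pd

# Hypothetical data: class names and scores are made up for illustration.
scores = pd.DataFrame({
    "Class": ["A", "A", "A", "B", "B"],
    "Score": [70, 80, 90, 60, 100],
})

# Group once, then reuse the grouped column for each aggregation.
grouped = scores.groupby("Class")["Score"]

totals = grouped.sum()      # A: 240, B: 160
averages = grouped.mean()   # A: 80.0, B: 80.0
counts = grouped.count()    # A: 3, B: 2
highest = grouped.max()     # A: 90, B: 100
lowest = grouped.min()      # A: 70, B: 60

print(totals, averages, counts, highest, lowest, sep="\n\n")
```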

GroupBy example

Imagine you have a dataset of sales transactions from different cities. You might want to:

  • Calculate total sales for each city
  • Find the average transaction amount per store
  • Count how many transactions happened in each region

Pandas makes this easy. For example, to calculate total sales per city, you can write:

Calculate Total Sales per City
import pandas as pd

df = pd.DataFrame({
    "City": ["New York", "New York", "Los Angeles", "Los Angeles", "Chicago", "Chicago"],
    "Sales": [100000, 150000, 200000, 250000, 300000, 350000]
})

# Total sales for each city (groups are sorted by key by default)
df.groupby("City")["Sales"].sum()

# Output:
# City
# Chicago        650000
# Los Angeles    450000
# New York       250000
# Name: Sales, dtype: int64
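The other two tasks from the list, the average transaction amount per store and the number of transactions per region, might look like the sketch below. The Store, Region, and Amount columns are assumed names; the lesson's dataset only defines City and Sales.

```python
import pandas as pd

# Hypothetical transactions: Store, Region, and Amount are illustrative names.
transactions = pd.DataFrame({
    "Store": ["S1", "S1", "S2", "S2", "S2"],
    "Region": ["North", "North", "South", "South", "South"],
    "Amount": [100, 300, 50, 150, 100],
})

# Average transaction amount per store.
avg_per_store = transactions.groupby("Store")["Amount"].mean()

# Number of transactions per region (counts non-missing Amount values).
count_per_region = transactions.groupby("Region")["Amount"].count()

print(avg_per_store)
print(count_per_region)
```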

Syntax Overview

Here's a simple pattern:

Basic GroupBy Syntax
df.groupby("ColumnName")["TargetColumn"].agg("aggregation_function")
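To make the pattern concrete, the sketch below fills in the placeholders with the City and Sales columns from the earlier example (shortened to three rows) and compares the .agg("sum") form with the direct .sum() call.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["New York", "New York", "Chicago"],
    "Sales": [100000, 150000, 300000],
})

# Passing the function name as a string to .agg() ...
via_agg = df.groupby("City")["Sales"].agg("sum")

# ... gives the same result as calling the method directly.
via_method = df.groupby("City")["Sales"].sum()

print(via_agg.equals(via_method))
```

Both forms return the same Series, so the direct method is a reasonable default for a single aggregation.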

You can also use .agg() to apply multiple functions at once.

For example, to calculate the sum, mean, and count of the amounts for each category, you can write:

Apply Multiple Aggregations
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "C", "C"],
    "Amount": [100, 200, 300, 400, 500, 600]
})

# Sum, mean, and count of Amount for each category
df.groupby("Category")["Amount"].agg(["sum", "mean", "count"])

# Output:
#            sum   mean  count
# Category
# A          300  150.0      2
# B          700  350.0      2
# C         1100  550.0      2

In the output, each requested function becomes a column: sum is the total of the values in the group, mean is their average, and count is the number of non-missing values (here, the number of rows).

Each row is labeled with its category: A, B, or C. These labels form the index of the result.
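One possible way to read values out of a result like this is sketched below. It rebuilds the same Category and Amount data; the variable names summary, total_b, and flat are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "C", "C"],
    "Amount": [100, 200, 300, 400, 500, 600],
})

summary = df.groupby("Category")["Amount"].agg(["sum", "mean", "count"])

# Look up one cell by row label and column name.
total_b = summary.loc["B", "sum"]

# Move the Category labels from the index into an ordinary column;
# the rows then get the default integer labels 0, 1, 2.
flat = summary.reset_index()

print(total_b)
print(flat)
```

After reset_index(), row 0 holds category A, row 1 holds B, and row 2 holds C.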

Quiz

Using the groupby() method in pandas, you can apply aggregation functions like sum, mean, and count to grouped data.

True
False
