Lecture

Calculate Summary Statistics with Pandas

How can we calculate the mean, standard deviation, etc., of large datasets all at once?

Writing a separate calculation for each statistic by hand quickly becomes cumbersome.

The describe() method of a DataFrame computes the common summary statistics in a single call: the number of entries, mean, standard deviation, minimum, quartiles, and maximum.

Calculate Summary Statistics
import pandas as pd

data_frame = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Strawberry', 'Grapes'],
    'Sales': [1000, 2000, 1500, 3000]
})

# Calculate summary statistics
summary_stats = data_frame.describe()

print(summary_stats)

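data_frame.describe() returns a new DataFrame whose rows are the summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum). By default only numeric columns are summarized, so the output has a Sales column but no Item column.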
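Because the result is itself a DataFrame, individual statistics can be read back out of it. The following is a minimal sketch that reuses the example data above; the variable name mean_sales is introduced only for illustration.

```python
import pandas as pd

data_frame = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Strawberry', 'Grapes'],
    'Sales': [1000, 2000, 1500, 3000]
})

summary_stats = data_frame.describe()

# Only the numeric column is summarized by default
print(list(summary_stats.columns))

# Rows are labeled by statistic name, so .loc can pick out a single value
mean_sales = summary_stats.loc['mean', 'Sales']
print(mean_sales)
```

Selecting by row label keeps the code readable and avoids depending on the order in which the statistics are listed.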

describe() Method Output
             Sales
count     4.000000
mean   1875.000000
std     853.912564
min    1000.000000
25%    1375.000000
50%    1750.000000
75%    2250.000000
max    3000.000000

The meaning of each row is as follows:

  • count: Number of non-missing entries

  • mean: Mean value

  • std: Sample standard deviation

  • min: Minimum value

  • 25%, 50%, 75%: 25th, 50th (median), and 75th percentiles

  • max: Maximum value
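To see how these rows relate to the individual pandas methods, the sketch below computes each statistic separately for the same Sales values and prints it next to the describe() result. The dictionary name individual_stats is an illustrative choice, not part of pandas.

```python
import pandas as pd

sales = pd.Series([1000, 2000, 1500, 3000], name='Sales')

# Compute each statistic with its own method
individual_stats = {
    'count': sales.count(),
    'mean': sales.mean(),
    'std': sales.std(),            # sample standard deviation (ddof=1)
    'min': sales.min(),
    '25%': sales.quantile(0.25),
    '50%': sales.quantile(0.50),   # same as sales.median()
    '75%': sales.quantile(0.75),
    'max': sales.max(),
}

# Compare against the values describe() reports
described = sales.describe()
for name, value in individual_stats.items():
    print(f"{name}: {value} (describe: {described[name]})")
```

Each pair of values should agree, which is why describe() is a convenient shortcut when all of these statistics are needed at once.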


Handling Missing Values

Missing values are entries in a dataset where no data was recorded. Pandas represents them as NaN or None.

Pandas provides various methods to handle missing values.
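One reason missing values deserve attention is that summary statistics skip them. As a small sketch using made-up sales data with one missing entry, describe() reports a count of 3 rather than 4 and averages only the values that are present.

```python
import pandas as pd

data_frame = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Strawberry', None],
    'Sales': [1000, 2000, 1500, None]
})

# Sales has one missing entry, so it is left out of count and mean
summary_stats = data_frame.describe()
print(summary_stats.loc[['count', 'mean'], 'Sales'])
```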

Handling Missing Values Example
import pandas as pd

data_frame = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Strawberry', None],
    'Sales': [1000, 2000, 1500, None]
})

# Check for missing values
missing_values = data_frame.isnull()

# Replace missing values with 0
data_frame_filled = data_frame.fillna(0)

print(data_frame_filled)

Missing Values Replacement Result
         Item   Sales
0       Apple  1000.0
1      Banana  2000.0
2  Strawberry  1500.0
3           0     0.0

Code Explanation

  • data_frame.isnull() returns a boolean DataFrame of the same shape, with True at every position where a value is missing.

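  • data_frame.fillna(0) returns a new DataFrame in which every missing value is replaced with 0; the original data_frame is left unchanged. In this example the missing Item is also filled with 0.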

  • Instead of data_frame.fillna(0), you can use data_frame.dropna() to remove rows containing missing values.
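The three methods above can be combined in a few common ways. The sketch below counts missing values per column, fills each column with a value suited to its type, and drops incomplete rows. The placeholder 'Unknown' is an assumption made for this example, not a pandas default.

```python
import pandas as pd

data_frame = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Strawberry', None],
    'Sales': [1000, 2000, 1500, None]
})

# Count the missing values in each column
missing_counts = data_frame.isnull().sum()
print(missing_counts)

# Fill each column with its own replacement value
# ('Unknown' is a placeholder chosen for this example)
filled = data_frame.fillna({'Item': 'Unknown', 'Sales': 0})
print(filled)

# Drop every row that contains at least one missing value
dropped = data_frame.dropna()
print(dropped)
```

Passing a dictionary to fillna() avoids putting the number 0 into a text column, while dropna() is the simpler choice when incomplete rows can be discarded.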

Mission

What is the most appropriate word to fill in the blank?

To compute summary statistics of a DataFrame, use the ______ method.

  • describe

  • summary

  • mean

  • aggregate
