# Handling Missing and Duplicate Data

Real-world datasets are rarely clean or complete.

You’ll often encounter missing values or duplicate records that can distort your analysis.

*Pandas* offers tools to detect, clean, and manage these issues.

<br/>

## Dealing with Missing Data

In Pandas, missing values are typically represented as `NaN` (Not a Number).

You can handle them in several ways:

* *Detect missing values* using `.isnull()` or `.notnull()`
* *Drop missing data* with `.dropna()`
* *Fill missing data* using `.fillna()` (e.g., with a default value) or `.ffill()` (forward-fill from the previous value)

Handle missing values before computing statistics such as the mean, sum, or correlation. Pandas skips `NaN` in these calculations by default, so results can silently reflect only part of your data.
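As a minimal sketch of detecting and dropping missing values, the example below uses a small made-up DataFrame; the column names and values are assumptions chosen for illustration, not data from this lesson.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "age": [28.0, np.nan, 35.0, 41.0],
    "city": ["Lima", "Oslo", None, "Pune"],
})

# isnull() marks each missing cell as True; summing counts them per column.
missing_per_column = df.isnull().sum()

# dropna() keeps only the rows that have no missing value in any column.
complete_rows = df.dropna()

# mean() skips NaN by default, so this averages the three known ages only.
mean_age = df["age"].mean()

print(missing_per_column)
print(complete_rows)
print(mean_age)
```

Note that `dropna()` returns a new DataFrame and leaves `df` unchanged, so the result has to be assigned to a variable to be kept.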

<br/>

## Handling Duplicate Entries

Duplicate rows can occur due to data entry errors or when merging datasets.

* Use `.duplicated()` to flag duplicates
* Use `.drop_duplicates()` to remove them

Always check whether duplicates make sense in the context of your data. Not all repetition is an error: two customers can legitimately place identical orders.
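The sketch below flags and removes duplicates in a small made-up orders table. The table, its column names, and the choice of `subset` columns are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical orders table; order 102 was entered twice by mistake.
orders = pd.DataFrame({
    "order_id": [101, 102, 102, 103, 104],
    "customer": ["Ana", "Ben", "Ben", "Cara", "Ana"],
    "amount": [20.0, 35.5, 35.5, 12.0, 20.0],
})

# duplicated() is True for each row that fully repeats an earlier row.
is_duplicate = orders.duplicated()

# drop_duplicates() removes those rows, keeping the first occurrence.
deduplicated = orders.drop_duplicates()

# Comparing only some columns flags more rows: order 104 matches order 101
# on customer and amount, yet it is a separate, legitimate order.
same_customer_amount = orders.duplicated(subset=["customer", "amount"])

print(is_duplicate.tolist())
print(deduplicated)
print(same_customer_amount.tolist())
```

The `subset` comparison shows why context matters: it marks order 104 as a repeat of order 101 even though the two have different order IDs.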

<br/>

## Summary

| Task                | Method                 | Description                           |
| ------------------- | ---------------------- | ------------------------------------- |
| Detect missing      | `df.isnull()`          | Shows True for missing values         |
| Drop missing rows   | `df.dropna()`          | Removes rows with any NaN             |
| Fill missing values | `df.fillna(value)`     | Replaces NaN with the specified value |
| Detect duplicates   | `df.duplicated()`      | Returns a Boolean Series              |
| Drop duplicates     | `df.drop_duplicates()` | Removes duplicate rows                |

### How can you fill missing data in a DataFrame using Pandas?

The `.fillna()` method replaces missing values (`NaN`) with a value you specify. This is useful when your data must be complete before calculations or visualizations, both of which can be affected by missing data.
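A few common ways to fill missing values with `.fillna()` can be sketched as follows. The sensor readings are made up for illustration, and the fill choices (zero, the column mean, the label `"unknown"`) are assumptions rather than recommendations; the right fill value depends on your data. The related `.ffill()` method is included for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings with gaps; values are illustrative only.
readings = pd.DataFrame({
    "sensor": ["a", "a", "b", "b"],
    "temp": [20.0, np.nan, 22.0, np.nan],
    "status": ["ok", None, "ok", "ok"],
})

# Replace every missing value in one column with a single default.
filled_constant = readings["temp"].fillna(0.0)

# Pass a dict to use a different fill value for each column.
filled_per_column = readings.fillna({
    "temp": readings["temp"].mean(),  # mean of the known values (21.0)
    "status": "unknown",
})

# ffill() copies the last known value forward into each gap.
filled_forward = readings["temp"].ffill()

print(filled_constant.tolist())
print(filled_per_column)
print(filled_forward.tolist())
```

Each call returns a new object, so `readings` still contains its original gaps afterwards.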