Exploring Web Crawling on BBC News
Web Crawling refers to the technique of automatically exploring websites and collecting the necessary data. It commonly involves using an automated program called a crawler to fetch the content of web pages (i.e., their HTML code) and then analyze that code to collect the required data.
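To make the idea concrete, below is a minimal sketch of a crawler, written with the requests and BeautifulSoup libraries introduced later in this lesson. The starting URL and the five-page limit are illustrative choices for the sketch, and it omits real-world concerns such as error handling and robots.txt.

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Illustrative starting point; any page with links would work
start_url = "https://www.bbc.com/news"
visited = set()
to_visit = [start_url]

# Visit a handful of pages, collecting new links from each one
while to_visit and len(visited) < 5:
    url = to_visit.pop()
    if url in visited:
        continue
    visited.add(url)
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # Queue every link found on this page, converted to an absolute URL
    for link in soup.find_all("a", href=True):
        next_url = urljoin(url, link["href"])
        if next_url.startswith("http"):  # skip mailto:, javascript:, etc.
            to_visit.append(next_url)

print("Pages explored:", len(visited))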
Difference Between Web Crawling and Web Scraping
Web Crawling and Web Scraping are often used interchangeably, but technically they mean different things. Extracting specific content from a particular web page, as the sample code in this lesson does, is an example of Web Scraping. Unlike web crawling, web scraping does not automatically explore multiple web pages; it targets a single web page or a specific piece of data. In summary, web crawling is the process of automatically exploring multiple web pages to collect data, whereas web scraping is the process of extracting specific content from individual web pages, as the short sketch below illustrates.
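For contrast with the crawler sketch above, here is a minimal scraping sketch: it fetches a single page and extracts one specific piece of data (the page's <title> element) without following any links. The URL is illustrative.

import requests
from bs4 import BeautifulSoup

# Fetch a single page; no other pages are explored
response = requests.get("https://www.bbc.com/news")
soup = BeautifulSoup(response.text, "html.parser")

# Target one specific piece of data: the page's <title> element
print(soup.title.text if soup.title else "No <title> found")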
However, this course will mainly use the term web crawling, as it covers both exploring multiple web pages and collecting data.
BBC News Web Crawling Practice
The code in the practice screen scrapes (technically speaking) the article headlines in real time from the BBC News website.
To fetch and analyze the HTML code of a web page using Python, the requests and BeautifulSoup libraries are commonly used (both are typically installed with pip install requests beautifulsoup4). The following courses will detail how these libraries are used and what code needs to be written to extract the desired information.
import requests
from bs4 import BeautifulSoup

# BBC News website URL
url = "https://www.bbc.com/news"
response = requests.get(url)

# Check if the request was successful
print("status_code:", response.status_code)

# Parse the HTML data
soup = BeautifulSoup(response.text, "html.parser")

# Extract up to 10 article headlines from the page using h2 tags
titles = soup.find_all('h2', limit=10)

# Print the extracted headlines
for title in titles:
    print(title.text.strip())
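Here find_all('h2', limit=10) returns at most the first ten <h2> elements found in the HTML. Note that this relies on BBC News marking its headlines up with <h2> tags; if the site's markup changes, the tag (or an added class filter) may need adjusting.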
Press the green ▶︎ Run button in the code editor and check out the article headlines crawled in real time from the BBC News website! 🙂