Lecture

Exploring Web Crawling on BBC News

Web Crawling is a technique used to automatically explore websites and collect relevant data.

It typically involves an automated program called a crawler, which fetches the content of web pages (HTML code) and analyzes it to extract the required information.


Difference Between Web Crawling and Web Scraping

Web Crawling and Web Scraping are often used interchangeably, but they have distinct meanings.

  • Web Scraping refers to extracting specific content from a particular web page using code.
  • Web Crawling, on the other hand, involves automatically exploring multiple web pages, typically by following links from one page to the next, to collect data.

Unlike a crawler, a scraper does not navigate from page to page; it works on a single page or a specific set of data points. In short, web crawling is about discovering and visiting pages, whereas web scraping is about pulling content out of a given page.

However, for simplicity, this course will primarily use the term web crawling as it covers both exploration and data extraction.
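
The distinction can be illustrated with a minimal sketch. It assumes a hypothetical in-memory "website" (the SITE dictionary) so that it runs without network access, and it uses Python's built-in html.parser instead of the libraries used in this course; the names PageParser, scrape, and crawl are made up for this example.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "website": path -> HTML. No network access is needed.
SITE = {
    "/": '<h2>Home</h2><a href="/world">World</a><a href="/tech">Tech</a>',
    "/world": '<h2>World News</h2><a href="/">Home</a>',
    "/tech": '<h2>Tech News</h2><a href="/world">World</a>',
}


class PageParser(HTMLParser):
    """Collects <h2> text and <a href> targets from one HTML document."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self.links = []
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.headlines.append("")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        # Only text that appears inside an <h2> is kept.
        if self._in_h2:
            self.headlines[-1] += data


def scrape(path):
    """Web scraping: extract data from a single page."""
    parser = PageParser()
    parser.feed(SITE[path])
    parser.close()
    return parser.headlines, parser.links


def crawl(start):
    """Web crawling: follow links breadth-first and scrape each page once."""
    seen, queue, results = {start}, deque([start]), {}
    while queue:
        path = queue.popleft()
        headlines, links = scrape(path)
        results[path] = headlines
        for link in links:
            if link in SITE and link not in seen:
                seen.add(link)
                queue.append(link)
    return results


print(scrape("/tech")[0])  # headlines from one page only
print(crawl("/"))          # headlines from every page reachable from "/"
```

Here scrape looks at exactly one page, while crawl keeps a queue of pages still to visit and a set of pages already seen, so each page is scraped once even when pages link back to each other.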


BBC News Web Crawling Practice

Strictly speaking, the code in the practice screen scrapes rather than crawls: it fetches a single page, the BBC News front page, and extracts the article headlines from it in real time.

To fetch and analyze a web page’s HTML using Python, the requests and BeautifulSoup libraries are commonly used.
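
As a rough sketch of the parsing half on its own, the example below runs BeautifulSoup on a hard-coded HTML string instead of a live response. The HTML is hypothetical and far simpler than the real BBC News markup; it only stands in for the text that requests would return.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for response.text; the real BBC markup differs.
html = """
<html><body>
  <h2>  First headline </h2>
  <p>Some body text.</p>
  <h2>Second headline</h2>
  <h2>Third headline</h2>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns matching tags in document order; limit caps how many.
tags = soup.find_all("h2", limit=2)

# get_text(strip=True) returns a tag's text without surrounding whitespace.
headlines = [tag.get_text(strip=True) for tag in tags]
print(headlines)
```

Because limit=2 is passed, only the first two h2 tags are returned, and the third headline is skipped.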

Future lessons will explain how these libraries work and how to write code that extracts the desired information.

BBC News Web Crawling Code
    # Import required libraries
    import requests
    from bs4 import BeautifulSoup

    # BBC News website URL
    url = "https://www.bbc.com/news"
    response = requests.get(url)

    # Check if the request was successful
    print("Status Code:", response.status_code)

    # Parse HTML data
    soup = BeautifulSoup(response.text, "html.parser")

    # Extract 10 article headlines from the page using h2 tags
    titles = soup.find_all('h2', limit=10)

    # Print the text of each headline
    for title in titles:
        print(title.text.strip())

Press the green ▶︎ Run button in the code editor and check out the article headlines crawled in real time from the BBC News website! 🙂

Mission

Run the code and check the results.
