Exploring Web Crawling on BBC News
Web Crawling is a technique used to automatically explore websites and collect relevant data.
It typically involves an automated program called a crawler, which fetches the content of web pages (HTML code) and analyzes it to extract the required information.
Difference Between Web Crawling and Web Scraping
Web Crawling and Web Scraping are often used interchangeably, but they have distinct meanings.
Web Scraping refers to extracting specific content from a particular web page using code.
Web Crawling, on the other hand, involves automatically exploring multiple web pages to collect data.
Unlike web crawling, web scraping does not navigate through multiple pages; instead, it focuses on extracting information from a single page or specific data points.
In summary, web crawling is the process of exploring multiple web pages to collect data, whereas web scraping focuses on extracting content from a specific web page.
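The distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-page site in `SITE` is hypothetical and held in memory so the example runs without network access, and the names `PageParser`, `scrape`, and `crawl` are invented for this sketch. It uses the standard library's `html.parser` rather than the libraries introduced later in this lesson.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical three-page site held in memory, so no network access is needed.
SITE = {
    "https://example.com/": '<h1>Home</h1><a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<h1>Page A</h1><a href="/b">B</a>',
    "https://example.com/b": '<h1>Page B</h1><a href="/">Home</a>',
}


class PageParser(HTMLParser):
    """Collects the <h1> text and every link target found on one page."""

    def __init__(self):
        super().__init__()
        self.heading = ""
        self.links = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_heading = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading:
            self.heading += data


def scrape(html):
    """Web scraping: extract data from the HTML of a single page."""
    parser = PageParser()
    parser.feed(html)
    return parser.heading, parser.links


def crawl(start_url, max_pages=10):
    """Web crawling: follow links from page to page, scraping each one."""
    seen = {start_url}
    queue = deque([start_url])
    headings = {}
    while queue and len(headings) < max_pages:
        url = queue.popleft()
        html = SITE.get(url)  # a real crawler would download the page here
        if html is None:
            continue
        heading, links = scrape(html)
        headings[url] = heading
        for href in links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return headings


# Scraping looks at one page; crawling visits every page reachable from the start.
print(scrape(SITE["https://example.com/a"]))
print(crawl("https://example.com/"))
```

Here `scrape` never leaves the page it is given, while `crawl` keeps a queue of discovered links and a set of visited URLs so that each page is processed only once.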
However, for simplicity, this course will primarily use the term web crawling, as it covers both exploration and data extraction.
BBC News Web Crawling Practice
The code in the practice screen scrapes (technically speaking) article headlines from the BBC News website in real time.
To fetch and analyze a web page's HTML using Python, the requests and BeautifulSoup libraries are commonly used.
Future lessons will explain how these libraries work and how to write code that extracts the desired information.
# Import required libraries
import requests
from bs4 import BeautifulSoup

# BBC News website URL
url = "https://www.bbc.com/news"
response = requests.get(url)

# Check if the request was successful
print("Status Code:", response.status_code)

# Parse HTML data
soup = BeautifulSoup(response.text, "html.parser")

# Extract 10 article headlines from the page using h2 tags
titles = soup.find_all('h2', limit=10)

# Print the text of each headline
for title in titles:
    print(title.get_text(strip=True))
Press the green ▶︎ Run button in the code editor and check out the article headlines crawled in real time from the BBC News website! 🙂
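Because the live page's markup can change at any time, it may help to see the same idea on a fixed input. The sketch below is an assumption-laden stand-in, not the practice code itself: `SAMPLE_HTML` is made-up markup, and `HeadlineParser` and `extract_headlines` are hypothetical helpers built on the standard library's `html.parser` to mimic collecting the text of the first few h2 tags.

```python
from html.parser import HTMLParser

# Made-up markup standing in for a news front page.
SAMPLE_HTML = """
<html><body>
  <h1>News</h1>
  <h2>First <em>big</em> story</h2>
  <p>Summary text.</p>
  <h2>  Second story </h2>
  <h2>Third story</h2>
</body></html>
"""


class HeadlineParser(HTMLParser):
    """Collects the text of up to `limit` <h2> elements."""

    def __init__(self, limit):
        super().__init__()
        self.limit = limit
        self.headlines = []
        self._buffer = None  # text pieces gathered while inside an <h2>

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and len(self.headlines) < self.limit:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._buffer is not None:
            # Join the pieces and collapse runs of whitespace.
            self.headlines.append(" ".join("".join(self._buffer).split()))
            self._buffer = None


def extract_headlines(html, limit=10):
    """Return the text of the first `limit` h2 headings in `html`."""
    parser = HeadlineParser(limit)
    parser.feed(html)
    return parser.headlines


for headline in extract_headlines(SAMPLE_HTML, limit=2):
    print(headline)
```

With `limit=2` only the first two headings are collected, and text nested inside inline tags such as `<em>` is kept as part of the headline.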