Exploring Web Crawling on BBC News

Web crawling is the technique of automatically exploring websites and collecting the data you need.

It commonly involves an automated program called a crawler, which fetches the content of web pages (that is, their HTML code) and then analyzes that code to collect the required data.


Difference Between Web Crawling and Web Scraping

Web Crawling and Web Scraping are often used interchangeably, but they technically mean different things.

Web scraping means extracting specific content from a particular web page. Pulling the article headlines out of a single news page is one example.

Unlike web crawling, web scraping does not automatically explore multiple web pages and mainly targets one web page or specific data.

In summary, web crawling is the process of automatically exploring multiple web pages to collect data, whereas web scraping is the process of extracting specific content from a given web page.
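The distinction can be made concrete with a small offline sketch. Everything here is an assumption made for illustration: `SITE` is a hypothetical in-memory stand-in for a real website, and the standard-library `html.parser` module is used instead of a third-party library so the example runs without a network connection. `scrape` extracts data from one known page, while `crawl` follows links from page to page and scrapes each one it discovers.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "website" (path -> HTML); stands in for real HTTP fetches.
SITE = {
    "/": '<h1>Home</h1><a href="/news">News</a><a href="/sport">Sport</a>',
    "/news": '<h1>News</h1><a href="/">Home</a><a href="/news/story">Story</a>',
    "/sport": '<h1>Sport</h1><a href="/">Home</a>',
    "/news/story": "<h1>Story</h1>",
}


class PageParser(HTMLParser):
    """Collects the <h1> text and all link targets found on one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.title += data


def scrape(path):
    """Scraping: extract data from a single, known page."""
    parser = PageParser()
    parser.feed(SITE[path])
    parser.close()
    return parser.title, parser.links


def crawl(start):
    """Crawling: follow links breadth-first, scraping each page once."""
    seen, queue, titles = {start}, deque([start]), {}
    while queue:
        path = queue.popleft()
        title, links = scrape(path)
        titles[path] = title
        for link in links:
            if link in SITE and link not in seen:
                seen.add(link)
                queue.append(link)
    return titles


print(scrape("/news"))  # one page only
print(crawl("/"))       # every page reachable from the start page
```

Note that the crawler is built on top of the scraper: crawling adds the link-following loop and the `seen` set that prevents visiting the same page twice.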

However, this course will mainly use the term web crawling as it covers both exploring multiple web pages and data collection.


BBC News Web Crawling Practice

The code in the practice screen scrapes (technically speaking) the article headlines from the BBC News website in real time.

To fetch and analyze the HTML code of a web page using Python, the requests and BeautifulSoup libraries are commonly used.

The following lessons will detail how these libraries are used and what code needs to be written to extract the desired information.

BBC News Web Crawling Code
import requests
from bs4 import BeautifulSoup

# BBC News website URL
url = "https://www.bbc.com/news"
response = requests.get(url)

# Check if the request was successful
print("status_code:", response.status_code)

# Parse HTML data
soup = BeautifulSoup(response.text, "html.parser")

# Extract 10 article headlines from the page using h2 tags
titles = soup.find_all('h2', limit=10)

# Print the text of each headline
for title in titles:
    print(title.get_text(strip=True))
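The items returned by `find_all` are tag objects rather than plain strings. The following is a minimal sketch of one way to turn them into clean headline text. It runs offline against `SAMPLE_HTML`, a hypothetical stand-in for `response.text`; the real BBC markup is different and changes over time. The helper name `extract_headlines` is also an assumption made for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text; not the real BBC News markup.
SAMPLE_HTML = """
<main>
  <h2> First headline </h2>
  <h2><span>Second</span> headline</h2>
  <h2></h2>
  <h2>Third headline</h2>
</main>
"""


def extract_headlines(html, limit=10):
    """Return the text of up to `limit` non-empty <h2> elements."""
    soup = BeautifulSoup(html, "html.parser")
    headlines = []
    for tag in soup.find_all("h2"):
        if len(headlines) >= limit:
            break
        # Join nested text pieces with a space and trim surrounding whitespace.
        text = tag.get_text(" ", strip=True)
        if text:  # skip empty <h2> tags
            headlines.append(text)
    return headlines


for number, headline in enumerate(extract_headlines(SAMPLE_HTML), start=1):
    print(f"{number}. {headline}")
```

Filtering out empty tags after the search, instead of relying on `limit=10` alone, means empty `h2` elements do not use up the ten slots.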

Press the green ▶︎ Run button in the code editor and check out the article headlines crawled in real-time from the BBC News website! 🙂

Mission

Run the code and check the results.
