Lecture

Fetching Wikipedia Homepage Information with Python

Wikipedia is an online encyclopedia collaboratively built by people around the world. 📘

In this lesson, we'll use Python code to collect specific information from a Wikipedia page.

Using the requests and BeautifulSoup libraries, we can fetch the Wikipedia homepage and extract its title and description, as shown below.


Step 1: Import Required Libraries

Importing requests and BeautifulSoup Libraries
import requests
from bs4 import BeautifulSoup

The above code performs the following tasks:

  • Uses the import keyword to load the requests library for HTTP communication

  • Uses the from keyword to import the BeautifulSoup class from the bs4 package, which parses HTML so that data can be extracted from webpages
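If either import fails with a ModuleNotFoundError, that library is not installed. The optional sketch below (not part of the lesson code) checks both import names with the standard library's importlib.util.find_spec. Note that the bs4 module is published on PyPI under the name beautifulsoup4. The helper name missing_packages is hypothetical.

```python
import importlib.util

# Import name -> PyPI package name (bs4 is published as beautifulsoup4)
REQUIRED = {"requests": "requests", "bs4": "beautifulsoup4"}


def missing_packages(required=REQUIRED):
    """Return the PyPI names of packages whose import name cannot be found.

    missing_packages is a hypothetical helper written for this lesson.
    """
    return [
        package
        for module, package in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing libraries. Install with: pip install " + " ".join(missing))
else:
    print("requests and bs4 are both available.")
```

The check only looks for the modules; it does not import them, so it runs quickly even when a library is missing.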


Step 2: Fetch HTML from URL and Store It in a Variable

Use requests to fetch the HTML of a webpage, then use BeautifulSoup to parse it and store the result in a variable, as shown below:

Fetching HTML from Wikipedia Homepage
# Wikipedia homepage URL
url = "https://www.wikipedia.org"
# Fetch HTML from the URL using the requests library
response = requests.get(url)
# Set the encoding of the fetched HTML to UTF-8
response.encoding = 'utf-8'
# Parse the fetched HTML and store the result in the soup variable
soup = BeautifulSoup(response.text, 'html.parser')

The above code performs the following tasks:

  • Stores the Wikipedia homepage URL in the url variable

  • Fetches HTML from the URL using requests.get(url)

  • Sets response.encoding to 'utf-8' so that response.text decodes non-ASCII characters correctly

  • Parses the fetched HTML using BeautifulSoup(response.text, 'html.parser') and stores the parsed result in the soup variable
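To see what BeautifulSoup(..., 'html.parser') produces without making a network request, the sketch below parses a small hard-coded HTML string that stands in for response.text. The page content is hypothetical; 'html.parser' is the parser built into Python's standard library.

```python
from bs4 import BeautifulSoup

# Stand-in for response.text: a small, hard-coded HTML document (hypothetical content)
html = """
<html>
  <head><title>Demo Page</title></head>
  <body>
    <h1>Hello, parser</h1>
    <p>This paragraph is the first p tag.</p>
  </body>
</html>
"""

# Parse the string into a navigable tree using Python's built-in parser
soup = BeautifulSoup(html, "html.parser")

print(type(soup).__name__)  # BeautifulSoup
print(soup.title.string)    # Demo Page
print(soup.h1.text)         # Hello, parser
```

Because soup is a tree of tags rather than a plain string, you can navigate to elements such as soup.title or soup.h1 by name.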


Step 3: Extract Title and Description Information

Extract the desired information from the soup variable as shown below:

Extracting Title and Description from Wikipedia Homepage
# Extract h1 (heading 1, title) from the webpage
h1_title = soup.find('h1').text
# Extract p (paragraph) tag from the webpage
p_description = soup.find('p').text

The above code performs the following tasks:

  • Uses soup.find('h1').text to find the first h1 tag in the soup variable, extracts its text (the title), and stores it in the h1_title variable

  • Uses soup.find('p').text to find the first p tag in the soup variable, extracts its text (the description), and stores it in the p_description variable
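soup.find() returns None when the page has no matching tag, and calling .text on None raises an AttributeError. A minimal sketch of a safer lookup, assuming you want an empty string (or another default) instead of a crash, is shown below. The helper first_text and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup


def first_text(soup, tag_name, default=""):
    """Return the stripped text of the first <tag_name> element, or default if none exists.

    first_text is a hypothetical helper, not part of BeautifulSoup.
    """
    element = soup.find(tag_name)  # returns None when no matching tag is found
    if element is None:
        return default
    return element.get_text(strip=True)


# Hypothetical page that has an h1 but no p tag
html = "<body><h1>  Example Title  </h1><div>No paragraph here</div></body>"
soup = BeautifulSoup(html, "html.parser")

print(first_text(soup, "h1"))                       # Example Title
print(first_text(soup, "p", default="(no p tag)"))  # (no p tag)
```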

Finally, use the print function to display the title and description fetched from the URL.
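On real pages, .text returns every piece of text inside a tag, including line breaks and indentation from the HTML source, so printed results can look messy. The sketch below parses hypothetical markup (a heading with nested tags, not the live Wikipedia page) and compares .text with get_text(" ", strip=True) before printing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a heading with nested tags and extra whitespace
html = """
<h1>
  <span>Example</span>
  <strong>The Demo Site</strong>
</h1>
<p>First paragraph.</p>
"""
soup = BeautifulSoup(html, "html.parser")
h1 = soup.find("h1")

raw = h1.text                         # keeps the newlines and indentation
clean = h1.get_text(" ", strip=True)  # strips each text piece, joins them with a space

print(repr(raw))
print("Title:", clean)  # Title: Example The Demo Site
print("Description:", soup.find("p").get_text(strip=True))  # Description: First paragraph.
```

The first argument to get_text is the separator placed between text pieces; without it, "Example" and "The Demo Site" would be joined with no space.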


Practice

Click the Run Code button on the right-hand side to see the scraping results. The first execution of the code may take some time.

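You can also change the url variable in the code (e.g., to https://www.codefriends.net) to fetch information from other webpages. If a page has no h1 or p tag, soup.find() returns None and accessing .text raises an AttributeError.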

Mission

Which library is used for parsing HTML when web scraping with Python?

requests

BeautifulSoup

urllib

selenium
