Scraping Wikipedia Homepage Information with Python
Wikipedia is an online encyclopedia created by people worldwide. 📘
In this lesson, we will learn how to collect specific information from a Wikipedia page using Python code.
Using the `BeautifulSoup` and `requests` libraries, you can extract the title and description from the Wikipedia homepage as shown below.
Step 1: Import Necessary Libraries
```python
import requests
from bs4 import BeautifulSoup
```
This code performs the following:
- Uses the `import` keyword to load the `requests` library for HTTP communication
- Uses `from ... import` to import the `BeautifulSoup` class from the `bs4` package (Beautiful Soup), which parses HTML for web scraping
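If either import fails with `ModuleNotFoundError`, the libraries are probably not installed yet. Beautiful Soup is published on PyPI as `beautifulsoup4` but imported as `bs4`, so the install command is `pip install requests beautifulsoup4`. The short sketch below confirms that both packages can be imported and prints their versions. It is only a sanity check and is not part of the scraper itself.

```python
# Confirm that both libraries are installed and report their versions
import requests
import bs4

print("requests:", requests.__version__)
print("beautifulsoup4:", bs4.__version__)
```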
Step 2: Retrieve and Store HTML from the URL
Use the requests library to fetch the HTML of a webpage. Then parse it with BeautifulSoup and store the result in a variable, as follows.
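```python
# Wikipedia homepage URL
url = "https://www.wikipedia.org"

# Fetch HTML from the URL using the requests library
# (timeout=10 stops the request from hanging forever if the server never answers)
response = requests.get(url, timeout=10)
# Raise an error if the server returned a 4xx/5xx status code
response.raise_for_status()

# Set the encoding of the fetched HTML to UTF-8
response.encoding = 'utf-8'

# Parse the fetched HTML and store the result in the soup variable
soup = BeautifulSoup(response.text, 'html.parser')
```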
This code performs the following:
- Stores the Wikipedia homepage URL in the `url` variable
- Fetches the HTML from the URL using `requests.get(url)`
- Sets `response.encoding` to `'utf-8'` so that `response.text` decodes the page's characters correctly
- Parses the fetched HTML with `BeautifulSoup(response.text, 'html.parser')` and stores the parsed result in the `soup` variable
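Why set the encoding? requests turns the raw bytes of the response (`response.content`) into a string (`response.text`) using `response.encoding`. If that encoding is wrong, non-ASCII characters come out garbled. The sketch below reproduces the effect offline with plain `bytes.decode`, without any network request. The sample sentence is made up for illustration.

```python
# Raw bytes as they might arrive over HTTP (text encoded as UTF-8)
raw = "Wikipédia, l'encyclopédie libre".encode("utf-8")

# Decoding with the wrong codec produces garbled characters (mojibake)
wrong = raw.decode("latin-1")
# Decoding with the codec the bytes were actually written in restores the text
right = raw.decode("utf-8")

print(wrong)
print(right)
```

Setting `response.encoding = 'utf-8'` makes `response.text` perform the second, correct kind of decoding. This holds as long as the page really is UTF-8 encoded.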
Step 3: Extract Title and Description Information
Extract the desired information from the soup variable as shown below.
```python
# Extract the first h1 (heading level 1, the title) from the webpage
h1_title = soup.find('h1').text

# Extract the first p (paragraph) tag from the webpage
p_description = soup.find('p').text
```
This code performs the following:
- Finds the first `h1` tag in `soup` with `soup.find('h1')`, reads its text with `.text`, and stores the title in the `h1_title` variable
- Finds the first `p` tag in `soup` with `soup.find('p')`, reads its text with `.text`, and stores the description in the `p_description` variable
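`soup.find` returns only the first matching tag. When nothing matches, it returns `None`, so calling `.text` on the result raises `AttributeError` on pages without an `h1` or `p` tag. The sketch below shows both behaviors on a made-up HTML string, plus `find_all` for collecting every match. The helper `text_or_default` is hypothetical and exists only for this example.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration: one h1, two paragraphs, no h2
html = """
<html><body>
  <h1>Sample Title</h1>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag
first_p = soup.find("p").text

# find_all() returns a list of every matching tag
all_p = [p.text for p in soup.find_all("p")]

# find() returns None when no tag matches
missing = soup.find("h2")


def text_or_default(tag, default=""):
    """Hypothetical helper: return the tag's text, or default if tag is None."""
    return tag.text if tag is not None else default


print(first_p)
print(all_p)
print(missing)
print(text_or_default(missing, "(no h2 found)"))
```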
Finally, use the print function to display the extracted title and description from the URL.
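`.text` joins the text of all nested tags exactly as it appears in the source, including line breaks and indentation. As a result, printed values can look cramped or padded. BeautifulSoup's `get_text` method accepts a separator and `strip=True`, which often gives tidier output. The sketch below uses made-up markup whose `h1` contains two nested tags. The real Wikipedia markup may differ.

```python
from bs4 import BeautifulSoup

# Made-up markup: an h1 with two nested inline tags and surrounding whitespace
html = """
<h1>
  <span>Example</span><strong>The Sample Site</strong>
</h1>
<p>
  A short description.
</p>
"""
soup = BeautifulSoup(html, "html.parser")

h1_tag = soup.find("h1")
p_tag = soup.find("p")

# .text keeps the raw whitespace and joins adjacent tags with no space
print(repr(h1_tag.text))

# get_text(" ", strip=True) trims each text piece and joins the pieces with a space
print("Title:", h1_tag.get_text(" ", strip=True))
print("Description:", p_tag.get_text(" ", strip=True))
```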
Practice
Press the Run Code button on the right to see the scraping results. The first run may take some time.
You can also change the value of `url` (e.g., to https://www.codefriends.net) to fetch information from other web pages.
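To try several addresses without editing each step, the three steps can be wrapped in functions. This is a minimal sketch, not the lesson's own code. The names `parse_title_and_description` and `scrape_title_and_description`, the 10-second timeout, and the empty-string fallback are all assumptions. Forcing UTF-8, as the lesson does, also assumes the target page is UTF-8 encoded. Missing tags are handled because other pages may not have an `h1` or `p` tag.

```python
import requests
from bs4 import BeautifulSoup


def parse_title_and_description(html):
    """Hypothetical helper: return (first h1 text, first p text); missing tags give ''."""
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1")
    p = soup.find("p")
    title = h1.get_text(" ", strip=True) if h1 is not None else ""
    description = p.get_text(" ", strip=True) if p is not None else ""
    return title, description


def scrape_title_and_description(url):
    """Hypothetical helper: fetch url and parse its first h1 and p tags."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Assumption: the page is UTF-8, matching the lesson's choice
    response.encoding = "utf-8"
    return parse_title_and_description(response.text)
```

For example, `print(scrape_title_and_description("https://www.codefriends.net"))` fetches that page and prints a `(title, description)` tuple. The exact values depend on the page's current markup. Keeping the parsing in its own function makes it possible to test it on plain HTML strings without a network connection.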