Saving Extracted Article Data to a CSV File
In this lesson, we will learn how to save the extracted BBC News article titles into a CSV file.
CSV (Comma-Separated Values) is a plain-text format in which each line is one record and the fields within a record are separated by commas.
CSV files can be easily opened in spreadsheet programs like Excel and Google Sheets, and are commonly used to store and load data.
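To make the format concrete, here is a minimal sketch that parses two lines of CSV text with the built-in csv module. The headline text is made up for illustration and does not come from the BBC site. Note how a field that itself contains a comma is wrapped in double quotes so that it is still read as a single value.

```python
import csv

# Two lines of CSV text: a header row and one data row.
# The second field of the data row contains a comma, so it is quoted.
# (The headline is an invented example, not real BBC data.)
lines = [
    "Number,Article Title",
    '1,"Markets rise, then fall"',
]

# csv.reader accepts any iterable of text lines
rows = list(csv.reader(lines))
for row in rows:
    print(row)
```

Each row comes back as a list of strings, one per field.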
Code to Extract Article Titles from the BBC Website
Python’s built-in csv module allows you to conveniently process and save data in CSV format.
import csv
from io import StringIO

from bs4 import BeautifulSoup
import requests

# Send a request to the BBC News homepage
url = "https://www.bbc.com/news"
response = requests.get(url)

# Parse the HTML data
soup = BeautifulSoup(response.text, "html.parser")

# Extract 10 article titles from h2 HTML tags
titles = soup.find_all('h2', limit=10)

# Prepare data to save in CSV format
data = []
for title in titles:
    # Extract only the article title
    data.append([title.text])
The above code extracts the article titles marked up with h2 HTML tags from the BBC News website and collects them in a list, ready to be written out in CSV format.
The h2 (heading level 2) tag marks a second-level heading on a web page. Article titles on news sites are generally written using h1 (main title), h2 (subtitle), and h3 (sub-subtitle) tags.
The code soup.find_all('h2', limit=10) returns at most the first 10 h2 elements found in the HTML data, which here correspond to article titles.
Saving as a CSV File
# Create a StringIO object (temporarily stores data in memory like a file)
output = StringIO()

# Create a CSV writer that writes into the StringIO object
csv_writer = csv.writer(output)

# Add headers to the CSV
csv_writer.writerow(['Number', 'Article Title'])

# Add the number and article title to the CSV
for idx, title in enumerate(titles, 1):
    csv_writer.writerow([idx, title.text.strip()])

# Output the result in CSV format
print(output.getvalue())

# Close the StringIO object
output.close()
The StringIO object can temporarily save data in memory like a file.
csv.writer creates a writer object that formats rows as CSV and writes them into the StringIO object. Each call to csv_writer.writerow() adds one row: first the header, then one row per article.
Finally, output.getvalue() returns everything written so far as a single string, which print() then displays in CSV format.
Now, by executing the above code, the extracted BBC News article titles will be output in CSV format.
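The writing steps can also be tried without a network connection. The following is a minimal sketch, not part of the lesson code: it wraps the same StringIO and csv.writer steps in a hypothetical helper named titles_to_csv and feeds it two made-up titles as plain strings instead of h2 elements.

```python
import csv
from io import StringIO


def titles_to_csv(titles):
    """Return CSV text with a header row and one numbered row per title.

    Hypothetical helper for illustration; `titles` is a list of plain strings.
    """
    output = StringIO()
    writer = csv.writer(output)
    writer.writerow(["Number", "Article Title"])
    for idx, title in enumerate(titles, 1):
        writer.writerow([idx, title.strip()])
    # Read the text out before closing the in-memory buffer
    csv_text = output.getvalue()
    output.close()
    return csv_text


# Made-up sample titles, one with extra whitespace and one with a comma
sample_titles = ["  First headline ", "Second, with a comma"]
csv_text = titles_to_csv(sample_titles)
print(csv_text)
```

The title containing a comma is quoted automatically by csv.writer, and strip() removes the surrounding whitespace before the row is written.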
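To save the data as a CSV file on your computer instead of displaying it, add the following code before the call to output.close():
# Write the CSV text held in the StringIO object to a file
# (getvalue() raises ValueError once the StringIO object is closed)
with open('bbc_news.csv', 'w', newline='') as f:
    f.write(output.getvalue())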
The above code generates a bbc_news.csv file in the current working directory (usually the folder from which you run the Python script) and saves the data in it.
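As an alternative, csv.writer can write straight to a file object, skipping the in-memory StringIO step. The sketch below is one possible variant, not the lesson's own method: it uses made-up rows, writes into a temporary directory so nothing is left behind, and reads the file back to check the result. The explicit encoding="utf-8" is an assumption added here for portability.

```python
import csv
import os
import tempfile

# Made-up rows for illustration: a header plus two articles
rows = [
    ["Number", "Article Title"],
    [1, "First headline"],
    [2, "Second, with a comma"],
]

# Write into a temporary directory so the example leaves no file behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bbc_news.csv")

    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

    # Read the file back to confirm what was stored
    with open(path, newline="", encoding="utf-8") as f:
        saved_rows = list(csv.reader(f))

print(saved_rows)
```

Values read back from a CSV file are always strings, so the article numbers return as "1" and "2" rather than integers.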
For security reasons, file saving is restricted in the practice environment.
To download the CSV file, use the Download button next to the Run button in the practice environment menu. :)