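```python
# Import the requests library for HTTP requests
import requests
# Import the BeautifulSoup library for HTML parsing
from bs4 import BeautifulSoup

# URL of the Wikipedia page to scrape
url = 'https://en.wikipedia.org/wiki/Web_scraping'

# Identify the client: Wikipedia asks automated clients to send a descriptive
# User-Agent (placeholder value below, replace it with your own details)
headers = {'User-Agent': 'MyScrapingLesson/1.0 (contact: you@example.com)'}

# Retrieve the HTML of the web page (give up after 10 seconds)
response = requests.get(url, headers=headers, timeout=10)

# Check if the request was successful
if response.status_code == 200:
    # Create a BeautifulSoup object using the built-in HTML parser
    soup = BeautifulSoup(response.content, 'html.parser')

    # Extract the title of the page (the <h1> element with id="firstHeading")
    heading = soup.find('h1', id='firstHeading')
    title = heading.get_text(strip=True) if heading else 'N/A'
    print(f"Title: {title}\n")

    # Separator
    print("-" * 20)

    # Extract the first non-empty paragraph: Wikipedia pages can begin
    # with an empty <p> element, so skip paragraphs that contain no text
    first_paragraph = next(
        (p.get_text().strip() for p in soup.find_all('p') if p.get_text().strip()),
        'N/A',
    )
    print(f"First paragraph: {first_paragraph}\n")

# If the request was unsuccessful
else:
    print(f"Failed to retrieve the web page. Status code: {response.status_code}")
```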

# Differences Between Web Crawling and Web Scraping

When collecting data from the web, the terms **Web Crawling** and **Web Scraping** are often used interchangeably. However, they do not mean exactly the same thing.

In this lesson, we will explore the key differences between web crawling and web scraping.

 

## Crawling (Web Crawling)

Crawling is the process of following a website's `link structure` to comprehensively **collect and store** data from its various pages.

For example, collecting information about many products across an online store is web crawling. So is starting from a news website’s homepage, following links to gather the latest articles, and storing them in a database.

 

### How Does Crawling Work?

Crawling is carried out by automated programs (bots) called `web crawlers`, or spiders. A crawler starts from one or more pages, collects data from each page it visits, and follows the hyperlinks it finds to discover further pages.
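The crawling loop described above (visit a page, store it, follow its links) can be sketched as a breadth-first search. This is a minimal sketch and not a production crawler. It uses Python's built-in `html.parser` instead of BeautifulSoup so it runs without extra packages. It also takes a `fetch_html` function as a parameter so it can be tried on an in-memory fake site, stays on the starting domain, and stops after `max_pages` pages. The names `crawl`, `LinkExtractor`, and `fetch_html` are chosen for this example only.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch_html, max_pages=10):
    """Breadth-first crawl that stays on start_url's domain.

    fetch_html(url) (hypothetical) must return the page's HTML as a string,
    or None on failure. Visits at most max_pages URLs and returns a dict
    mapping each successfully fetched URL to its HTML, in visiting order.
    """
    domain = urlparse(start_url).netloc
    queue = deque([start_url])  # URLs discovered but not yet visited
    visited = set()             # URLs already visited
    pages = {}                  # "store" step: URL -> HTML

    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)

        html = fetch_html(url)
        if html is None:
            continue
        pages[url] = html

        # "Follow links" step: resolve relative links, drop #fragments,
        # and queue only links that point to the same domain
        extractor = LinkExtractor()
        extractor.feed(html)
        extractor.close()
        for href in extractor.links:
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == domain and link not in visited:
                queue.append(link)

    return pages


# Demo with an in-memory "website" instead of real HTTP requests
fake_site = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a> <a href="https://other.org/">Out</a>',
    "https://example.com/a": '<a href="/b#top">B</a> <a href="/">Home</a>',
    "https://example.com/b": "<p>No links here</p>",
}
for visited_url in crawl("https://example.com/", fake_site.get):
    print(visited_url)
```

Passing the fetch step in as a function keeps the link-following logic testable. In real use, `fetch_html` could wrap `requests.get(url, timeout=10).text`, and a polite crawler would also check the site's `robots.txt` and pause between requests.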

 

### How is Crawling Used?

Search engines like Google use crawling to **index** web pages. Indexing involves analyzing web page content and systematically organizing it for storage in a database.

This allows search engines to quickly provide users with highly relevant search results.
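Real search engines rely on far more elaborate systems, but the core idea of indexing can be illustrated with an inverted index. This is a map from each word to the set of pages that contain it, so a query can be answered without re-reading every page. The toy sketch below assumes the crawled pages have already been reduced to plain text. `tokenize`, `build_index`, and `search` are hypothetical helper names, not part of any library.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the set of URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    # .get() does not insert missing keys into the defaultdict
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Sample "crawled" pages (made-up text)
pages = {
    "https://example.com/crawling": "Web crawling follows links across a website",
    "https://example.com/scraping": "Web scraping extracts data from one web page",
}
index = build_index(pages)
print(sorted(search(index, "web")))
print(sorted(search(index, "web crawling")))
```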

 

## Scraping (Web Scraping)

Scraping is the process of extracting **specific information** from a `particular web page`.

For instance, extracting a product’s price, description, and images from an online store’s product page is considered scraping.

 

### How Does Scraping Work?

Scraping involves parsing the HTML of a web page and locating the elements (for example, by tag name, `id`, or `class`) that contain the specific data you need.
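As a sketch of the product-page example, the scraper below reads the name, price, description, and image URLs from a made-up product page. It assumes the page marks those elements with the hypothetical classes `product-name`, `price`, and `description`. A real site will use its own markup, which you would find by inspecting the page. This version uses Python's built-in `html.parser` so it runs without extra packages. It is deliberately simple: an element matches only when its `class` attribute is exactly one of those names, and a nested element with the same tag name would end a field early.

```python
from html.parser import HTMLParser


class ProductScraper(HTMLParser):
    """Collects the text of elements whose class is one of FIELDS,
    plus the src of every <img> tag."""

    # Hypothetical class names; adapt them to the real page's markup
    FIELDS = {"product-name", "price", "description"}

    def __init__(self):
        super().__init__()
        self.data = {"images": []}
        self._field = None  # field currently being read, if any
        self._tag = None    # tag name that opened that field

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.data["images"].append(attrs["src"])
        if self._field is None and attrs.get("class") in self.FIELDS:
            self._field, self._tag = attrs["class"], tag
            self.data[self._field] = ""

    def handle_endtag(self, tag):
        # Stop reading when an element with the opening tag's name closes
        if tag == self._tag:
            self._field = self._tag = None

    def handle_data(self, text):
        # Accumulate text, including text inside nested tags such as <b>
        if self._field:
            self.data[self._field] += text


def scrape_product(html):
    scraper = ProductScraper()
    scraper.feed(html)
    scraper.close()
    # Collapse runs of whitespace in the text fields
    return {
        key: " ".join(value.split()) if isinstance(value, str) else value
        for key, value in scraper.data.items()
    }


# Made-up product page for demonstration
sample_html = """
<div class="product">
  <h1 class="product-name">Ceramic Mug</h1>
  <span class="price">$12.99</span>
  <p class="description">A sturdy mug for <b>hot</b> drinks.</p>
  <img src="/images/mug-front.jpg">
  <img src="/images/mug-side.jpg">
</div>
"""
print(scrape_product(sample_html))
```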

 

### How is Scraping Used?

Unlike crawling, which collects data from an entire website by following links, scraping extracts only the necessary information from a particular web page. For example, scraping can extract the title, author, and publication date from a news article.
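Many news sites declare an article's title, author, and publication date in `<meta>` tags, for example Open Graph's `og:title` and `article:published_time` and the standard `author` meta name. Assuming the target page does this (not every site does, and then the visible HTML has to be parsed instead), a minimal scraper only needs to read those tags. This sketch uses Python's built-in `html.parser`. `MetaTagReader` and `scrape_article_info` are hypothetical names, and the sample article is made up.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collects <meta> tags into a dict keyed by their property or name."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = attrs.get("property") or attrs.get("name")
        if key and attrs.get("content") is not None:
            self.meta[key] = attrs["content"]


def scrape_article_info(html):
    """Return the title, author, and publication date declared in the page's
    <meta> tags, or None for any field the page does not provide."""
    reader = MetaTagReader()
    reader.feed(html)
    reader.close()
    return {
        "title": reader.meta.get("og:title"),
        "author": reader.meta.get("author"),
        "published": reader.meta.get("article:published_time"),
    }


# Made-up article page for demonstration
sample_html = """
<html><head>
  <meta property="og:title" content="How Search Engines Crawl the Web">
  <meta name="author" content="Jane Doe">
  <meta property="article:published_time" content="2024-05-01T09:00:00Z">
</head><body><p>Article body</p></body></html>
"""
print(scrape_article_info(sample_html))
```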

 

## To Summarize the Differences:

Crawling refers to the process of following the entire structure and links of a website to collect and store data, while scraping selectively extracts specific information from a particular web page.

Search engines use crawlers (or spiders) to crawl and index web pages, whereas scraping involves analyzing the HTML content of a specific URL to extract the required information.

 

As mentioned in the <a href="https://www.codefriends.net/courses/python-intro-crawling/chapter-1/crawling-basic?pkg=python-intro-intermediate" target="_blank">previous lesson</a>, collecting data from a specific URL is considered **scraping**.

However, since “web crawling” is the broader, more general term, we will primarily use it throughout these lessons.
