```python
import requests
from bs4 import BeautifulSoup

# Django GitHub repository URL
url = "https://github.com/django/django"

# Fetch the HTML of the webpage using requests
response = requests.get(url)
html_content = response.text

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')

# IDs for Stars and Forks
ids_to_find = ['repo-stars-counter-star', 'repo-network-counter']
found_contents = {}

# Find the content using the specified IDs
for id_value in ids_to_find:
    element_content = soup.find(id=id_value)
    found_contents[id_value] = element_content.get_text() if element_content else "No content"

# Print the found content
for id_value, content in found_contents.items():
    print(f"ID '{id_value}': {content}")
    print('-' * 40)
```

# Crawling Stars and Forks Count from a Repository

In this lesson, we'll walk through the code step by step to crawl and display the `Stars` (likes) and `Forks` (project copies) counts of a GitHub repository.

**Step 1**
 ```python title="Fetch HTML from Web Page"
 response = requests.get(url)
 html_content = response.text
 ```
 - `requests.get(url)`: Sends an HTTP GET request to the given URL and returns a response object. Here, the URL is the GitHub repository page of Django.
 - `response.text`: The body of the response as a string. For this page, that is the HTML content.
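
The fetch step above assumes the request always succeeds. As a minimal sketch of a more defensive variant, the helper below checks for errors before returning the HTML. The name `fetch_html` and the 10-second timeout are assumptions made for this example and are not part of the lesson's script.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page HTML as a string, or None if the request fails.

    `fetch_html` and the 10-second default are illustrative assumptions.
    """
    try:
        response = requests.get(url, timeout=timeout)
        # Raise requests.HTTPError for 4xx/5xx status codes
        response.raise_for_status()
    except requests.RequestException as error:
        # RequestException covers connection errors, timeouts, and HTTP errors
        print(f"Request failed: {error}")
        return None
    return response.text
```

Calling `fetch_html("https://github.com/django/django")` would then give either the HTML string or `None`, so the later steps can skip parsing when nothing was fetched.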

 

**Step 2**
 ```python title="Parse HTML"
 soup = BeautifulSoup(html_content, 'html.parser')
 ```
 - `BeautifulSoup(html_content, 'html.parser')`: Parses `html_content` with Python's built-in `html.parser` and returns a `BeautifulSoup` object, which makes it easy to search and navigate the HTML elements.
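
To see what the parsed object offers without making a network request, here is a small sketch that parses a hand-written HTML string. The sample markup and its values are made up for illustration and are not GitHub's actual page.

```python
from bs4 import BeautifulSoup

# Hand-written sample HTML for illustration; not GitHub's actual markup
sample_html = """
<html>
  <head><title>Sample repository</title></head>
  <body>
    <span id="repo-stars-counter-star">1.2k</span>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Tags can be reached as attributes of the parsed tree
page_title = soup.title.get_text()
print(page_title)

# Elements can also be looked up by their id attribute
star_element = soup.find(id="repo-stars-counter-star")
print(star_element.name, star_element.get_text())
```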

 

**Step 3**
 ```python title="Locate Stars and Forks Count"
 ids_to_find = ['repo-stars-counter-star', 'repo-network-counter']
 ```
 - This list holds the IDs of the HTML elements that display the star and fork counts. These IDs are used to locate the information on the web page.
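
An ID lookup can also be written as a CSS selector, where `#` followed by the ID selects the element. The sketch below compares `soup.find(id=...)` with `soup.select_one(...)` on a hand-written sample, assuming made-up counts rather than real GitHub data.

```python
from bs4 import BeautifulSoup

# Hand-written sample HTML; the counts are made up
sample_html = """
<span id="repo-stars-counter-star">1.2k</span>
<span id="repo-network-counter">340</span>
"""
soup = BeautifulSoup(sample_html, "html.parser")

ids_to_find = ['repo-stars-counter-star', 'repo-network-counter']

matches = []
for id_value in ids_to_find:
    by_find = soup.find(id=id_value)
    # "#some-id" is the CSS selector for an element with that id
    by_selector = soup.select_one(f"#{id_value}")
    matches.append(by_find.get_text() == by_selector.get_text())

print(matches)
```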

 

**Step 4**
 ```python title="Extract Information"
 found_contents = {}
 for id_value in ids_to_find:
     element_content = soup.find(id=id_value)
     found_contents[id_value] = element_content.get_text() if element_content else "No content"
 ```
 - `soup.find(id=id_value)`: Finds the first HTML element with the specified ID in the parsed HTML content, or returns `None` if there is no such element.
 - `element_content.get_text()`: Extracts the text content from the found element. If the element doesn't exist, the string "No content" is stored instead.
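
The extracted text is a string, possibly with surrounding whitespace, and pages often abbreviate large counts (for example `1.2k`). Whether a given page does so is an assumption here. As a rough sketch, a hypothetical helper `parse_count` could turn such text into an integer:

```python
def parse_count(text):
    """Convert text such as '340', '1,234', or '1.2k' into an integer.

    `parse_count` is a hypothetical helper; the 'k' and 'm' suffixes are
    assumed abbreviations for thousands and millions.
    """
    cleaned = text.strip().lower().replace(",", "")
    multipliers = {"k": 1_000, "m": 1_000_000}
    if cleaned and cleaned[-1] in multipliers:
        # Split the number from its suffix, then scale it
        number = float(cleaned[:-1])
        return int(round(number * multipliers[cleaned[-1]]))
    return int(cleaned)


print(parse_count("1.2k"))
print(parse_count(" 340\n"))
```

Text that is not a number, such as the "No content" placeholder, raises a `ValueError`, so it is worth checking for the placeholder before converting.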

 

**Step 5**
 ```python title="Output"
 for id_value, content in found_contents.items():
     print(f"ID '{id_value}': {content}")
 ```
 - `found_contents.items()`: Returns each ID together with its text content. The loop prints every pair, so you can see the star and fork counts.
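
Raw element IDs are not very readable in the output. One possible refinement, sketched below, maps each ID to a display label. The `labels` dictionary and the sample values are assumptions for this example.

```python
# Sample data shaped like the found_contents dictionary; values are made up
found_contents = {
    'repo-stars-counter-star': '1.2k',
    'repo-network-counter': '340',
}

# Hypothetical display labels chosen for this example
labels = {
    'repo-stars-counter-star': 'Stars',
    'repo-network-counter': 'Forks',
}

lines = []
for id_value, content in found_contents.items():
    # Fall back to the raw ID when no label is defined
    label = labels.get(id_value, id_value)
    lines.append(f"{label}: {content}")

print("\n".join(lines))
```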

 

## Practical Exercise

- Execute the code above with a different repository URL on GitHub.

- Practice extracting various data by using different IDs or classes.
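
As a starting point for the second exercise, the sketch below extracts elements by class instead of by ID. The HTML and the class name `topic-tag` are hand-written for illustration; inspect the real page to find the classes it actually uses.

```python
from bs4 import BeautifulSoup

# Hand-written sample HTML; class names are hypothetical
sample_html = """
<div class="topics">
  <a class="topic-tag" href="/topics/python">python</a>
  <a class="topic-tag" href="/topics/web">web</a>
</div>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# class is a reserved word in Python, so BeautifulSoup uses class_
topic_links = soup.find_all("a", class_="topic-tag")

# Collect the visible text and the href attribute of each link
topics = [link.get_text(strip=True) for link in topic_links]
hrefs = [link["href"] for link in topic_links]

print(topics)
print(hrefs)
```

Unlike `find`, which returns only the first match, `find_all` returns every matching element, which suits classes because many elements can share one class.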
