# Crawling a Repository's Star and Fork Counts

In this lesson, we break the crawl into explicit steps to scrape and print the number of stars (a star is GitHub's equivalent of a "like") and forks (a fork is a copy of the project) for a GitHub repository.

```python title="Full code"
import requests
from bs4 import BeautifulSoup

# URL of the Django GitHub repository
url = "https://github.com/django/django"

# Fetch the page's HTML with requests
response = requests.get(url)
html_content = response.text

# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')

# IDs of the elements that show the star count and the fork count
ids_to_find = ['repo-stars-counter-star', 'repo-network-counter']
found_contents = {}

# Look up each element by its ID
for id_value in ids_to_find:
    element_content = soup.find(id=id_value)
    found_contents[id_value] = element_content.get_text() if element_content else "Not found"

# Print what was found
for id_value, content in found_contents.items():
    print(f"ID '{id_value}': {content}")
    print('-' * 40)
```

**Step 1**
 ```python title="Fetch the web page's HTML"
 response = requests.get(url)
 html_content = response.text
 ```
 - `requests.get(url)`: Sends an HTTP GET request to the given URL and returns a `Response` object. Here, the URL is Django's GitHub repository page.
 - `response.text`: Returns the body of the response received from `requests.get` as a string, which in this case is the page's HTML.
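As written, Step 1 can wait indefinitely if the server never answers, and it will parse an error page as if it were the repository page. A minimal hardened sketch, assuming a 10-second timeout and a custom `User-Agent` string (both illustrative choices, not requirements of this lesson), could look like this. `build_request` and `fetch_html` are hypothetical helpers; `build_request` only prepares the request so it can be inspected without a network connection.

```python
import requests

# Assumption: an illustrative User-Agent value; some sites reject the
# default python-requests agent, so it is common to set one explicitly.
HEADERS = {"User-Agent": "Mozilla/5.0 (crawling-lesson example)"}


def build_request(url: str) -> requests.PreparedRequest:
    """Build (but do not send) a GET request so it can be inspected offline."""
    return requests.Request("GET", url, headers=HEADERS).prepare()


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Send a GET request and return the response body as a string.

    timeout makes requests give up instead of waiting forever, and
    raise_for_status() raises requests.HTTPError for 4xx/5xx responses.
    """
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()
    return response.text


prepared = build_request("https://github.com/django/django")
print(prepared.method, prepared.url)
```

With these helpers, Step 1 reduces to `html_content = fetch_html(url)`.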

 

**Step 2**
 ```python title="Parse the HTML"
 soup = BeautifulSoup(html_content, 'html.parser')
 ```
 - `BeautifulSoup(html_content, 'html.parser')`: Parses `html_content` with Python's built-in `html.parser` and builds a tree of HTML elements, so individual elements can be searched for and manipulated easily.
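To see what parsing produces without downloading anything, here is a small sketch that parses a hypothetical HTML string (not GitHub's real markup) and reads a tag's text and an attribute from the resulting tree.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page, so the example
# runs without a network connection.
sample_html = """
<html>
  <head><title>django/django</title></head>
  <body>
    <a id="home-link" href="/django/django">django</a>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

print(soup.title.string)          # text inside the <title> tag
link = soup.find(id="home-link")  # first element whose id attribute matches
print(link.name, link["href"])    # tag name and the value of its href attribute
```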

 

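**Step 3**
 ```python title="Find the star and fork counts"
 ids_to_find = ['repo-stars-counter-star', 'repo-network-counter']
 ```
 - This list holds the IDs of the HTML elements that display the star count and the fork count. These IDs are used to locate that information on the page. They come from GitHub's own HTML, so if GitHub renames them, `soup.find` will no longer find the elements and returns `None`.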
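Because Step 5 prints the raw IDs, one option is to keep each ID together with a human-readable label. The sketch below assumes a hypothetical `ID_LABELS` dictionary; the IDs themselves are the ones used in this lesson and may change if GitHub updates its HTML.

```python
# Hypothetical mapping from the element IDs used in this lesson to readable labels
ID_LABELS = {
    "repo-stars-counter-star": "Stars",
    "repo-network-counter": "Forks",
}

# Iterating over a dict yields its keys in insertion order,
# so this reproduces the ids_to_find list from Step 3
ids_to_find = list(ID_LABELS)

for id_value in ids_to_find:
    print(f"{ID_LABELS[id_value]} -> #{id_value}")
```

Keeping the labels in one dictionary means adding a new counter only requires one new entry.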

 

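**Step 4**
 ```python title="Extract the information"
 found_contents = {}
 for id_value in ids_to_find:
     element_content = soup.find(id=id_value)
     found_contents[id_value] = element_content.get_text() if element_content else "Not found"
 ```
 - `soup.find(id=id_value)`: Returns the first element in the parsed HTML whose `id` attribute matches the given ID, or `None` if there is no such element.
 - `element_content.get_text()`: Extracts the text of the element that was found. The conditional expression (`... if element_content else "Not found"`) stores `"Not found"` instead when no element was found, which avoids calling `get_text()` on `None`.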
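On GitHub the visible counter is often abbreviated (for example `12.3k`). The sketch below runs the Step 4 loop on a hypothetical fragment modeled on GitHub's repository header. It assumes, based on GitHub's markup as observed and not on any documented API, that the exact count is stored in the element's `title` attribute; the numbers are placeholders. It also uses `get_text(strip=True)` to drop surrounding whitespace and `Tag.get()` to read an attribute that may be missing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on GitHub's repository header; the counts are
# placeholders, and the "title" attribute holding the exact count is an assumption.
sample_html = """
<span id="repo-stars-counter-star" title="12,345">
  12.3k
</span>
<span id="repo-network-counter" title="6,789">6.8k</span>
"""

soup = BeautifulSoup(sample_html, "html.parser")
ids_to_find = ["repo-stars-counter-star", "repo-network-counter", "missing-id"]

found_contents = {}
for id_value in ids_to_find:
    element = soup.find(id=id_value)
    if element is None:
        found_contents[id_value] = "Not found"
        continue
    # strip=True removes the leading/trailing whitespace and newlines around the text
    text = element.get_text(strip=True)
    # .get() returns None instead of raising KeyError when the attribute is absent
    exact = element.get("title")
    found_contents[id_value] = f"{text} ({exact})" if exact else text

print(found_contents)
```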

 

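**Step 5**
 ```python title="Output"
 for id_value, content in found_contents.items():
     print(f"ID '{id_value}': {content}")
     print('-' * 40)
 ```
 - `found_contents.items()`: Returns the (ID, text) pairs stored in the dictionary. The loop iterates over these pairs and prints each ID with its text, followed by a separator line, so you can read off the star and fork counts.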
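The printed counts are strings. If you want numbers you can compare or sort, a sketch like the following converts them. `parse_count` is a hypothetical helper, and it assumes the abbreviations use a `k` or `m` suffix. An abbreviated value is only as precise as the displayed text, so `12.3k` becomes 12,300 rather than the exact count.

```python
def parse_count(text: str) -> int:
    """Convert a display string such as "12.3k", "1.2m" or "6,789" to an int."""
    text = text.strip().lower().replace(",", "")
    multipliers = {"k": 1_000, "m": 1_000_000}
    if text and text[-1] in multipliers:
        # round() absorbs floating-point noise such as 12300.000000000002
        return round(float(text[:-1]) * multipliers[text[-1]])
    return int(text)


# Placeholder values in the shape produced by Step 4
found_contents = {"repo-stars-counter-star": "12.3k", "repo-network-counter": "6,789"}
labels = {"repo-stars-counter-star": "Stars", "repo-network-counter": "Forks"}

for id_value, content in found_contents.items():
    # :, formats the integer with thousands separators
    print(f"{labels.get(id_value, id_value)}: {parse_count(content):,}")
```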

 

## Practice Exercises

- Run the code above with the URL of a different GitHub repository.

- Practice extracting other data by using different IDs or classes.
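As a starting point for the exercises, here is a sketch that builds a repository URL from an owner and a repository name, and selects elements by class instead of by ID. The `repo_url` helper and the HTML fragment are hypothetical; before using a class on a real page, inspect the page in your browser's developer tools, because GitHub's class names can change.

```python
from bs4 import BeautifulSoup


def repo_url(owner: str, name: str) -> str:
    """Build a GitHub repository URL, e.g. repo_url("django", "django")."""
    return f"https://github.com/{owner}/{name}"


# Hypothetical HTML for practising class-based lookups offline
sample_html = """
<ul>
  <li class="topic">python</li>
  <li class="topic">web</li>
  <li class="other">ignored</li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# class_ has a trailing underscore because "class" is a Python keyword
topics = [li.get_text(strip=True) for li in soup.find_all("li", class_="topic")]
# The same lookup written as a CSS selector
topics_css = [li.get_text(strip=True) for li in soup.select("li.topic")]

print(repo_url("pallets", "flask"))
print(topics, topics_css)
```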
