
```python
from bs4 import BeautifulSoup

# Sample HTML document
html_doc = """
<div class="content">
  <p class="info">This is a paragraph.</p>
  <a href="http://example.com/page1">Link to Page 1</a>
  <a href="http://example.com/page2">Link to Page 2</a>
  <img src="image1.jpg" alt="Image 1">
  <img src="image2.jpg" alt="Image 2">
</div>
"""

# Create a BeautifulSoup object
soup = BeautifulSoup(html_doc, 'html.parser')

# Extract the text of the <p> tag with class 'info'
info_text = soup.select_one('.info').text
print("Paragraph Text:", info_text)

print("-" * 20)

# Extract the URL of every link (<a>)
print("Links URLs:")
for link in soup.select('a'):
    print(link.get('href'))

print("-" * 20)

# Extract the src attribute value of every image (<img>)
print("Image Sources:")
for img in soup.select('img'):
    print(img.get('src'))
```

# Extracting Data with CSS Selectors

A CSS selector is a pattern that picks out specific elements from among the many elements in an HTML document.

<br />

## Basic CSS Selectors

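1. `Class selector`: written as `.classname`, it selects every element that has that class.

2. `ID selector`: written as `#idname`, it selects the element with that ID. IDs are meant to be unique, so this normally matches a single element.

3. `Element selector`: written as `tagname`, it selects every element with that tag.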
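
The three selector types above can be compared side by side on one small document. This is a minimal sketch, assuming BeautifulSoup 4 with the built-in `html.parser`. The HTML, the `main` ID, and the `note` class are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML used only for this illustration
html_doc = """
<div id="main">
  <p class="note">First note</p>
  <p class="note">Second note</p>
  <p>Plain paragraph</p>
</div>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Class selector: every element with class="note"
notes = [el.text for el in soup.select('.note')]

# ID selector: the element with id="main" (IDs should be unique)
main = soup.select('#main')

# Element selector: every <p> tag, with or without a class
paragraphs = [el.text for el in soup.select('p')]

print(notes)       # ['First note', 'Second note']
print(len(main))   # 1
print(paragraphs)  # ['First note', 'Second note', 'Plain paragraph']
```

The element selector also matches the two `.note` paragraphs, because they are still `<p>` tags.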

<br />

## Using CSS Selectors in BeautifulSoup

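BeautifulSoup's `select()` method takes a CSS selector and returns a list of every matching element. When you only need the first match, `select_one()` returns that single element instead.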

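```python title="Finding elements with CSS selectors"
from bs4 import BeautifulSoup

# Sample HTML for illustration
html_doc = """
<div class="my-class">First</div>
<div class="my-class">Second</div>
<p id="my-id">Unique</p>
<a href="http://example.com">Link</a>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
# Find every element whose class is 'my-class'
class_elements = soup.select('.my-class')

# Find the element whose ID is 'my-id' (a list, normally with one element)
id_element = soup.select('#my-id')

# Find every <a> tag
a_elements = soup.select('a')
```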
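
The sketch below contrasts `select()` with `select_one()` on hypothetical markup, including the case where nothing matches. The `missing-id` ID is made up and deliberately absent from the HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the class and ID names used above
html_doc = """
<p class="my-class">A</p>
<p class="my-class">B</p>
<p id="my-id">C</p>
"""
soup = BeautifulSoup(html_doc, 'html.parser')

# select() returns a list of every match
all_matches = soup.select('.my-class')
print(len(all_matches))  # 2

# select_one() returns only the first match
first_match = soup.select_one('.my-class')
print(first_match.text)  # A

# With no match, select() gives an empty list and select_one() gives None
print(len(soup.select('#missing-id')))  # 0
print(soup.select_one('#missing-id'))   # None
```

Use `select()` when you expect several results and want to loop over them. Use `select_one()` when you expect a single element, but check for `None` before using it.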

<br />

## Extracting Text

- Read the `.text` attribute of an element found with a CSS selector to get its text content.

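```python title="Extracting text"
from bs4 import BeautifulSoup

# Sample HTML for illustration
html_doc = """
<div class="my-class">First</div>
<div class="my-class">Second</div>
<p id="my-id">Unique</p>
"""
soup = BeautifulSoup(html_doc, 'html.parser')

# Print the text of every element whose class is 'my-class'
for el in soup.select('.my-class'):
    print(el.text)

# Print the text of the element whose ID is 'my-id'
print(soup.select_one('#my-id').text)
```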
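
`.text` joins the text of an element and all of its descendants, so nested tags and surrounding whitespace come along with it. The sketch below uses made-up markup and a hypothetical `safe_text` helper. It shows `get_text()` with a separator and `strip=True` for trimming, and a `None` check for when `select_one()` finds nothing. Calling `.text` on `None` would raise `AttributeError`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a nested <b> tag and extra whitespace
html_doc = """
<p class="my-class">
  Hello <b>world</b>
</p>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
p = soup.select_one('.my-class')

# .text includes the nested <b> text as well as the raw newlines and spaces
print(repr(p.text))

# get_text() strips each text piece, drops empty ones, and joins with ' '
print(p.get_text(' ', strip=True))  # Hello world


def safe_text(root, selector, default=''):
    """Hypothetical helper: stripped text of the first match, or a default."""
    el = root.select_one(selector)
    return el.get_text(strip=True) if el is not None else default


# No element has id="my-id" in this markup, so the default is returned
print(safe_text(soup, '#my-id', default='(not found)'))  # (not found)
```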

<br />

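## Extracting Attribute Values

- You can read the value of a specific attribute (for example `href` or `src`) from the elements you found by calling `.get()` with the attribute name.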

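```python title="Extracting attribute values"
from bs4 import BeautifulSoup

# Sample HTML for illustration
html_doc = """
<a href="http://example.com/page1">Page 1</a>
<img src="image1.jpg" alt="Image 1">
"""
soup = BeautifulSoup(html_doc, 'html.parser')

# Print the href attribute value of every <a> tag
for a in soup.select('a'):
    print(a.get('href'))

# Print the src attribute value of every image tag (<img>)
for img in soup.select('img'):
    print(img.get('src'))
```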
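
Attribute values can also drive the selection itself. This sketch assumes BeautifulSoup's CSS engine (soupsieve), which supports standard CSS attribute selectors such as `a[href^="https"]`, "links whose `href` starts with `https`". It also contrasts `.get()`, which returns `None` or a default you supply when the attribute is missing, with `tag['attr']`, which raises `KeyError`. The markup and file names are invented.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one https link, one http link, one image without alt
html_doc = """
<a href="https://example.com/secure">Secure</a>
<a href="http://example.com/plain">Plain</a>
<img src="photo.jpg">
"""
soup = BeautifulSoup(html_doc, 'html.parser')

# Attribute selector: <a> tags whose href starts with "https"
secure_links = [a.get('href') for a in soup.select('a[href^="https"]')]
print(secure_links)  # ['https://example.com/secure']

img = soup.select_one('img')

# .get() returns None, or the given default, when the attribute is missing
print(img.get('alt'))                   # None
print(img.get('alt', '(no alt text)'))  # (no alt text)

# Square-bracket access raises KeyError for a missing attribute
try:
    img['alt']
except KeyError:
    print('KeyError: img has no alt attribute')
```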

<br />

## Example

```python title="Usage example"
from bs4 import BeautifulSoup

html_doc = """
<div class="content">
  <p class="info">This is a paragraph.</p>
  <a href="http://example.com">Example Link</a>
</div>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Extract the text of the <p> tag with class 'info'
info_text = soup.select_one('.info').text
print(info_text)

# Extract the URL of every link (<a>)
for link in soup.select('a'):
    print(link.get('href'))
```
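
In practice the text and attribute steps are often combined into one record per element. This is a minimal sketch, assuming you want a list of dictionaries. The `title` and `url` keys are hypothetical names chosen for illustration. It also uses the standard CSS descendant selector `div.content a` to keep only links inside the `content` container.

```python
from bs4 import BeautifulSoup

html_doc = """
<div class="content">
  <a href="http://example.com/page1">Link to Page 1</a>
  <a href="http://example.com/page2">Link to Page 2</a>
</div>
"""
soup = BeautifulSoup(html_doc, 'html.parser')

# Descendant selector: <a> tags anywhere inside <div class="content">
links = [
    {'title': a.get_text(strip=True), 'url': a.get('href')}
    for a in soup.select('div.content a')
]

for link in links:
    print(link['title'], '->', link['url'])
```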

<br />

## Practice

Click the _`Run Code`_ button on the right side of the screen to see the crawling results, then try modifying the code!
