Guidelines

Key BeautifulSoup Methods and How to Use Them

이번 μˆ˜μ—…μ—μ„œλŠ” BeautifulSoup의 μ£Όμš” λ©”μ„œλ“œμ™€ κ·Έ ν™œμš© 방법을 κ°„λ‹¨ν•œ μ˜ˆμ œμ™€ ν•¨κ»˜ μ•Œμ•„λ³΄κ² μŠ΅λ‹ˆλ‹€.


Finding a Specific Element with find

μ›Ή νŽ˜μ΄μ§€μ—μ„œ νŠΉμ • μš”μ†Œλ₯Ό μ°Ύκ³  μ‹Άλ‹€λ©΄, find() λ©”μ„œλ“œλ₯Ό μ‚¬μš©ν•©λ‹ˆλ‹€.

이 λ©”μ„œλ“œλŠ” 쑰건에 λ§žλŠ” 첫 번째 μš”μ†Œλ₯Ό λ°˜ν™˜ν•©λ‹ˆλ‹€.

Example: using the find method
from bs4 import BeautifulSoup

html_doc = """
<html><body>
<h1>Hello</h1>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</body></html>
"""

# Parse the HTML
soup = BeautifulSoup(html_doc, 'html.parser')

# Find the h1 tag
h1_tag = soup.find('h1')

# Output: Hello
print(h1_tag.text)

μœ„ μ˜ˆμ œμ—μ„œλŠ” h1 νƒœκ·Έλ₯Ό μ°Ύμ•„ κ·Έ λ‚΄μš©μ„ 좜λ ₯ν•©λ‹ˆλ‹€.

find()λŠ” 항상 첫 번째둜 μΌμΉ˜ν•˜λŠ” μš”μ†Œλ§Œ λ°˜ν™˜ν•˜λ―€λ‘œ, μ—¬λŸ¬ μš”μ†Œκ°€ μžˆλ‹€λ©΄ 첫 번째 μš”μ†Œλ§Œ λ°˜ν™˜λ©λ‹ˆλ‹€.
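Besides a tag name, find() can filter on attributes such as class and id. The sketch below is a minimal illustration. It assumes a small hypothetical HTML snippet; the class name intro and the id main are made up. It also shows the None result and a simple check that avoids an AttributeError when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this illustration
html_doc = """
<html><body>
<p class="intro">Intro paragraph</p>
<p id="main">Main paragraph</p>
<p>Plain paragraph</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Match by tag name and CSS class; class_ has a trailing underscore
# because "class" is a reserved word in Python
intro = soup.find('p', class_='intro')
print(intro.text)  # Output: Intro paragraph

# Match by id attribute only (any tag name)
main = soup.find(id='main')
print(main.text)  # Output: Main paragraph

# There is no h2 tag, so find() returns None instead of raising an error
missing = soup.find('h2')
print(missing)  # Output: None

# Check for None before reading .text
if missing is not None:
    print(missing.text)
else:
    print('h2 not found')  # Output: h2 not found
```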


μ—¬λŸ¬ μš”μ†Œλ₯Ό ν•œ λ²ˆμ— μ°ΎλŠ” find_all

To find every element that matches the conditions, use the find_all() method.

It returns the matches as a list, so you can process many elements at once.

Example: using the find_all method
from bs4 import BeautifulSoup

html_doc = """
<html><body>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<p>Paragraph 3</p>
</body></html>
"""

# Parse the HTML
soup = BeautifulSoup(html_doc, 'html.parser')

# Find all p tags
p_tags = soup.find_all('p')

# Print the text of each p tag
for p in p_tags:
    # Output (one per line): Paragraph 1, Paragraph 2, Paragraph 3
    print(p.text)

이 μ½”λ“œλŠ” html_doc λ³€μˆ˜μ— λ‹΄κΈ΄ λ¬Έμžμ—΄μ˜ λͺ¨λ“  p νƒœκ·Έλ₯Ό μ°Ύμ•„ 좜λ ₯ν•©λ‹ˆλ‹€.

p_tags λ³€μˆ˜μ— p νƒœκ·Έμ˜ 값듀이 리슀트의 ν˜•νƒœλ‘œ ['문단 1', '문단 2', '문단 3']κ³Ό 같이 μ €μž₯λ˜μ–΄ μžˆμŠ΅λ‹ˆλ‹€.

μ΄λ ‡κ²Œ find_all()은 μ›ν•˜λŠ” μš”μ†Œλ“€μ„ ν•œ λ²ˆμ— 찾을 λ•Œ μœ μš©ν•©λ‹ˆλ‹€.


CSS μ„ νƒμžλ‘œ μ°ΎλŠ” select

CSS μ„ νƒμžλ₯Ό ν™œμš©ν•΄ νŠΉμ • μš”μ†Œλ₯Ό 선택할 λ•ŒλŠ” select()λ₯Ό μ‚¬μš©ν•©λ‹ˆλ‹€.

Example: using the select method
from bs4 import BeautifulSoup

html_doc = """
<html><body>
<p>Paragraph 1</p>
<div class="content">
<p>Paragraph 2</p>
<p>Paragraph 3</p>
</div>
</body></html>
"""

# Parse the HTML
soup = BeautifulSoup(html_doc, 'html.parser')

# Find all p tags inside the element with the content class
content_p_tags = soup.select('.content p')

for p in content_p_tags:
    # Output (one per line): Paragraph 2, Paragraph 3
    print(p.text)

이 μ½”λ“œμ—μ„œλŠ” .content 클래슀 μ•ˆμ— μžˆλŠ” p νƒœκ·Έλ“€μ„ λͺ¨λ‘ 선택해 좜λ ₯ν•©λ‹ˆλ‹€.


첫 번째 μš”μ†Œλ§Œ μ„ νƒν•˜λŠ” select_one

The select_one() method works like select(), but it returns only the first element that matches the selector, or None if nothing matches.

Example: using the select_one() method
from bs4 import BeautifulSoup

html_doc = """
<html><body>
<div class="content">
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</div>
</body></html>
"""

# Parse the HTML
soup = BeautifulSoup(html_doc, 'html.parser')

# Find the first p tag inside the element with the content class
first_p_tag = soup.select_one('.content p')

# Output: Paragraph 1
print(first_p_tag.text)
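This minimal sketch reuses the same HTML as the select_one() example. It shows how select_one() relates to select() and find(), and what it returns when nothing matches. The selector .sidebar p is hypothetical and deliberately matches nothing.

```python
from bs4 import BeautifulSoup

html_doc = """
<html><body>
<div class="content">
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</div>
</body></html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# select_one returns the very same element object as the first item of select()
first = soup.select_one('.content p')
print(first is soup.select('.content p')[0])  # Output: True

# For this document, the same element can be reached with find()
div = soup.find('div', class_='content')
print(div.find('p').text)  # Output: Paragraph 1

# When nothing matches, select_one returns None, while select returns an empty list
missing = soup.select_one('.sidebar p')
print(missing)                          # Output: None
print(len(soup.select('.sidebar p')))  # Output: 0
```

Because select_one() can return None, check the result before reading .text, just as with find().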
Mission

BeautifulSoupμ—μ„œ 첫 번째둜 μΌμΉ˜ν•˜λŠ” μš”μ†Œλ₯Ό μ°ΎκΈ° μœ„ν•œ λ©”μ„œλ“œλŠ” λ¬΄μ—‡μΈκ°€μš”?

find_all()

select()

find()

select_one()
