What is BeautifulSoup?
BeautifulSoup is a Python library that makes web crawling easier by parsing HTML files and extracting data from them.
Parsing is the process of analyzing a web page's HTML document to extract the data you want. BeautifulSoup makes this parsing step easy to carry out.
Features and characteristics of BeautifulSoup
- Support for multiple parsers
  - BeautifulSoup supports several parsers for parsing HTML/XML documents.
  - The most commonly used parsers are html.parser (part of the Python standard library) and lxml.
- Easy data extraction
  - You can easily search for specific tags, IDs, classes, and so on.
  - You can efficiently extract the various elements of a web page, such as text and attribute values.
- Handling complex HTML structures
  - Nested tags and complex HTML structures can be navigated easily to extract the data you need.
  - The hierarchical relationships between tags can be used to pinpoint exactly where the data is located.
- Flexible search methods
  - Data can be searched in several ways, including CSS selectors and regular expressions.
  - Multiple conditions can also be combined to find data that follows a specific pattern.
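The search options listed above can be sketched roughly as follows. The HTML snippet, the id `menu`, and the class names `item` and `external` are made up for this illustration; the calls used (`find`, `find_all`, `select`, and passing a compiled regular expression as an attribute filter) are standard BeautifulSoup features.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical HTML snippet, invented for this sketch.
html = """
<div id="menu">
  <a class="item" href="/home">Home</a>
  <a class="item external" href="https://example.com/docs">Docs</a>
  <a class="item external" href="https://example.com/blog">Blog</a>
  <span class="item">Not a link</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Search by ID: find() returns the first match, or None if nothing matches.
menu = soup.find(id="menu")

# Search by tag name and class (class_ avoids the Python keyword `class`).
items = soup.find_all("a", class_="item")
item_texts = [a.get_text() for a in items]

# CSS selector: <a> elements with class "external" inside #menu.
external = soup.select("#menu a.external")
external_hrefs = [a["href"] for a in external]

# Regular expression: <a> tags whose href starts with "https://".
absolute = soup.find_all("a", href=re.compile(r"^https://"))

# Combined conditions: class "external" AND an href ending in "/blog".
blog = soup.find_all("a", class_="external", href=re.compile(r"/blog$"))

print(menu.name)
print(item_texts)
print(external_hrefs)
print(len(absolute))
print([a.get_text() for a in blog])
```

Keyword filters passed to `find_all` are combined with a logical AND, which is what makes the last query match only the Blog link.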
Usage
from bs4 import BeautifulSoup

# Sample HTML document
html_doc = """
<html>
<head>
<title>The Codefriends' story</title>
</head>
<body>
<p class="title"><b>The Codefriends' story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body>
</html>
"""

# Create the BeautifulSoup object
soup = BeautifulSoup(html_doc, 'html.parser')

# Extract the contents of the HTML title tag
title = soup.title.text
print('Title:', title)  # Output: Title: The Codefriends' story

print('-' * 10)

# Extract the href attribute value of every 'a' tag
for link in soup.find_all('a'):
    print(link.get('href'))
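As a follow-up to the usage example, here is a minimal sketch of how the hierarchical relationships between tags can be used to move around a document. It uses a shortened copy of the same story HTML; the variable names are chosen for this illustration only.

```python
from bs4 import BeautifulSoup

# Shortened copy of the story document from the usage example.
html_doc = """
<html><body>
<p class="title"><b>The Codefriends' story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Start from one element, located by its id.
lacie = soup.find(id="link2")

# Move up: the enclosing <p>. Its class attribute is a list,
# because class is a multi-valued attribute.
parent_classes = lacie.parent["class"]

# Move sideways: the nearest <a> tags before and after this one.
previous_name = lacie.find_previous_sibling("a").get_text()
next_name = lacie.find_next_sibling("a").get_text()

# Move down: only the direct <a> children of the story paragraph.
story = soup.find("p", class_="story")
child_ids = [a["id"] for a in story.find_all("a", recursive=False)]

# Chained attribute access walks nested tags: <body> -> first <p> -> first <b>.
headline = soup.body.p.b.get_text()

print(parent_classes)
print(previous_name, next_name)
print(child_ids)
print(headline)
```

The sibling lookups use `find_previous_sibling` and `find_next_sibling` with a tag name rather than `.previous_sibling` and `.next_sibling`, since the latter can return the whitespace text nodes that sit between tags.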
Practice
Click the Run Code button on the right side of the screen to check the crawling results, or try modifying the code!