Lecture

Handling Nested HTML Elements

Nested HTML elements are elements placed inside other elements.

Nested Elements
```html
<div>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</div>
```

In the example above, the <div> element contains two nested <p> elements.
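A minimal sketch of how BeautifulSoup sees this structure is below. It parses the snippet above and lists the tags nested directly inside the <div>. One detail worth knowing: iterating `.children` also yields the whitespace text between tags, so the sketch keeps only `Tag` objects.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html_doc = """
<div>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

div = soup.find("div")
# .children includes the newline/space strings between tags,
# so keep only real Tag objects
nested = [child for child in div.children if isinstance(child, Tag)]

print(len(nested))                     # 2
print([p.get_text() for p in nested])  # ['First paragraph.', 'Second paragraph.']
```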

The data you want is often several levels deep in a page, so navigating nested elements is an essential web-scraping skill.


Navigating Nested Elements

  1. Understanding Parent-Child Relationships

    • HTML elements can have parent-child relationships.

    • For instance, the <p> elements inside a <div> are children of the <div>, and the <div> is their parent.

  2. Finding Elements on a Specific Path

    • Chain find() and find_all() calls to follow a path from an outer element to the elements nested inside it.

    • Example: soup.find('div').find('p') finds the first <p> inside the first <div>.
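The sketch below illustrates both ideas on a small hypothetical document. It moves down the tree with chained `find()`, moves up with `.parent` and `find_parent()`, and uses `recursive=False` to limit `find_all()` to direct children.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <p> directly in the <div>, one nested in a <section>
html_doc = """
<div id="outer">
  <p>First paragraph.</p>
  <section>
    <p>Paragraph inside a section.</p>
  </section>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")
div = soup.find("div")

# Chained find(): the first <p> anywhere inside the first <div>
first_p = div.find("p")
print(first_p.get_text())   # First paragraph.

# Moving back up: .parent is the enclosing element
print(first_p.parent.name)  # div

# find_all() searches all descendants by default...
print(len(div.find_all("p")))                   # 2
# ...while recursive=False keeps only direct children of the <div>
print(len(div.find_all("p", recursive=False)))  # 1

# find_parent() walks up to the nearest matching ancestor
inner_p = soup.find("section").find("p")
print(inner_p.find_parent("div")["id"])  # outer
```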


Example: Extracting Nested Elements

Extracting Nested Elements
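```python
from bs4 import BeautifulSoup

html_doc = """
<div>
  <p class="inner-text">
    First paragraph. <span>Text within a span</span>
  </p>
  <p class="inner-text">Second paragraph.</p>
</div>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Extract all <p> tags inside the first <div>.
# .text includes text from nested tags such as the <span>;
# .strip() removes the surrounding whitespace and newlines.
for p in soup.find('div').find_all('p'):
    print(p.text.strip())
```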
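The first paragraph above mixes plain text with a nested <span>. The sketch below shows how that affects extraction. `get_text()` collects text from the whole subtree, `.string` returns `None` when a tag has more than one child, and a further `find()` reaches the nested tag on its own. The markup is a compacted version of the example.

```python
from bs4 import BeautifulSoup

html_doc = """
<div>
  <p class="inner-text">First paragraph. <span>Text within a span</span></p>
  <p class="inner-text">Second paragraph.</p>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")
first_p, second_p = soup.find("div").find_all("p", class_="inner-text")

# get_text() joins the text of the <p> and everything nested in it
print(first_p.get_text())   # First paragraph. Text within a span
# .string is None because this <p> has two children (text + <span>)
print(first_p.string)       # None
print(second_p.string)      # Second paragraph.

# Go one level deeper to read only the nested <span>
print(first_p.find("span").get_text())  # Text within a span
```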

Using Attributes for Extraction

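  • Filter by attributes such as class, id, or any other attribute to target specific elements.

  • Example: soup.find_all('a', class_='external_link') finds all <a> tags with the class 'external_link'. The argument is class_ (with a trailing underscore) because class is a reserved word in Python.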
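A minimal sketch of attribute-based filtering follows, using hypothetical link markup. It matches by class with `class_`, by id with a keyword argument, and by a hyphenated `data-` attribute through the `attrs` dictionary. The `attrs` form is needed because a hyphenated name cannot be passed as a Python keyword argument.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration
html_doc = """
<ul id="links">
  <li><a class="external_link" href="https://example.com">Example</a></li>
  <li><a class="internal_link" href="/about">About</a></li>
  <li><a class="external_link" data-track="yes" href="https://example.org">Example Org</a></li>
</ul>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# class_ because "class" is a reserved word in Python
external = soup.find_all("a", class_="external_link")
print([a["href"] for a in external])  # ['https://example.com', 'https://example.org']

# Match by id with a keyword argument
links_list = soup.find(id="links")
print(links_list.name)  # ul

# Hyphenated attribute names go in the attrs dictionary
tracked = soup.find_all("a", attrs={"data-track": "yes"})
print([a.get_text() for a in tracked])  # ['Example Org']
```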


Using CSS Selectors

  • BeautifulSoup supports CSS selectors through the select() method, which returns every match, and select_one(), which returns only the first match.

  • Example: soup.select('div.content > p.paragraph') finds <p> elements with class 'paragraph' that are direct children of a <div> with class 'content'.
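The difference between the child combinator (`>`) and the descendant combinator (a space) matters most with nested markup. The sketch below runs both on the same hypothetical document and then uses `select_one()` for a single match.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one paragraph directly in div.content, one nested deeper
html_doc = """
<div class="content">
  <p class="paragraph">Direct child paragraph.</p>
  <div class="box">
    <p class="paragraph">Nested deeper.</p>
  </div>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# ">" matches only direct children of div.content
direct = soup.select("div.content > p.paragraph")
print([p.get_text() for p in direct])    # ['Direct child paragraph.']

# A space (descendant combinator) matches at any depth
anywhere = soup.select("div.content p.paragraph")
print([p.get_text() for p in anywhere])  # ['Direct child paragraph.', 'Nested deeper.']

# select_one() returns only the first match (or None if nothing matches)
print(soup.select_one("div.box p").get_text())  # Nested deeper.
```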


Example: Extracting Data from Complex Structures

Extracting Data from Complex Structures
```python
from bs4 import BeautifulSoup

html_doc = """
<div class="content">
  <p class="paragraph">First paragraph in content.</p>
  <div class="inner-content">
    <p>Inner paragraph.</p>
  </div>
</div>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Extract all <p> tags anywhere inside the div with class 'content'.
# The descendant selector (a space) also matches the <p> in the nested div.
content_paragraphs = soup.select('div.content p')
for p in content_paragraphs:
    print(p.text)
```
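For comparison, the same extraction can be written with find() and find_all() instead of a CSS selector. The sketch below is one possible equivalent on the same markup, not the only one. `recursive=False` plays the role of the `>` combinator by restricting the search to direct children.

```python
from bs4 import BeautifulSoup

html_doc = """
<div class="content">
  <p class="paragraph">First paragraph in content.</p>
  <div class="inner-content">
    <p>Inner paragraph.</p>
  </div>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# class_="content" matches the class token "content", not "inner-content"
content_div = soup.find("div", class_="content")

# Like soup.select('div.content p'): every <p> at any depth
all_paragraphs = [p.get_text() for p in content_div.find_all("p")]
print(all_paragraphs)  # ['First paragraph in content.', 'Inner paragraph.']

# Like soup.select('div.content > p'): direct children only
direct = [p.get_text() for p in content_div.find_all("p", recursive=False)]
print(direct)  # ['First paragraph in content.']
```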
