# Crawling Latest Trending Articles from Wikipedia

Use the `find_all` method of `BeautifulSoup` to extract significant events listed in Wikipedia's Current Events portal.

<br />

## Example Code Explanation

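```python title="Extracting the First 10 Trending Article Titles"
import requests
from bs4 import BeautifulSoup

def crawl_wikipedia_current_events_first_10_titles():
    url = "https://en.wikipedia.org/wiki/Portal:Current_events"
    # Wikimedia asks clients to send a descriptive User-Agent.
    # This value is a placeholder: replace it with your own tool name and contact.
    headers = {"User-Agent": "CurrentEventsTutorial/1.0 (example@example.com)"}

    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code != 200:
        print("Request failed", response.status_code)
        return []  # Return an empty list so callers can always iterate over the result

    soup = BeautifulSoup(response.content, "html.parser")

    # Locate the div that holds the page's main content
    current_events_section = soup.find("div", {"id": "mw-content-text"})

    # Find all li tags within that div (empty list if the div is missing)
    list_items = current_events_section.find_all("li") if current_events_section else []

    # Extract the text of the first 10 li tags, stripping surrounding whitespace
    titles = [item.get_text(strip=True) for item in list_items[:10]]

    return titles


# Fetch and print the first 10 entries from the Current Events portal
for title in crawl_wikipedia_current_events_first_10_titles():
    print(title)
    print("-" * 40)
```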

<br />

1. `Requesting a Web Page`: Use `requests.get(url)` to request the content of a specific URL.

2. `Checking Response Status`: Confirm the request succeeded by checking that `response.status_code` is 200; otherwise, report the failure and return early.

3. `Creating a BeautifulSoup Object and Parsing Data`: Use `BeautifulSoup(response.content, "html.parser")` to parse the HTML content.

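4. `Extracting Data from a Specific Section`: Locate the page's main content `div` (`id="mw-content-text"`) with `find`, collect its `li` tags with `find_all`, and keep the text of the first 10. Because this searches the entire content area, some of these entries may be navigation or box items rather than individual news events.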
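Steps 3 and 4 can be tried without any network access by parsing a small HTML string. The markup below is a hypothetical, heavily simplified stand-in for the real page (the actual Current Events HTML is far larger); it keeps only a `div` with `id="mw-content-text"` so the same `find` and `find_all` calls apply. The helper name `first_list_item_texts` is also hypothetical. Note that `find` returns the first matching tag (or `None`), while `find_all` returns every match.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified HTML standing in for the real page
SAMPLE_HTML = """
<html><body>
  <div id="siteNotice"><ul><li>Not part of the content area</li></ul></div>
  <div id="mw-content-text">
    <ul>
      <li> Event A </li>
      <li>Event B</li>
      <li>Event C</li>
    </ul>
  </div>
</body></html>
"""


def first_list_item_texts(html, limit=10):
    """Return the stripped text of up to `limit` li tags inside #mw-content-text."""
    soup = BeautifulSoup(html, "html.parser")

    # find() returns the first matching tag, or None if nothing matches
    section = soup.find("div", {"id": "mw-content-text"})
    if section is None:
        return []

    # find_all() returns every matching li tag in document order
    list_items = section.find_all("li")
    return [item.get_text(strip=True) for item in list_items[:limit]]


print(first_list_item_texts(SAMPLE_HTML))           # ['Event A', 'Event B', 'Event C']
print(first_list_item_texts(SAMPLE_HTML, limit=2))  # ['Event A', 'Event B']
```

Because the search is scoped to `section`, the `li` inside the hypothetical `siteNotice` div is skipped. This is why the lesson code calls `find_all` on the located `div` rather than on the whole `soup`.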

<br />

## Practice Exercises

- Use the above code to extract the latest event titles from Wikipedia's 'Current Events' section.

- Experiment with targeting different webpages and sections to practice data extraction techniques.
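For the second exercise, one way to retarget the extraction is to pass a CSS selector instead of hard-coding the `div` id. The sketch below swaps in BeautifulSoup's `select` method (CSS selectors) in place of `find`/`find_all`. The helper name `extract_texts` and the sample markup are hypothetical, and the right selector for a real site has to be worked out by inspecting that page in your browser's developer tools.

```python
from bs4 import BeautifulSoup


def extract_texts(html, selector, limit=10):
    """Return the stripped text of up to `limit` elements matching a CSS selector.

    Hypothetical helper for illustration; `selector` is any CSS selector
    that BeautifulSoup's select() method supports.
    """
    soup = BeautifulSoup(html, "html.parser")
    # select() returns the matching tags in document order; limit caps how many
    return [tag.get_text(strip=True) for tag in soup.select(selector, limit=limit)]


# Hypothetical page with two sections that need different selectors
PAGE = """
<div id="news"><ul><li>Headline 1</li><li>Headline 2</li></ul></div>
<div id="sports"><h3 class="score">3-1</h3><h3 class="score">0-0</h3></div>
"""

print(extract_texts(PAGE, "#news li"))                   # ['Headline 1', 'Headline 2']
print(extract_texts(PAGE, "#sports h3.score", limit=1))  # ['3-1']
```

To run it against a live page, fetch the HTML with `requests.get(url, timeout=10).text` as in the main example and pass that string in.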
