Crawling U.S. Stock Market Indices with Selenium

In this lesson, we’ll practice real-world web crawling by applying the Selenium knowledge we’ve learned so far.

The practice code uses Selenium to extract the table data from the Americas section of the Yahoo Finance website and then organizes it with the pandas library for output.

Note: Web crawling may fail if the HTML or CSS structure of the website changes. In such cases, the code must be updated accordingly.

Let’s go through the code step by step.


1. Import Required Libraries

Importing Libraries
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import time
  • selenium: A library for web automation and scraping. It allows interaction with web page elements.

  • pandas: A library for handling data in table formats. It’s useful for data analysis similar to working with Excel.

  • time: A Python built-in module that provides time-related functions such as time.sleep(). The steps below don't call it directly, but it is often used for simple fixed delays; a brief sketch follows below.
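
The steps in this lesson rely on explicit waits rather than time, but as a minimal sketch, time.sleep() provides a crude fixed delay when no better signal is available:

import time

# Crude fallback: pause unconditionally for 2 seconds.
# An explicit wait (WebDriverWait) is usually preferable because it resumes
# as soon as the target element appears.
time.sleep(2)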


2. Launch WebDriver and Navigate to Website

Launching WebDriver and Navigating to Website
driver = webdriver.Chrome()
driver.get('https://finance.yahoo.com/markets/')
  • webdriver.Chrome(): Launches the Chrome WebDriver to control the browser automatically. It opens a browser window (a headless variation is sketched after this list).

  • driver.get(URL): Navigates to the specified URL. Here, it opens the Yahoo Finance ‘Markets’ page.
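
For reference, here is a minimal sketch of running Chrome in headless mode so that no window is shown, an optional variation assuming a recent Chrome version (older versions use '--headless' instead of '--headless=new'):

from selenium import webdriver

# Configure Chrome to run without opening a visible window.
options = webdriver.ChromeOptions()
options.add_argument('--headless=new')

driver = webdriver.Chrome(options=options)
driver.get('https://finance.yahoo.com/markets/')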


3. Wait for Page to Load

Waiting for the Page to Load
wait = WebDriverWait(driver, 10)
  • WebDriverWait(driver, 10): Creates a waiter that waits up to 10 seconds for a condition to be met. This prevents errors caused by running the code before the page has fully loaded; a variation with a custom polling interval is sketched below.
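
WebDriverWait re-checks its condition every 0.5 seconds by default. A minimal sketch of an optional variation that tunes both the timeout and the polling interval:

# Optional variation: wait up to 15 seconds, re-checking every 0.2 seconds.
wait = WebDriverWait(driver, 15, poll_frequency=0.2)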

4. Locate the 'Americas' Section

Locating the Section
americas_section = wait.until(EC.presence_of_element_located((By.XPATH, "//h3[text()='Americas']")))
  • wait.until(): Waits until the h3 tag containing the text ‘Americas’ appears on the page. EC.presence_of_element_located() checks whether the element is present in the page's DOM. If the element never appears within the timeout, a TimeoutException is raised (a handling sketch follows below).
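
A minimal sketch of catching that TimeoutException, as optional error handling that is not part of the lesson code:

from selenium.common.exceptions import TimeoutException

try:
    americas_section = wait.until(
        EC.presence_of_element_located((By.XPATH, "//h3[text()='Americas']"))
    )
except TimeoutException:
    # The section did not load in time; close the browser instead of crashing later.
    print("The 'Americas' section did not appear within 10 seconds.")
    driver.quit()
    raise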

5. Scroll and Locate the Table within the Section

Finding the Table in the Section
actions = ActionChains(driver)
actions.move_to_element(americas_section).perform()
parent_section = americas_section.find_element(By.XPATH, "./ancestor::section[contains(@data-testid, 'world-indices')]")
table = parent_section.find_element(By.XPATH, ".//table")
  • ActionChains(driver): Automates actions like mouse movements and clicks. Here, moving to the element scrolls the ‘Americas’ section into view (a JavaScript-based alternative is sketched after this list).

  • find_element(By.XPATH): Walks up from the ‘Americas’ heading to its ancestor section (identified by a data-testid containing 'world-indices'), then finds the table inside that section.
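
As that alternative, the browser itself can scroll the heading into view with a small piece of JavaScript; a sketch of this optional variation:

# Ask the browser to scroll the 'Americas' heading into view.
driver.execute_script("arguments[0].scrollIntoView(true);", americas_section)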


6. Extract Table Data

Extracting Data from the Table
headers = [header.text for header in table.find_elements(By.XPATH, ".//th")]
rows = table.find_elements(By.XPATH, ".//tbody/tr")
  • table.find_elements(): Returns every element matching the given XPath; here it collects the header cells and the data rows of the table.
    • th: Refers to table headers.
    • tr: Refers to table rows.

7. Store Data in a List

Storing Table Data as a List of Dictionaries
table_data = []
for row in rows:
    columns = row.find_elements(By.XPATH, ".//td")
    row_data = {}
    for i in range(len(headers)):
        header = headers[i]
        column_value = columns[i].text
        row_data[header] = column_value
    table_data.append(row_data)
  • Extracts the column data (td) for each row (tr) and stores it in a dictionary, with the headers as keys and the cell text as values; each row's dictionary is appended to table_data. A more compact variant using zip() is sketched below.
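
zip() pairs each header with the cell in the same position, which turns the inner loop into a one-line dictionary comprehension; a sketch of this equivalent variant:

table_data = []
for row in rows:
    columns = row.find_elements(By.XPATH, ".//td")
    # Pair each header with the corresponding cell and build the dict in one step.
    row_data = {header: column.text for header, column in zip(headers, columns)}
    table_data.append(row_data)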

8. Convert Data to a Pandas DataFrame and Display

Converting to DataFrame and Displaying
df = pd.DataFrame(table_data)
df_filtered = df[['Symbol', 'Price']]
print(df_filtered)
  • pd.DataFrame(): Converts the extracted data into a Pandas DataFrame.

  • df[['Symbol', 'Price']]: Filters only the Symbol and Price columns for a clean display of the data (a sketch of saving the result to a CSV file follows below).
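
If you want to keep the results as a file, pandas can write the filtered DataFrame to CSV; a minimal sketch (the file name 'americas_indices.csv' is just an example):

# Save the filtered table to a CSV file in the current directory.
df_filtered.to_csv('americas_indices.csv', index=False)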


9. Close the Browser

Closing the Browser
driver.quit()
  • driver.quit(): Closes the browser to release resources after completing all tasks. A try/finally variant that guarantees this cleanup is sketched below.
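
Because an exception anywhere in the script would otherwise skip driver.quit(), the scraping steps can be wrapped in try/finally so the browser always closes; a minimal sketch:

driver = webdriver.Chrome()
try:
    driver.get('https://finance.yahoo.com/markets/')
    # ... the scraping steps from this lesson go here ...
finally:
    # Runs whether or not an error occurred, so the browser is always closed.
    driver.quit()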

Mission

Run the code and check the results.
