Scraping nested divs?

I’m struggling with this page here:

https://status.rilegislature.gov/day_introduced.aspx?chamber=H&date=1/5/2024

It looks like this page is just terribly constructed, with nested divs instead of a table, so my results always come out something like the below. Any advice? THANK YOU :slight_smile:

Hi @zackmezera,

I’ll see if I can help you out here. Could you please confirm what the correct value would be for the “Action” data point?

Thank you,
Jess

Thanks for taking a look! Re: “Action”, at this point I’m just looking to grab whole lines of text like the ones below, e.g.:

  • 1/5/2024 Introduced, referred to House Corporations
  • 1/5/2024 Introduced, referred to House Finance

Similarly, for “Entitled” I want the entire “line” of text beginning with ENTITLED (which can sometimes be as long as a paragraph). The “LC” bit in curly brackets is irrelevant.

thanks! :smiley:

Quite the tricky setup on this HTML to scrape from, I’ll say that!

I believe we’ll have to create a separate list scraper template for each data element you’re hoping to scrape from this page.

Here’s the container and selector to use for each data point in a separate list scraper template:

  1. Bill Number
  • Use the selector below as the container selector, then select the Bill Number element on the page, and it should work.

#lblBills a

  2. Sponsors
  • Use the selector below as the container selector, then select the Sponsors element on the page, and it should work.

#lblBills a + div:not(:first-child)

  3. ENTITLED
  • Use the selector below as the container selector, then select the ENTITLED element on the page, and it should work.

#lblBills a + div:not(:first-child) + div

  4. Action
  • Use the selector below as the container selector, then select the Action element on the page, and it should work.

#lblBills a + div:not(:first-child) + div + div + div
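To see what those sibling selectors pick out, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. It walks the children of the bills container once and assigns each div to a field by its position after the bill's link: position 1 matches "a + div", position 2 matches "a + div + div", and position 4 matches "a + div + div + div + div". The SAMPLE markup, its placeholder text, and the helper names (group_bills, text_of, OFFSET_TO_FIELD) are hypothetical. They are inferred only from the selectors above, not copied from the live page, and the sketch assumes nothing but divs sits between one link and the next.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup inferred from the selectors, NOT copied from the real page:
# each bill is an <a> followed by a run of sibling <div>s inside #lblBills.
SAMPLE = """
<span id="lblBills">
  <a>Bill A</a>
  <div>Sponsors of bill A (placeholder)</div>
  <div>ENTITLED, AN ACT ... (placeholder for bill A)</div>
  <div>{LC placeholder}</div>
  <div>1/5/2024 Introduced, referred to House Corporations</div>
  <a>Bill B</a>
  <div>Sponsors of bill B (placeholder)</div>
  <div>ENTITLED, AN ACT ... (placeholder for bill B)</div>
  <div>{LC placeholder}</div>
  <div>1/5/2024 Introduced, referred to House Finance</div>
</span>
"""

# Position of a <div> after its bill's <a>, mapped to the field it holds.
# The third div (the hypothetical LC element) is deliberately not mapped.
OFFSET_TO_FIELD = {1: "sponsors", 2: "entitled", 4: "action"}


def text_of(el):
    """Concatenate all text inside an element and trim surrounding whitespace."""
    return "".join(el.itertext()).strip()


def group_bills(container):
    """Walk the container's direct children once, grouping each <a> with the divs after it."""
    bills = []
    divs_seen = 0
    for child in container:
        if child.tag == "a":
            # A new link starts a new bill record and resets the div counter.
            bills.append({"bill": text_of(child)})
            divs_seen = 0
        elif child.tag == "div" and bills:
            divs_seen += 1
            field = OFFSET_TO_FIELD.get(divs_seen)
            if field is not None:
                bills[-1][field] = text_of(child)
    return bills


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE.strip())
    for row in group_bills(root):
        print(row)
```

Each row comes back as one dict per bill with bill, sponsors, entitled and action keys. OFFSET_TO_FIELD is the only place the layout is encoded, so it is the part to adjust if the real markup turns out to differ.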

I realize this is a tedious workaround, but I hope this helps!
Thank you,
Jess

Thank you so much for this! I never would have been able to code that out myself.

I’m still having some minor issues (the scrape won’t run on schedule, only when I trigger it manually via right-click), but the core challenge of getting the actual fields in is SOLVED. Thank you thank you :smiley:

Wonderful, love to hear it!

It sounds like you might be using the wrong action to run it on a schedule. Make sure the first action in your automation is the trigger called “when a scheduled event occurs”.
