Deep Scraping of a List with Items on Different Pages (Help Needed)

vigen_ggg · August 4, 2023, 5:27pm

Hey everyone,

I am considering upgrading to Bardeen premium to get more credits, but I am having trouble creating a playbook that deep scrapes all list items across pagination. I have tried everything I can think of, so I need your support here.

I need to scrape items from a list and extract information from the list item’s inner pages. I have set up my playbook to get this data and save it into a spreadsheet. It works fine and saves all the information I need:

However, even though I have set up pagination settings, I am only able to get items from the first page. I have tried playing around with the settings, but I still get the same result.

How can I configure the playbook to click through each page and save all elements from there to my spreadsheet?

There are around 20 pages and 15 items on each page, but right now I am only able to save 15 items from the first page.

Do I need to manually open each page and run a playbook from it? How can I make the script actually click through each page and retrieve items from the list on each page?

Thanks so much!

Deyan_Petrov · August 5, 2023, 4:39pm

Hey @vigen_ggg, welcome to the discourse community.

To be able to help you, I need you to share the playbook that you are using, and also the website if it isn’t behind a paywall.

The book will

vigen_ggg · August 5, 2023, 5:40pm

Hi Deyan,

Please find the link to my playbook: Shared Playbook | Bardeen

The website that needs to be scraped is Exhibitor list | Labelexpo Europe 2023. It has 29 pages with 15 items per page. The script should check each item, extract the company’s website and LinkedIn page, and save them to a spreadsheet.

Thank you!

Deyan_Petrov · August 5, 2023, 6:24pm

Thank you, I am looking at it right now and the active tab scraper is not working at all for me. I also cannot select any fields from it.

I can see that the list of listings is populated via JS, meaning that there is no built-in wait between clicking next and scraping, so the scraper can run before the new items have loaded. We need to add a delay, and that should solve the issue.
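For anyone reproducing this outside Bardeen, the "wait until the JS-populated list has actually changed" idea can be sketched in Python. This is only an illustration under assumptions, not Bardeen's API: `wait_for_new_page` and `read_marker` are hypothetical names, and `read_marker` stands for whatever reads something page-specific, such as the text of the first list item.

```python
import time
from typing import Callable, Optional


def wait_for_new_page(
    read_marker: Callable[[], Optional[str]],
    previous_marker: Optional[str],
    timeout: float = 15.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until the list shows a marker different from the previous page's.

    Returns the new marker, or None if nothing changed within `timeout` seconds.
    `read_marker` is a hypothetical callback supplied by the caller.
    """
    waited = 0.0
    while waited <= timeout:
        marker = read_marker()
        # None means the list is still empty (the JS has not rendered it yet).
        if marker is not None and marker != previous_marker:
            return marker
        sleep(interval)
        waited += interval
    return None
```

Compared with a fixed delay, polling like this only waits as long as the page needs, and it makes a page that never loads show up as an explicit `None` instead of a silent re-scrape of the old items.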

I am creating a scraper on my side to see what it will do.

Edit:
I tested without a delay and scraped 8 pages, and then the website wouldn’t load the next page because Bardeen was clicking the next button too fast.

The same happened with a 5-second delay. The scraper itself worked fine for me, meaning it was switching pages, even though it didn’t get past page 8. I would guess we need to add a larger delay between scrapes.
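The "larger delay between scrapes" guess can be illustrated with a small Python sketch that waits before every page and doubles the wait when a page comes back empty. It is a sketch under assumptions rather than how Bardeen works internally: `scrape_all_pages`, `scrape_page`, and the default values are hypothetical, and `scrape_page(n)` stands for whatever returns the items found on page `n`.

```python
import time
from typing import Callable, List


def scrape_all_pages(
    scrape_page: Callable[[int], List[str]],
    total_pages: int,
    base_delay: float = 5.0,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Scrape pages 1..total_pages, doubling the delay when a page is empty."""
    items: List[str] = []
    for page in range(1, total_pages + 1):
        delay = base_delay
        for _attempt in range(max_retries + 1):
            sleep(delay)  # pause before each attempt so the site is not hit too fast
            rows = scrape_page(page)
            if rows:
                items.extend(rows)
                break
            delay *= 2  # page did not load: wait longer before retrying
        else:
            # All attempts failed: stop and return what was collected so far.
            break
    return items
```

With the 29 pages mentioned above, this would be called as `scrape_all_pages(scrape_page, 29)`; a page that stalls is retried with 10, 20, and 40 second waits before the run gives up.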

Here is the playbook with my scraper. Give it a try.

vigen_ggg · August 7, 2023, 10:31am

Thanks Deyan,

I noticed that sometimes the playbook gets stuck on the first page. However, I figured out that if I hit “Terminate” (the close button), it magically starts going page by page and scraping everything I need. (no idea why it keeps working after I clicked that).

It seems I still need to understand how Deep Scraping works

Here’s the link to my final playbook (I tried to play around with custom delays, but no luck):

To sum up, I managed to make it work. Thanks for helping out.

But I still can’t understand how to make it stable and set up another similar deep scraper with more than 9 pages without any dances with tambourines

system · August 14, 2023, 1:08pm

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.
