I am considering purchasing Bardeen Premium to get more credits, but I am having trouble creating a playbook that deep-scrapes all list items across pagination. I have tried everything, and it seems I need your support here.
I need to scrape items from a list and extract information from each list item’s inner page. I have set up my playbook to get this data and save it into a spreadsheet. It works fine and saves all the information I need.
However, even though I have configured the pagination settings, I only get items from the first page. I have tried playing around with the settings, but I still get the same result.
How can I configure the playbook to click through each page and save all elements from there to my spreadsheet?
There are around 20 pages and 15 items on each page, but right now I am only able to save 15 items from the first page.
Do I need to manually open each page and run a playbook from it? How can I make the script actually click through each page and retrieve items from the list on each page?
The website that needs to be scraped is Exhibitor list | Labelexpo Europe 2023. It has 29 pages with 15 items per page. The script should check each item, extract the company’s website and LinkedIn page, and save them to a spreadsheet.
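For reference, here is the flow I’m after, written out as a script just to make the intent concrete. This is only a sketch (the real playbook is built in Bardeen’s no-code editor), and the selectors in it are my assumptions about the page markup:

```typescript
import { chromium } from 'playwright';

interface ExhibitorRow {
  profileUrl: string;
  website: string | null;
  linkedin: string | null;
}

// Visit each exhibitor's inner page and pull the two links I need.
// 'a.website' and 'a[href*="linkedin.com"]' are guesses at the markup.
async function scrapeInnerPages(profileUrls: string[]): Promise<ExhibitorRow[]> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  const rows: ExhibitorRow[] = [];

  for (const profileUrl of profileUrls) {
    await page.goto(profileUrl);
    const website = await page
      .locator('a.website').first()
      .getAttribute('href', { timeout: 2000 })
      .catch(() => null); // not every exhibitor lists a website
    const linkedin = await page
      .locator('a[href*="linkedin.com"]').first()
      .getAttribute('href', { timeout: 2000 })
      .catch(() => null);
    rows.push({ profileUrl, website, linkedin });
  }

  await browser.close();
  return rows; // these rows are what should end up in the spreadsheet
}
```

The pagination part, which is what I can’t get working, would then feed `scrapeInnerPages` the item links from every page, not just the first one.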
I can see that the list of listings is populated via JS, meaning there is no actual delay between clicking “next” and scraping, so we need to add a delay; that should solve the issue.
I am creating a scraper on my side to see what it will do.
Edit:
I tested without a delay and scraped 8 pages; then the website refused to load the next page, as Bardeen was clicking the “next” button too fast.
The same happened with a 5-second delay, although the scraper itself worked fine for me, meaning it was switching pages, even though it didn’t get past page 8. I would guess we need a larger delay between scrapes.
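To make the “how long to wait” question concrete: in script form, the robust alternative to a fixed delay is to wait until the JS-rendered list has actually changed after clicking “next”. Here is a minimal sketch of that idea in Playwright (this is not Bardeen’s internals; `.item` and `a.next` are hypothetical selectors, and I assume the “next” button disappears on the last page):

```typescript
import { chromium } from 'playwright';

// Collect the text of every list item across all pages, waiting after
// each "next" click until the list content has actually been replaced.
async function scrapeAllPages(url: string): Promise<string[]> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url);

  const results: string[] = [];
  while (true) {
    // Remember the first item so we can detect when the page changes.
    const firstBefore = await page.locator('.item').first().textContent();

    for (const item of await page.locator('.item').all()) {
      results.push((await item.textContent()) ?? '');
    }

    const next = page.locator('a.next');
    if (await next.count() === 0) break; // no "next" button: last page
    await next.click();

    // Block until the first item's text differs from the previous page,
    // i.e. the JS has finished rendering the new page of results.
    await page.waitForFunction(
      (prev) => document.querySelector('.item')?.textContent !== prev,
      firstBefore,
    );
  }

  await browser.close();
  return results;
}
```

Waiting for the content to change is self-adjusting: it waits exactly as long as the site needs on each page, which is why it shouldn’t break around page 8 the way the fixed 0-second and 5-second delays did.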
I noticed that sometimes the playbook gets stuck on the first page. However, I figured out that if I hit “Terminate” (the close button), it magically starts going page by page and scraping everything I need. (I have no idea why it keeps working after I click that.)
It seems I still need to understand how Deep Scraping works.
Here’s the link to my final playbook (I tried to play around with custom delays, but no luck):
To sum up, I managed to make it work. Thanks for helping out.
But I still can’t understand how to make it stable, or how to set up another similar deep scraper with more than 9 pages, without all these workarounds.