Scrape only gets the first 25 results from a list

Hi,

I only get the first 25 results from the list. I tried both paging and infinite scroll.

Here is the playbook: https://www.bardeen.ai/playbook/community/Scraping-www.arbeitsagentur.de-TeBjaHvQhSrqb0Kf1T

Thanks in advance!

Hi Jorge,

We highly recommend the following best practices to avoid some of the issues you are facing:

  1. Add a custom delay per page. This makes the scraping more human-like and gives the page time to load before it is scraped. If results take a long time to load, Bardeen may think there are no more results and stop. Could you please go into the playbook builder, find the scraper action, and add a custom delay of about 5 seconds? The custom delay tells Bardeen to wait 5 seconds every time a new set of results is loaded as it scrolls down.

Here's an example:

  2. We've added a new setting to your scraper models that allows the scraper to run in a normal browser window, but behind the currently opened web pages. Previously, the scraper would try to get the data from a minimized window and, in some cases, would fail to do so because of limitations on some websites - like getting a list of reviews from Google Maps and so on. Now you can disable this so that the browser window doesn't get minimized, but it also doesn't get in your way because it's behind your current windows. This setting is available for both new and existing scraper models: you can modify your existing scraper models by opening the scraper settings and disabling the "Use minimized window for background scraping" switch.

  3. Scrape in smaller chunks than you are currently doing.
  4. Use an app to help keep your computer awake. This is a great one for Mac: https://apps.apple.com/us/app/jolt-of-caffeine/id1437130425?mt=12
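
To illustrate why the custom delay matters, here is a rough toy model in Python. It is not Bardeen's actual code: `FakePage`, `scrape`, and the 3-second load time are hypothetical names and numbers made up for the sketch. It assumes a scraper that clicks "load more", waits, and stops as soon as a click appears to produce no new items.

```python
class FakePage:
    """Hypothetical results page: each 'load more' click appends up to
    `page_size` items, but they only show up `load_time` seconds later."""

    def __init__(self, total, page_size=25, load_time=3.0):
        self.total = total
        self.page_size = page_size
        self.load_time = load_time
        self.now = 0.0                      # simulated clock, in seconds
        self.visible = min(page_size, total)
        self._pending = []                  # (ready_at, visible_count) pairs

    def click_load_more(self):
        # The next batch builds on whatever has already been requested.
        base = self._pending[-1][1] if self._pending else self.visible
        target = min(base + self.page_size, self.total)
        self._pending.append((self.now + self.load_time, target))

    def wait(self, seconds):
        # Advance the simulated clock and reveal batches that finished loading.
        self.now += seconds
        while self._pending and self._pending[0][0] <= self.now:
            self.visible = self._pending.pop(0)[1]

    def count_items(self):
        return self.visible


def scrape(page, delay):
    """Click 'load more' until a click yields nothing new; return item count."""
    seen = page.count_items()
    while True:
        page.click_load_more()
        page.wait(delay)
        current = page.count_items()
        if current == seen:                 # nothing new: assume end of list
            return seen
        seen = current


too_short = scrape(FakePage(total=100), delay=1.0)
long_enough = scrape(FakePage(total=100), delay=5.0)
print(too_short, long_enough)
```

In this toy model, a 1-second delay is shorter than the page's load time, so the scraper gives up after the first 25 items, while a 5-second delay collects all 100. Real pages may behave differently, so treat this only as an intuition for the setting.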

Hope that helps!
Lucy

Customer Support - bardeen.ai
Explore | @bardeenai | Bardeen Community

Hello Lucy / Bardeeny,

Thank you very much for your suggestions. After applying them (even with a 10-second delay), I still get only 25 results back.
I see Bardeen clicking the pagination button, maybe four times, but it only saves the first 25 results.
Could the problem be that the second chunk of 25 results is shown on the same page, after the first 25?
You can see the webpage yourself; the pagination button is named Weitere Ergebnisse.
This automation will help me apply to jobs!

Many thanks in advance!

Hi Jorge,

The issue you're running into is likely caused by the complex HTML structure of the pagination on the site. However, we can overcome this by taking advantage of Bardeen's advanced custom selector capabilities. Instead of selecting the list manually, I used the tag name of each HTML element. You can do the same by inspecting the structure of the website and getting the tag name of the individual element, as shown below.

In the case of this website, each job listing had a tag name of "li". This led to the proper scraping of the website and all of the job postings that were available (2000+). I've attached a link to the scraper below so you can experiment with it on your own (make sure to duplicate it), and I've also added the scraping results I received as a CSV file. Hope this helps!

Scraper Playbook: https://www.bardeen.ai/playbook/community/Scraping-www.arbeitsagentur.de-sPDiPl3cOfIjpvpKc4
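
For intuition on why selecting by tag name picks up every batch, here is a minimal Python sketch using only the standard library. The HTML snippet is invented for illustration and is not the real markup of the site, and `TagCollector` and `collect_text` are hypothetical helper names. It assumes the "li" elements are not nested inside each other.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the text of every element with a given tag name,
    regardless of which container it sits in."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.inside = False     # True while we are inside a matching element
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.inside = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.items[-1] += data.strip()


def collect_text(html, tag):
    parser = TagCollector(tag)
    parser.feed(html)
    return parser.items


# Invented example: the second batch is appended in a separate container,
# which a selector tied to the first list alone would miss.
SAMPLE = """
<ul id="batch-1"><li>Job A</li><li>Job B</li></ul>
<ul id="batch-2"><li>Job C</li></ul>
"""

print(collect_text(SAMPLE, "li"))
```

Because the match is on the tag name alone, items from both containers are returned together, which is the same idea as using "li" as the custom selector in the scraper.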

Omansh

Customer Support - bardeen.ai

Hi Omansh/Bardeeni,

I'm so excited to try it!!! But I cannot open/edit the playbook. When I try to open the duplicate in the builder, it tells me that I don't have permission to modify the scraper model. Please see the attachment.

Many thanks in advance!

Hi Jorge,

It looks like you are on an older version of Bardeen. The problem you encountered previously is now fixed in our latest version of Bardeen - 2.48.

Please try updating your version of Bardeen and let me know if you're still facing the same issue.

Here's how you can update your version of Bardeen:

  1. Open the Google Chrome browser.
  2. Navigate to your extensions page by either:
    ◦ Opening a new tab and typing chrome://extensions into the address bar.
    ◦ Clicking the three-dot icon at the top-right corner of the browser, then selecting "More tools" and "Extensions".
  3. Enable "Developer mode" by toggling the switch in the top-right corner of the extensions page.
  4. Once Developer mode is enabled, additional options will appear on the page. Find the Bardeen extension, click Details, and click the "Update" button.
    This will force Chrome to download and install any available updates for Bardeen.

Cheers,
Lucy

Customer Support - bardeen.ai

Hello Lucy,

Nope, I already had version 2.48.0.

Regards,
Jorge

Hello,

Any solution to the aforementioned problem? I still cannot open the Playbook from Omansh.

Regards,
Jorge