I’m trying to scrape the details of each driving instructor from this website: Find an ADI Instructor
I need the Name, Driving School name, Email, Phone/2nd Phone, Address, Website, and ADI Number.
The scraper I have made scrapes the first page and adds the first 10 driving instructors to the Google spreadsheet, but it can’t seem to scrape any of the following pages.
There are about 233 pages, with 10 driving instructors on each page.
I have added a 5-second delay to the scraper and also experimented with longer delays, but neither worked. I also switched to scraping in the background, which made no difference.
We highly recommend the following best practices to avoid some of the issues you are facing:
Add a custom delay per page, so the scraping is more human-like and each page has time to load before it is scraped. If the results take a long time to load, Bardeen may conclude that there are no more results and stop. Could you please open the playbook builder, find the scraper action, and add a custom delay of about 5 seconds? The custom delay tells Bardeen to wait 5 seconds each time a new set of results loads as it scrolls down.
We’ve added a new setting to scraper models that lets the scraper run in a normal browser window placed behind your currently open web pages. Previously, the scraper tried to get the data from a minimized window and, on some websites, failed because of their limitations (for example, when getting a list of reviews from Google Maps). Now you can turn off minimization, so the browser window isn’t minimized but still stays out of your way behind your current windows. This setting is available for both new and existing scraper models: to change an existing model, open the scraper settings and disable the “Use minimized window for background scraping” switch.
Scrape in smaller chunks than you currently are.
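To make the delay and chunking advice above more concrete, here is a minimal Python sketch of the general idea. It is not Bardeen’s implementation (Bardeen is configured in the playbook builder, not in code). It assumes a hypothetical `fetch_page(page)` function that loads one results page and returns that page’s instructor records. The sketch scrapes only an inclusive range of pages (one chunk), waits `delay_seconds` before each page after the first, and stops at the first empty page, which shows how a page that has not finished loading can be mistaken for the end of the results.

```python
import time
from typing import Callable, Dict, List

# Hypothetical record type: one driving instructor's fields (name, email, ...).
Record = Dict[str, str]


def scrape_pages(
    fetch_page: Callable[[int], List[Record]],
    first_page: int,
    last_page: int,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Record]:
    """Scrape pages first_page..last_page (inclusive), pausing between pages.

    fetch_page is a hypothetical placeholder that loads one results page and
    returns its records. An empty list is treated as "no more results".
    """
    records: List[Record] = []
    for page in range(first_page, last_page + 1):
        if page > first_page:
            # Pause before each new page to pace requests and give it time to load.
            sleep(delay_seconds)
        page_records = fetch_page(page)
        if not page_records:
            # A slow page that comes back empty ends the run early here.
            break
        records.extend(page_records)
    return records
```

Passing `sleep` as a parameter is only there so the pacing can be checked without actually waiting; in normal use the default `time.sleep` applies.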
I am now able to scrape 2 of the 233 pages. When you say to scrape in smaller chunks, can you show me how to set this up so that the website has time to load and Bardeen isn’t overloaded?
Because each page has 10 instructors, I have set the maximum to 10 items per page. I have also set the maximum number of pages to 233 and the delay to 5 seconds. I’ll try a longer delay and see if that works.
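One way to picture “smaller chunks” with the settings described above: 233 pages at a 5-second delay per page adds up to at least 1,165 seconds (about 19 minutes) of waiting in a single run, before any page-loading time. The hypothetical Python sketch below splits the 233 pages into fixed-size page ranges so that each run only has to cover one range. The chunk size of 20 pages is an arbitrary assumption for illustration, not a Bardeen setting or recommendation.

```python
from typing import List, Tuple


def plan_chunks(total_pages: int, pages_per_chunk: int) -> List[Tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (first, last) page ranges."""
    if total_pages < 1 or pages_per_chunk < 1:
        raise ValueError("total_pages and pages_per_chunk must be positive")
    chunks: List[Tuple[int, int]] = []
    for first in range(1, total_pages + 1, pages_per_chunk):
        # The final chunk is shorter when total_pages is not a multiple of the size.
        last = min(first + pages_per_chunk - 1, total_pages)
        chunks.append((first, last))
    return chunks
```

For example, `plan_chunks(233, 20)` produces 12 ranges, from `(1, 20)` to `(221, 233)`; each range could then be scraped in its own run.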
Thank you for your patience. The website you are working with seems to have a complex internal page structure, which makes it difficult for our scraper to grab information for more than a handful of drivers. I’ve escalated this inquiry to our engineering team and will be sure to provide you with an update as soon as significant progress is made. Thank you!
Cheers,
Omansh
Customer Support - bardeen.ai
Thank you so much for waiting while our engineering team took a closer look at this issue. Unfortunately, there is currently no way to scrape every page and driver by running this playbook just once. However, you can configure your playbook so that every time you run it, a new page is scraped from the website. Please see the Loom video link below for instructions on how to set this up; you can follow it exactly as shown. Hope this helps!
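The Loom video itself isn’t reproduced here. As a rough, hypothetical illustration of the “one new page per run” idea (not what Bardeen does internally), the Python sketch below keeps a small progress file recording the next page to scrape. Each call scrapes exactly one page through a placeholder `scrape_page` callable, then advances the counter, and returns 0 once all pages are done. The file name and the `run_once` and `scrape_page` names are assumptions made for this sketch.

```python
import json
from pathlib import Path
from typing import Callable

TOTAL_PAGES = 233  # page count reported in this thread


def run_once(
    state_file: Path,
    scrape_page: Callable[[int], None],
    last_page: int = TOTAL_PAGES,
) -> int:
    """Scrape the next unscraped page and save progress.

    Returns the page number that was scraped, or 0 if every page is done.
    scrape_page is a hypothetical placeholder for whatever scrapes one page
    (for example, writing its 10 instructors to a spreadsheet).
    """
    # Start at page 1 on the very first run, when no progress file exists yet.
    next_page = 1
    if state_file.exists():
        next_page = json.loads(state_file.read_text())["next_page"]

    if next_page > last_page:
        return 0  # nothing left to scrape

    scrape_page(next_page)
    # Advance the counter only after scrape_page returned without raising.
    state_file.write_text(json.dumps({"next_page": next_page + 1}))
    return next_page
```

Calling `run_once(Path("state.json"), my_scraper)` once per run would then walk through pages 1 to 233 across 233 runs, mirroring one click per page.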