Problem with pagination?

I have tried everything, but I can't make auto pagination work with the sorted-pages scraper.

Is there some problem with this functionality, or an existing bug?

Hi @Urbaninja25, welcome to the Bardeen Community! :slight_smile:

Could you please provide the URL you are trying to scrape from?

Also, would you mind elaborating on this part?

Thank you,
Jess

Here you go:

  1. URL: https://www.myhome.ge/ka/s/?
  2. I checked this URL beforehand with different tools and got good scraping results.
  3. I created a scraper with your “active tab” module with some automation in front, but it only returns results from one page and struggles to move to the next page.

Yes, I inserted the button data with your pagination option, and yes, I configured the page count afterwards.

Thank you @Urbaninja25, I wasn’t able to get the pagination working on this site either - even with the following selectors:

.page-item.number.normal-item.active + li

.page-item.step-forward-item

Maybe it's because the site uses JavaScript for pagination here, but I'm not certain.

Perhaps @vin_bardeen and the Bardeen Support Team have a better solution here.


As a workaround: it appears that each page appends “/?Page=1” (with the number incrementing per page) to the link, so you could build the links in a Google Sheet and scrape them individually using the “Scrape Data in the background” action.
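The link-building step above can be sketched quickly. This is a hedged example: the base URL and the `Page` query parameter come from this thread, but the page range and output format are assumptions; the printed list would then be pasted into the Google Sheet.

```python
# Sketch: generate the paginated URLs to paste into a Google Sheet.
# BASE_URL and the "Page" parameter are taken from this thread; the
# page range is an arbitrary example.

BASE_URL = "https://www.myhome.ge/ka/s/"

def build_page_urls(last_page: int) -> list[str]:
    """Return one URL per page, e.g. https://www.myhome.ge/ka/s/?Page=3"""
    return [f"{BASE_URL}?Page={n}" for n in range(1, last_page + 1)]

for url in build_page_urls(5):
    print(url)
```

Paste the printed URLs into one column of the sheet, then point the “Scrape Data in the background” action at that column.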

I hope this helps!
Thank you,
Jess

Hey @Urbaninja25 ,

Just reaching out to say we’re looking into this. Looping in @manvel in case he has any ideas.

Apologies for the delayed response; I’ve been engrossed in intensive coding sessions this week.

In line with Jess’s advice, I’ve devised a basic workaround: generating individual links on my server to match the structure of the target website’s URLs, like this example: https://www.myhome.ge/ka/s/?Page=3. However, this solution won’t suffice for commercial use, as it lacks scalability. To handle commercial demands, I’d need a solution that can manage the generation of 30k+ links efficiently without overtaxing my computational resources daily.

Ideally, if your software could target a specific button, it would alleviate the need for my server to generate such an extensive number of pages—allowing your software to handle the 30k+ pages autonomously would be the optimal solution.

Thank you for your assistance!

@Jess

Hey Jess, thanks for your suggestion!

I'm keen to dive deeper into this workaround. From what I understand, the process begins with Google Sheets, followed by the scraper. There's a loop involved: during each iteration, your software accesses the sheet containing the individual page links (rows 1 to 30,000), selects a specific row, and scrapes data from the corresponding link, automatically feeding the links into the scraper. It should also repeat this on every iteration with the correct page-link data.
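The row-by-row loop described here can be sketched roughly as follows. Note the assumptions: `scrape_page` is a hypothetical stand-in for Bardeen's background-scraping action, and the sheet is modeled as a plain list of URLs.

```python
# Rough sketch of the iterate-over-sheet-rows pattern described above.
# `scrape_page` is a hypothetical placeholder for Bardeen's
# "Scrape Data in the background" action.

def scrape_page(url: str) -> dict:
    # Placeholder: a real implementation would fetch and parse the page.
    return {"url": url, "listings": []}

def scrape_all(sheet_rows: list[str]) -> list[dict]:
    """Visit every row (one page URL per row) and collect the results."""
    results = []
    for url in sheet_rows:  # one iteration per sheet row
        results.append(scrape_page(url))
    return results

rows = [f"https://www.myhome.ge/ka/s/?Page={n}" for n in range(1, 4)]
data = scrape_all(rows)
```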

To be honest, I experimented a bit with your software, and it seems like it might not handle these loop or iteration behaviours. I'm also uncertain whether it's equipped to efficiently pull data from Sheets into the scraper as input.

I’d really appreciate it if you could elaborate more on the workaround you mentioned.

Hi @Urbaninja25,

Bardeen can loop through each provided link and scrape information from each one within a single automation.

Here's how the workaround can be accomplished at a very basic level: build an automation with the following actions:

  1. Get table from Google Sheet
  2. Scrape Data in the background

Here’s a video on how this works:

Scrape LinkedIn Job URLs from GSheet

When scraping multiple links, it's best practice to scrape in batches.
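Batching itself is straightforward. A minimal sketch, assuming you want to split a long URL list into fixed-size chunks before feeding each chunk to the scraper (the batch size of 25 is an arbitrary example, not a Bardeen recommendation):

```python
# Sketch: split a large list of page URLs into batches before scraping.
# The batch size is an assumption; tune it to what the scraper tolerates.

def batched(items: list[str], size: int) -> list[list[str]]:
    """Split `items` into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

urls = [f"https://www.myhome.ge/ka/s/?Page={n}" for n in range(1, 101)]
batches = batched(urls, 25)  # 100 URLs -> 4 batches of 25
```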

I hope this helps!
Thank you,
Jess

Hi @Urbaninja25 :wave:

Thank you so much for raising the issue. The problem is indeed related to the fact that the website uses inline scripts for navigation rather than traditional, SEO-friendly links:

Bardeen, like many other tools, loads its scripts into something called an “isolated world”: they have access to the page DOM and can trigger events, including clicks. The problem in this case is that, due to the page's Content Security Policy, scripts in the isolated world cannot execute the inline scripts that live in the page.

Unfortunately, right now we cannot trigger that. The practical solution is the manual link generation @Jess suggested, combined with the Background Scraping command.

As soon as we have a better solution, I will keep you updated.

Many thanks,
Manvel

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.