Problem with pagination?

I have tried everything, but I can't make auto pagination work with the sorted-pages scraper.

Is there some problem with this functionality, or an existing bug?

Hi @Urbaninja25, welcome to the Bardeen Community! :slight_smile:

Could you please provide the URL you are trying to scrape from?

Also, would you mind elaborating on this part?

Thank you,
Jess

Here you go:

  1. URL: https://www.myhome.ge/ka/s/?
  2. I checked this URL beforehand with different tools and got good scraping results.
  3. I created a scraper with your “active tab” module with some automation in front, but it only returns results from one page and struggles to move to the next page.

Yes, I inserted the button data with your pagination option, and yes, I configured the page count afterwards.

Thank you @Urbaninja25, I wasn’t able to get the pagination working on this site either - even with the following selectors:

.page-item.number.normal-item.active + li

.page-item.step-forward-item

Maybe it's because the site uses JavaScript for pagination here, but I'm not certain.

Perhaps @vin_bardeen and the Bardeen Support Team have a better solution here.


As a workaround: it appears that each page appends “/?Page=1” (with the number incrementing per page) to the link, so you could build the links in a Google Sheet and scrape them individually using the “Scrape Data in the background” action.
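The link-building step above can be sketched quickly. This is a hedged example: the base URL and the `Page` query parameter come from this thread, but the page range and output format are assumptions; the printed list would then be pasted into the Google Sheet.

```python
# Sketch: generate the paginated URLs to paste into a Google Sheet.
# BASE_URL and the "Page" parameter are taken from this thread; the
# page range is an arbitrary example.

BASE_URL = "https://www.myhome.ge/ka/s/"

def build_page_urls(last_page: int) -> list[str]:
    """Return one URL per page, e.g. https://www.myhome.ge/ka/s/?Page=3"""
    return [f"{BASE_URL}?Page={n}" for n in range(1, last_page + 1)]

for url in build_page_urls(5):
    print(url)
```

Paste the printed URLs into one column of the sheet, then point the “Scrape Data in the background” action at that column.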

I hope this helps!
Thank you,
Jess

Hey @Urbaninja25 ,

Just reaching out to say we’re looking into this. Looping in @manvel in case he has any ideas.

Apologies for the delayed response; I’ve been engrossed in intensive coding sessions this week.

In line with Jess’s advice, I’ve devised a basic workaround: generating individual links on my server to match the structure of the target website’s URLs, like this example: https://www.myhome.ge/ka/s/?Page=3. However, this solution won’t suffice for commercial use, as it lacks scalability. To handle commercial demands, I’d need a solution that can manage the generation of 30k+ links efficiently without overtaxing my computational resources daily.

Ideally, if your software could target a specific button, it would alleviate the need for my server to generate such an extensive number of pages—allowing your software to handle the 30k+ pages autonomously would be the optimal solution.

Thank you for your assistance!

@Jess

Hey Jess, thanks for your suggestion!

I'm keen to dive deeper into this workaround. From what I understand, the process begins with Google Sheets, followed by the scraper. There's a loop involved: during each iteration, your software accesses the sheet containing the individual page links (rows 1 to 30,000), selects a specific row, and scrapes data from the corresponding link, automatically feeding the links into the scraper. It should also repeat this on every iteration with the correct page-link data.
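The row-by-row loop described here can be sketched roughly as follows. Note the assumptions: `scrape_page` is a hypothetical stand-in for Bardeen's background-scraping action, and the sheet is modeled as a plain list of URLs.

```python
# Rough sketch of the iterate-over-sheet-rows pattern described above.
# `scrape_page` is a hypothetical placeholder for Bardeen's
# "Scrape Data in the background" action.

def scrape_page(url: str) -> dict:
    # Placeholder: a real implementation would fetch and parse the page.
    return {"url": url, "listings": []}

def scrape_all(sheet_rows: list[str]) -> list[dict]:
    """Visit every row (one page URL per row) and collect the results."""
    results = []
    for url in sheet_rows:  # one iteration per sheet row
        results.append(scrape_page(url))
    return results

rows = [f"https://www.myhome.ge/ka/s/?Page={n}" for n in range(1, 4)]
data = scrape_all(rows)
```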

To be honest, I experimented a bit with your software, and it seems like it might not handle these loop or iteration behaviours. I'm also uncertain whether it's equipped to efficiently pull data from Sheets into the scraper as input.

I’d really appreciate it if you could elaborate more on the workaround you mentioned.

Hi @Urbaninja25,

Bardeen can loop through each provided link and scrape information from each one within a single automation.

Here's how the workaround can be accomplished at a very basic level: build an automation with the following actions:

  1. Get table from Google Sheet
  2. Scrape Data in the background

Here’s a video on how this works:

Scrape LinkedIn Job URLs from GSheet

When scraping multiple links, it's best practice to scrape in batches.
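Batching itself is straightforward. A minimal sketch, assuming you want to split a long URL list into fixed-size chunks before feeding each chunk to the scraper (the batch size of 25 is an arbitrary example, not a Bardeen recommendation):

```python
# Sketch: split a large list of page URLs into batches before scraping.
# The batch size is an assumption; tune it to what the scraper tolerates.

def batched(items: list[str], size: int) -> list[list[str]]:
    """Split `items` into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

urls = [f"https://www.myhome.ge/ka/s/?Page={n}" for n in range(1, 101)]
batches = batched(urls, 25)  # 100 URLs -> 4 batches of 25
```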

I hope this helps!
Thank you,
Jess

Hi @Urbaninja25 :wave:

Thank you so much for raising the issue. The problem is indeed related to the fact that the website uses inline scripts for navigation rather than traditional, SEO-friendly links:

Bardeen, like many other tools, loads its scripts into something called an “isolated world”: they have access to the page DOM and can trigger events, including clicks. The problem in this case is that, due to the page's Content Security Policy, scripts in the isolated world cannot execute the inline scripts that live in the page.

Unfortunately, right now we cannot trigger that. The practical solution is the manual link generation @Jess suggested, combined with the Background Scraping command.

As soon as we have a better solution, I will keep you updated.

Many thanks,
Manvel

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.