Scraping multiple URLs and gathering key information

Hello!

I am currently working on building an automation with BardeenAI and the goal is: Given website URLs in a google sheet, visit the page and extract pricing plan information (plan name, pricing information, billing frequency) and return the resulting data back in the spreadsheet in a formatted manner. The initial tests I have done are outstanding, but I want to work on finetuning the playbook a bit and have no idea where to start.

Here is the playbook: HERE

I am running into 2 issues:

  1. The first issue is that all pricing pages are structured and built differently so sometimes Bardeen either can’t pull the info or is pulling info from the wrong areas of the page. What is the best way to set up BardeenAI properly to only scrape the key information I want and format the data cleanly, if the pricing pages differ in structure?

  2. The data that is being scraped and returned from some website URLs is not formatted in a way that is easily readable. Is there any way I can finetune the Bardeen playbook to format the scraped data a certain way?

#Example Output#
Basic plan: $X/month
Professional plan: $X/month
Enterprise plan: $X/month

Cheers!

Hi @corbin, Welcome to the Bardeen Community!

  • Thank you for sharing your playbook, but I’m unable to view the URLs in the GSheet because it isn’t public. Could you please share a public view of the GSheet?
  • In order to answer this question thoroughly, I’d need to analyze the structure of the URLs to see if there is a way we could get Bardeen to scrape them properly.
  • Typically I would use the Regular Expression action to better format the data, but that would mean it would all have to be scraped in the same format in the first place. I’m not sure that’s happening in this case just yet. After receiving the URLs, I may be able to provide a better solution for you here too.
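To illustrate the point about regular expressions, here is a minimal Python sketch of the kind of pattern that could normalize scraped strings into the requested "Plan: price/period" layout. It is not Bardeen's Regular Expression action; the pattern, the `PERIODS` map, and the `normalize` helper are all hypothetical, and the sketch only works when the scraped text already follows a roughly similar shape.

```python
import re

# Hypothetical pattern: plan name, optional word "plan", optional separator,
# a currency symbol, an amount, and an optional billing period.
PRICE_RE = re.compile(
    r"^\s*(?P<plan>[A-Za-z][A-Za-z ]*?)(?:\s+plan)?\s*[:\-–]?\s*"
    r"(?P<cur>[$€£])\s*(?P<amount>\d+(?:,\d{3})*(?:\.\d{1,2})?)"
    r"\s*(?:(?:/|per)\s*(?P<period>[a-z]+))?",
    re.IGNORECASE,
)

# Assumed abbreviations; extend as needed for the pages being scraped.
PERIODS = {"mo": "month", "month": "month", "yr": "year", "year": "year"}


def normalize(raw):
    """Return 'Name plan: $X/period', or None if the text does not match."""
    m = PRICE_RE.match(raw)
    if m is None:
        return None
    period = PERIODS.get((m.group("period") or "").lower())
    line = f"{m.group('plan').strip()} plan: {m.group('cur')}{m.group('amount')}"
    return f"{line}/{period}" if period else line


if __name__ == "__main__":
    for text in ["Basic $12 per month", "Professional plan: $29/mo", "Contact sales"]:
        print(normalize(text))
```

Strings with no currency amount, such as "Contact sales", return `None`, which makes it easy to flag rows that need a manual look.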

Thank you,
Jess

Hi @Jess , thanks for the quick reply!

Here is the Google sheet with the URLs: HERE

Gotcha - The only way I’m aware of to ensure the data is pulled correctly is to create a scraper template per domain URL.

Because the URLs are different per domain, this means you’ll need to create one for each domain (23 total here it appears).
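As a rough picture of what "one template per domain" means in practice, the Python sketch below keeps a lookup table from domain to extraction selectors and reports which URLs have no template yet. The domains and CSS selectors are placeholders, and `template_for` and `split_by_coverage` are hypothetical helpers, not part of Bardeen.

```python
from urllib.parse import urlparse

# Hypothetical per-domain templates; selectors are placeholders, not real ones.
TEMPLATES = {
    "example-a.com": {"plan": ".plan-title", "price": ".plan-price"},
    "example-b.com": {"plan": "h3.tier", "price": "span.amount"},
}


def template_for(url):
    """Look up the template for a URL's domain, ignoring a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return TEMPLATES.get(host)


def split_by_coverage(urls):
    """Separate URLs that have a template from those that still need one."""
    covered, uncovered = [], []
    for url in urls:
        (covered if template_for(url) else uncovered).append(url)
    return covered, uncovered


if __name__ == "__main__":
    urls = ["https://www.example-a.com/pricing", "https://unknown.io/pricing"]
    print(split_by_coverage(urls))
```

The table makes the cost visible: every new domain in the sheet needs its own entry before it can be scraped reliably.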

Using the agent is a quick way, but it isn’t as precise as what you’re needing I believe.

I see, I think that might defeat the purpose of using an automated solution as building the individual playbook for each website might be more work than actually visiting each site manually.

I still find value in some of the results I get from my current playbook. However, are there any tweaks or improvements you would make to the playbook to improve the results of extracting data per domain URL?

Hey Corbin,

Since each of the URLs in your GSheet has its own distinct website structure, there’s not a set piece of advice I could provide which would give better guaranteed results for all of the websites. Generally, when scraping from websites listed on a GSheet, we recommend sticking to the same format URLs so that Bardeen can return more reliable results. Hope this helps and please let us know if you require any further assistance. Thank you!

Omansh

Customer Support - bardeen.ai
Explore | @bardeenai | Bardeen Community

Yep, I can understand this perspective.

Would you mind making the GSheet public with edit rights for me? (I just requested access)

  • I’ll do some troubleshooting/testing on my end with the exact data to see if I can get you better results.

Also, would you mind providing the data points you’re looking to scrape?
“Only extract the unique plan name and pricing of each subscription service the company is selling. The cost is usually found next to the currency symbol on the page.” Do you only want the price, plan name,

Gave you access @Jess . Yes, all I want is to match each plan name with its respective pricing information for all of the services offered on the pricing page. No other information, such as feature sets, is needed.

#Example Output#
Basic plan: $X/month
Professional plan: $X/month
Enterprise plan: $X/month
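For readers who want to see the target format as code, here is a small Python sketch that turns already-extracted rows into the lines above. The row keys (`plan`, `price`, `period`) and the `format_plans` helper are assumptions made for illustration; rows missing a plan name or price are skipped, and duplicate plan names are dropped.

```python
def format_plans(rows):
    """Render one 'Name plan: price/period' line per unique plan."""
    seen = set()
    lines = []
    for row in rows:
        name = (row.get("plan") or "").strip()
        price = (row.get("price") or "").strip()
        key = name.lower()
        # Skip incomplete rows and plans that were already emitted.
        if not name or not price or key in seen:
            continue
        seen.add(key)
        period = (row.get("period") or "").strip()
        line = f"{name} plan: {price}"
        lines.append(f"{line}/{period}" if period else line)
    return "\n".join(lines)


if __name__ == "__main__":
    rows = [
        {"plan": "Basic", "price": "$10", "period": "month"},
        {"plan": "basic", "price": "$10", "period": "month"},
        {"plan": "Enterprise", "price": ""},
    ]
    print(format_plans(rows))
```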

Hi @Jess - just wanted to follow up.

Also, I received an email stating that today was the last day of my free trial. Any chance I can receive an extension while we are working through this solution?

Hi @corbin,

I don’t work for Bardeen so I’m not able to extend your trial. Tagging @lucy_bardeen to see if she might be able to assist here.

Thank you,
Jess

Thanks Jess for jumping in! We have another thread going.

Hi @corbin

Would you mind reviewing the results in the “JessTest” sheet?

I’m uncertain if these are better results than before as I don’t have a results set to compare against.

Thank you,
Jess

@Jess , you rock! The results found on the “JessTest” sheet are much more consistent compared to the results I have been getting.

The way the data is extracted for the Typeform, Dropbox, and Ahrefs rows is more of the end result I am looking for.

Are there any edits that can be made to get closer to that format? Also, I’m curious as to why it fails on some URLs (Elastic, Asana, etc.).

Do you mind sharing your workflow?

@corbin, Happy to help buddy!

Here you are:

  • Could do some testing with just these URLs to give a possible educated answer, but I can’t give the more technical, probably more accurate answer as I didn’t design this tool/feature and am unable to find much detail on it.
  • I’ll see if we can tweak it more to get these results, but I’m not totally confident this will work for all websites because of the differences in structure.

Hi Corbin!

Since the pricing pages you’re pulling key information from are each structured differently, you might have more success with the “Create table from text with OpenAI” action, rather than using a scraper template. This will allow you to specify the information you’re looking for and have more fine-tuned control over the instructions for the model.
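To make "specify the information you’re looking for" more concrete, here is a Python sketch of the kind of instructions one might give the model, plus a parser for a CSV-style reply. It does not call OpenAI or Bardeen; the column names, the wording of the instructions, and the helpers `build_instructions` and `parse_table` are assumptions, and the real action may expect a different prompt or return a different shape.

```python
import csv
import io

# Assumed column names for the table the model is asked to return.
COLUMNS = ["plan", "price", "billing_frequency"]


def build_instructions(columns=COLUMNS):
    """Compose instructions that pin down the columns and what to ignore."""
    return (
        "From the pricing page text, extract one row per subscription plan. "
        f"Return only CSV with the header {','.join(columns)}. "
        "Ignore feature lists. Leave a cell empty if the value is not on the page."
    )


def parse_table(reply, columns=COLUMNS):
    """Parse a CSV reply into dicts, rejecting replies with the wrong header."""
    reader = csv.DictReader(io.StringIO(reply.strip()))
    if reader.fieldnames != list(columns):
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return [dict(row) for row in reader]


if __name__ == "__main__":
    print(build_instructions())
    sample = "plan,price,billing_frequency\nBasic,$12,month\nPro,$29,month\n"
    print(parse_table(sample))
```

Checking the header before using the rows is a cheap way to catch replies where the model drifted from the requested format.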

You can also use a Macro in Google Sheets to format your data! This knowledge base article walks you through the steps to do that.

Hope this information helps!

Omansh

Thanks @Jess.

Thanks for the info, @Bardeeni . However, I do not see the link to the knowledge base article.

Sorry about that. Here you go! Use this Google Sheets trick to format your spreadsheet in seconds
