I’m working on a project where I need to aggregate text information from a series of hyperlinks. Specifically, I’m looking at the Hypercerts Docs webpage: Hello from Hypercerts | Hypercerts. This page contains several sections, each with multiple links. My goal is to scrape text from about 10-16 of these hyperlinks.
After scraping, I want to use the gathered data to query ChatGPT for insights and summaries. I’m still new to Bardeen and would greatly appreciate any guidance or tips on setting up a scraper for this specific use case.
First I select list, then click some of the links, then no pagination, then that leads me to the first place I’m not sure what to do:
Hi @charrison, happy to take a look at this for you!
It appears you are trying to use the magic box to create the automation for you, but since this is pretty specific, I recommend creating it on your own.
I’ve created the below playbook for this use case:
Essentially, we grab all of the links, get the HTML of each linked page, and convert that into text so ChatGPT can generate a summary for each one. The results are then written to a Google Sheet.
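For anyone who wants to see the same flow outside Bardeen, here is a rough Python sketch of the "grab links, fetch HTML, convert to text" part. It is only an illustration of the idea, not how the playbook is implemented internally. The function names (extract_links, html_to_text, fetch_html, scrape_linked_pages) are made up for this example, and it assumes the pages are plain HTML that does not need JavaScript to render.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags, without duplicates."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip missing hrefs and same-page anchors such as "#top".
        if href and not href.startswith("#"):
            url = urljoin(self.base_url, href)
            if url not in self.links:
                self.links.append(url)


class TextExtractor(HTMLParser):
    """Keeps visible text and skips the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_links(html, base_url):
    """Step 1: find every link on the starting page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def html_to_text(html):
    """Step 3: reduce a page's HTML to plain text, one fragment per line."""
    extractor = TextExtractor()
    extractor.feed(html)
    return "\n".join(extractor.parts)


def fetch_html(url):
    """Step 2: download the HTML of one page."""
    with urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scrape_linked_pages(start_url, fetch=fetch_html):
    """Returns {url: page_text} for every link found on start_url."""
    links = extract_links(fetch(start_url), start_url)
    return {url: html_to_text(fetch(url)) for url in links}
```

Calling scrape_linked_pages with the address of the docs start page would return one block of text per linked page, which could then be passed to a summarizer or saved. The fetch parameter is there so the download step can be swapped out, for example when testing without network access.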
Here’s the Public GSheet to see what the result will look like: @Charrison - Google Sheets – Feel free to edit the prompt if there is something else you are looking for from ChatGPT.
As this is a lot of text for OpenAI to generate summaries for, it will take a little while. It might be better to run this automation once per link instead: Get a summary of the current page using OpenAI
I hope this helps!
Please let me know if you have any questions or need more information.
Thank you,
Jess
Wow, thank you so much for doing that!!! As I look through what you built, a lot of it doesn’t make sense to me, and I don’t think I’ll be able to replicate it myself the next time I encounter a similar situation. I hate to ask you to do it again, but is there any chance you could take a screen capture? If I could see the whole process, I think I’d understand it a LOT more.
As I thought about it, I also realized I don’t need this many steps. The time-consuming part for me to do manually is clicking each of the links and copy-pasting all the text into a document. That’s really all I need; I can easily take that content and paste it into ChatGPT myself, which only takes a second.
I also wonder if it would be possible for the data to go into a .txt document instead of a spreadsheet. When I uploaded the spreadsheet to ChatGPT, it seemed to have trouble reading all the text in the second column, but when I copied all of that into a plain text document, I was able to ask questions about the content from all those pages, which was the initial goal.
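To make the ".txt instead of a spreadsheet" idea concrete, here is a small Python sketch that writes the text of several pages into one plain text file, with a header line marking where each page starts. It is an illustration only, not a Bardeen feature; the function name write_combined_text and the "=== url ===" separator are assumptions made up for this example.

```python
from pathlib import Path


def write_combined_text(pages, output_path):
    """Writes a mapping of URL -> page text into a single UTF-8 .txt file.

    Each page gets a "=== <url> ===" header so the source of every section
    stays visible when the file is pasted or uploaded somewhere else.
    Returns the number of sections written.
    """
    sections = []
    for url, text in pages.items():
        sections.append(f"=== {url} ===\n{text.strip()}\n")
    Path(output_path).write_text("\n".join(sections), encoding="utf-8")
    return len(sections)
```

Keeping everything in one file means there is a single document to upload, and the headers make it possible to ask which page a given passage came from.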
Even though I don’t need the ChatGPT step in the automation, I’m curious: how do I edit the ChatGPT prompt, for future reference? I don’t know if Bardeen Ambassador is a paid position or not, but I’d be happy to compensate you for your time.
Also, I was just reading this tutorial and I’m wondering if what you built qualifies as a ‘deep scraper’? It seems like it, based on this: “This is usually done when scraping search results and then going through every page on that list to extract additional data.”
I’ll try to get you a video by the end of the week that explains this further, and I’ll build your new request: the automation without the ChatGPT step, writing to a Google Doc instead. Your use case is atypical of how an automation is normally built, so please keep that in mind.
Kind of (conceptually yes), but not technically. A deep scraper involves two or more scraping actions: for example, first scraping a list of LinkedIn job URLs, and then scraping further details about each job from those URLs, such as title, salary, etc. In this automation we only use one scraping action, “scrape data on active tab”, to grab the links first. We then use a couple more Bardeen actions to get the HTML of each linked page and convert it into text.
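The two-stage shape of a deep scrape can be sketched in Python like this. Stage 1 pulls detail-page URLs from a listing page, and stage 2 pulls specific fields from each detail page. Everything here is a toy: the markup (class="job", h1, class="salary"), the function names, and the regex-based parsing are assumptions for illustration, and this is not how Bardeen’s scraper works internally.

```python
import re


def scrape_listing(listing_html):
    """Stage 1: collect detail-page URLs from a listing page."""
    return re.findall(r'<a[^>]+class="job"[^>]+href="([^"]+)"', listing_html)


def scrape_detail(detail_html):
    """Stage 2: extract specific fields from one detail page."""
    title = re.search(r"<h1>(.*?)</h1>", detail_html)
    salary = re.search(r'<span class="salary">(.*?)</span>', detail_html)
    return {
        "title": title.group(1) if title else None,
        "salary": salary.group(1) if salary else None,
    }


def deep_scrape(listing_url, fetch):
    """Runs stage 1 once, then stage 2 for every URL that stage 1 found.

    fetch is any function that takes a URL and returns that page's HTML.
    """
    rows = []
    for url in scrape_listing(fetch(listing_url)):
        row = {"url": url}
        row.update(scrape_detail(fetch(url)))
        rows.append(row)
    return rows
```

By contrast, the automation in this thread runs a field-level scraper only once, on the starting page, and then takes the whole text of each linked page rather than picking out named fields.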
Compensation would be greatly appreciated. Bardeen Ambassadors do not work for Bardeen but are representatives of the brand, so this is something extra I’m doing outside of my full-time job.
Here’s more about me if you care to learn:
I would like to invite you to join our recurring happy hours (every Tuesday and Thursday) and connect live with the Bardeen Team and fellow builders as we help you get started with Bardeen and activate your first automations.
We open this space to everyone in our community.
What can you expect:
An onboarding to Bardeen
Help you activate your first automations
Explore use-cases to improve sales in your business
Q+A with Bardeen team
Duration: 1 hour
Book your session here: Bardeen Weekly Onboarding Sessions
We also have a series of resources on our YouTube channel that take you step by step through creating your first playbook. One of my favorites is the Ultimate Scrape Tutorial.
Thank you so much, Jess! I watched the video you made, which was definitely helpful. I tried to run the automation myself, but only one .txt doc actually downloaded to my computer even though it says they all did.
Also, I sent you a bit on PayPal and would be interested to chat about ongoing consultation if you’re open to it, like some Bardeen tutoring, basically. Let me know what you would want to charge hourly and I’ll see if I can budget for it!