Hi, I want to scrape the source code of web pages for SEO purposes. The files are pretty big, though, at least too big for a Google Sheets cell, Notion, or Airtable. Any tips or ideas on how I can deal with this or split it up?
Thanks!
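If you can run a small script outside of Bardeen, one possible workaround is to cut the HTML into consecutive pieces that each fit in one cell and write them into adjacent columns. Below is a minimal Python sketch. It assumes a per-cell limit of 50,000 characters, which is Google Sheets' per-cell limit as far as I know, so please check the current limits for Sheets, Notion, and Airtable. `split_html` is a hypothetical helper name, not part of Bardeen.

```python
def split_html(html: str, max_chars: int = 50_000) -> list[str]:
    """Split an HTML string into consecutive pieces of at most max_chars characters.

    Assumption: 50,000 is the per-cell character limit of the target tool.
    Pieces are cut at fixed offsets, so a tag may straddle two pieces;
    joining the pieces in order restores the original string exactly.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [html[i:i + max_chars] for i in range(0, len(html), max_chars)]


if __name__ == "__main__":
    page_html = "<html>" + "x" * 120_000 + "</html>"
    pieces = split_html(page_html)
    # Each piece would go into its own cell, e.g. columns B, C, D of one row.
    print([len(p) for p in pieces])
```

Cutting at fixed offsets keeps the logic trivial and lossless; if you need each piece to be readable on its own, you would have to split at tag boundaries instead, which is more work.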
Hi, I’m curious: why do you need to scrape the source code rather than the rendered front-end pages?
Hi,
There is a “Get HTML of a page” action that could help you. Once the HTML is fetched, you can send it to yourself in an email, in a Slack message, or as an HTTP POST request. There are other options as well, but unfortunately I can’t think of a way to split the output.
Here is a Getting Started guide that introduces Bardeen basics: https://www.bardeen.ai/tutorials/getting-started, and here is a list of tutorials: https://www.bardeen.ai/tutorials.
Hope this helps!
Victoria
Customer Support - bardeen.ai
Knowledge Base https://support.bardeen.ai/hc/en-us
Hi Jess, I want to automate finding blogs, articles, meta tags, published and updated dates, etc., and to do a bunch of websites in one go. These pages don’t share the same structure across different websites, so my assumption is that I need to find this information in the source code. Do you have a better approach?
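One way to cope with differing page layouts is to rely on the standardized metadata many sites put in their `<head>`, for example `<meta name="description">` or the Open Graph `article:published_time` and `article:modified_time` properties. Not every site provides these, so treat missing keys as normal. Below is a minimal Python sketch that uses the built-in `html.parser` module instead of regular expressions, assuming you already have each page's HTML as a string; `MetaTagParser` and `extract_meta` are hypothetical names.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta> tags into a dict keyed by their name/property/itemprop attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing <meta ... />
        # tags also arrive here via the default handle_startendtag.
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Sites use different attributes for the key, e.g. name="description",
        # property="article:published_time" (Open Graph) or itemprop="dateModified".
        key = attributes.get("name") or attributes.get("property") or attributes.get("itemprop")
        content = attributes.get("content")
        if key and content is not None:
            self.meta[key.lower()] = content


def extract_meta(html: str) -> dict[str, str]:
    """Return all keyed <meta> tags of one page as {key: content}."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta
```

For example, `extract_meta(page_html).get("article:published_time")` returns the published date when the site declares one and `None` otherwise. A real HTML parser is less brittle than a regex when attribute values contain `>` or quotes.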
I would try gathering the sites in a Google Sheet (just the URLs).
Then have Bardeen extract the HTML using the “Get HTML of a page” action, as Victoria mentioned above.
Then use the “Find values using regular expression” action.
Use a regular expression to get all of your meta tags; for example, <meta.*?> matches each meta tag.
Use ChatGPT outside of the automation to help you come up with the right regular expressions.
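To sanity-check a pattern before putting it into the automation, you could try it in a few lines of Python. This sketch uses the `<meta.*?>` pattern from the steps above, adding the `IGNORECASE` and `DOTALL` flags (my assumption) so that uppercase tags and tags split across lines also match. The second pattern, which pulls out the `content="..."` value, is a hypothetical follow-up, not something Bardeen provides. Bardeen's regex engine may differ slightly from Python's `re`.

```python
import re
from typing import List, Optional

# The pattern from the steps above: "<meta" up to the first ">" (non-greedy).
META_TAG = re.compile(r"<meta.*?>", re.IGNORECASE | re.DOTALL)
# Hypothetical follow-up: capture the value of content="..." or content='...'.
# The backreference \1 requires the same quote character to close the value.
CONTENT_ATTR = re.compile(r"""content\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def find_meta_tags(html: str) -> List[str]:
    """Return every <meta ...> tag found in the page source."""
    return META_TAG.findall(html)


def content_of(tag: str) -> Optional[str]:
    """Return the content attribute of a single meta tag, or None if absent."""
    match = CONTENT_ATTR.search(tag)
    return match.group(2) if match else None
```

Note that `<meta.*?>` stops at the first `>`, so a tag whose attribute value itself contains `>` will be cut short; for most pages this is rare enough to be acceptable.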
I hope this helps!