Getting img URLs from a website

So here is an interesting one.

I would like to get relevant image URLs (GIF, JPEG, WEBP, etc.) from a number of websites. These websites will be different every time, so I cannot build a scraper.

This is the automation that I tried:

  1. Get Page as HTML
  2. Send the HTML to OpenAI with this prompt:

Review the HTML code of a concert website and identify the highest-resolution image that most likely features musicians, artists, or bands. Prioritize images with filenames, URLs, alt text, or surrounding metadata containing keywords like ‘concert,’ ‘musician,’ ‘artist,’ ‘band,’ ‘live,’ ‘performance,’ or the names of jazz musician artists or bands. Ensure the image has the largest pixel dimensions or file size and is in a preferred format such as JPEG, PNG, or WEBP. Also, give priority to images with artist names, concert dates, or other relevant context in the surrounding text. Exclude decorative images such as logos, icons, or any images not relevant to the concert content. If multiple images have equal resolution, select the first occurrence with the most relevant metadata. Extract and output only the direct URL of the single highest-resolution image, with no additional text or metadata. If no image is found meeting these criteria, output: no image found.

This doesn't work because the HTML is usually too long for the prompt.

Any thoughts or other approaches that may work?

Hey Bob,

Glad to hear from you!

Right, an AI prompt will not handle the length. One idea you can try is pulling the links with a regex instead. Here’s a regex you can use to extract image URLs (e.g., .jpg, .jpeg, .png, .gif, .webp) from raw HTML:

https?:\/\/[^"'\s>]+?\.(?:jpg|jpeg|png|gif|webp)

Example usage notes:

  • Matches most direct image URLs in HTML.
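  • Case-sensitive as written, so add the i flag (or your tool's equivalent) to also match uppercase extensions such as .JPG.
  • Works wherever an absolute URL appears in the HTML string: inside <img src="">, inline styles, or elsewhere. Relative paths without http:// or https:// are not matched.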
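
If you want to try the pattern outside of an automation, here is a minimal Python sketch of the same idea. The function name extract_image_urls and the sample HTML are made up for illustration, and the de-duplication step is an extra that the regex alone does not do. In Python the forward slashes do not need escaping, so \/ is written as / here; the pattern is otherwise unchanged.

```python
import re

# Same pattern as above; re.IGNORECASE plays the role of the "i" flag.
IMAGE_URL_RE = re.compile(
    r"""https?://[^"'\s>]+?\.(?:jpg|jpeg|png|gif|webp)""",
    re.IGNORECASE,
)


def extract_image_urls(html: str) -> list[str]:
    """Return unique absolute image URLs in order of first appearance.

    Hypothetical helper for illustration; not part of any Bardeen API.
    """
    seen = set()
    urls = []
    for match in IMAGE_URL_RE.finditer(html):
        url = match.group(0)
        if url not in seen:  # skip URLs we have already collected
            seen.add(url)
            urls.append(url)
    return urls


# Made-up sample page: one uppercase extension, one inline style,
# one relative path (not matched), and one duplicate.
sample_html = """
<img src="https://example.com/img/band-live.JPG" alt="band">
<div style="background:url('https://example.com/bg/stage.webp')"></div>
<img src="/relative/logo.png">
<img src="https://example.com/img/band-live.JPG">
"""

print(extract_image_urls(sample_html))
```

The much shorter list of URLs (rather than the full HTML) is something you could then pass to the OpenAI step, which should keep the prompt well within its length limit.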

Just tested this with an automation like this:

Let me know how it goes!

Best,
Victoria

Customer Support - bardeen.ai
Knowledge Base https://support.bardeen.ai/hc/en-us

Hi @victoria_bardeen, thank you so much! This works, Bardeen Magic again.
–Bob