So here is an interesting one.
I would like to get relevant image URLs (GIF, JPEG, WEBP, etc.) from a number of websites. These websites will be different every time, so I cannot build a site-specific scraper.
This is the automation I tried:
- Get Page as HTML
- Prompt the HTML with OpenAI using this prompt:
Review the HTML code of a concert website and identify the highest-resolution image that most likely features musicians, artists, or bands. Prioritize images with filenames, URLs, alt text, or surrounding metadata containing keywords like ‘concert,’ ‘musician,’ ‘artist,’ ‘band,’ ‘live,’ ‘performance,’ or the names of jazz musicians, artists, or bands. Ensure the image has the largest pixel dimensions or file size and is in a preferred format such as JPEG, PNG, or WEBP. Also, give priority to images with artist names, concert dates, or other relevant context in the surrounding text. Exclude decorative images such as logos, icons, or any images not relevant to the concert content. If multiple images have equal resolution, select the first occurrence with the most relevant metadata. Extract and output only the direct URL of the single highest-resolution image, with no additional text or metadata. If no image is found meeting these criteria, output: no image found.
This doesn't work because the raw HTML is usually too long to fit in the prompt (a rough sketch of the equivalent code is below).
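For clarity, this is roughly what the scenario boils down to in code. This is only a minimal sketch to illustrate the flow, not my actual modules; the requests/openai calls and the model name are assumptions.

```python
# Minimal sketch of the attempted flow (not the actual automation modules).
import requests
from openai import OpenAI

# The full prompt quoted above goes here.
PROMPT = "Review the HTML code of a concert website and identify ..."

def find_concert_image(url: str) -> str:
    # Step 1: "Get Page as HTML"
    html = requests.get(url, timeout=30).text

    # Step 2: send the whole HTML to OpenAI -- this is where it breaks,
    # because the raw HTML of most pages is far longer than the prompt allows.
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": html},
        ],
    )
    return response.choices[0].message.content.strip()

print(find_concert_image("https://example.com/some-concert-page"))
```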
Any thoughts or other approaches that may work?