How to Automate Web Scraping With ChatGPT (2025)

Sergio Nonide
Sergio Nonide
March 10, 2025 · 5 min read

With OpenAI constantly rolling out new features, ChatGPT is gradually finding applications in key areas across various industries. If you're looking to explore the potential of ChatGPT in web scraping, you're in the right place.

In this tutorial, you'll learn how to automate web scraping using ChatGPT. We'll also discuss some important tips, including how to avoid getting blocked in the process.

Prepare the Target Site's HTML

While you could include the target page's URL in your ChatGPT prompt, we recommend exposing the HTML file to ChatGPT for the best results.

We'll show you how, but before that, let's set the goal of this tutorial. We'll extract product data (names, prices, links, and image URLs) from the ScrapingCourse e-commerce test site, an e-commerce platform designed for practicing web scraping techniques.

Here's what the target page looks like.

ScrapingCourse.com Ecommerce homepage
Click to open the image in full screen

Now, follow the steps below to prepare your target site's HTML.

Navigate to the target page in your browser, right-click anywhere on the page, and select the Inspect option.

This will open the Developer Tools window, as shown in the image below.

scrapingcourse ecommerce homepage inspect first product li
Click to open the image in full screen

Next, Right-click on the <html> tag and select Copy outerHTML to get the entire HTML of the target page.

Then, navigate to your project directory, open a text editor, paste the copied HTML, and save it as a .html file.

That's it! You've prepared the target site's HTML for use in ChatGPT.

Frustrated that your web scrapers are blocked once and again?
ZenRows API handles rotating proxies and headless browsers for you.
Try for FREE

Write Your ChatGPT Prompt

Good prompting is essential to maximize ChatGPT's potential. The right prompt can significantly increase ChatGPT's output quality and scraping accuracy.

Therefore, it's crucial that your prompt is clear, concise, and includes enough information for the AI to work with. Also, don't be afraid to experiment and iterate.

Here's a sample prompt for scraping product names, prices, product URLs, and image URLs. It also instructs ChatGPT to save the output as a downloadable CSV file.

You can customize this prompt to suit different scraping needs by modifying the data to be extracted and adding more instructions.

Example
I have provided a website's raw HTML. Analyze it and scrape the product name, price, link, and image URL. Ensure you remove any character encoding to have a clean dataset. Save the extracted data into a downloadable CSV file. 

Follow the steps below to use this prompt.

Navigate to ChatGPT. You'll find a text window that prompts you to initiate a chat by entering your query into the text field. You may need to log in or sign up to use more features.

Next, click the plus (+) icon in the left corner of the text field and select your HTML file to upload. Then, paste your prompt in the text field and press Enter to submit your query.

ChatGPT will process your prompt and return a response similar to the one below.

Click to open the image in full screen

Here's a screenshot of the downloaded CSV file for reference.

Click to open the image in full screen

Awesome! You've effortlessly scraped a web page using ChatGPT.

Tips for Using ChatGPT for Web Scraping

Below are tips on how to further streamline your ChatGPT scraping and get the most out of the AI tool.

1. Get Selectors With ChatGPT

ChatGPT can understand the context and generate relevant responses based on the input you provided. Since you've already uploaded the HTML file, you can ask for the XPath or CSS selectors for your desired data.

This instructs ChatGPT to analyze the HTML structure and identify the appropriate selectors.

If, for instance, you want to extract the product title, you can include the following query in your prompt.

Example
Provide the CSS selectors for the product title in the following HTML

2. Build a Reusable No-code Web Scraper

You can also use ChatGPT to build a no-code web scraping tool to scrape different websites.

For instance, you can use the steps above to obtain the HTML file of the target website and then pass it to the OpenAI API to extract data using specific keywords.

If you're feeling resourceful, you can take things a step further by creating a simple web app that takes the website URL as input, retrieves the HTML, and uses the OpenAI API to specify what data to extract.

3. Generate Complete Web Scraping Code

Another way to take advantage of ChatGPT scraping is as an AI assistant for your web scraping task.

ChatGPT can generate code in any language and for any tool. With good prompting, you can quickly create a complete web scraping code and follow-up to customize the code further according to your needs.

However, if you're using a not-so-popular web scraping tool, you may need to provide some information about the tool or its documentation to get an accurate result.

Here's a well-explained and concise prompt using Python Requests and BeautifulSoup to scrape the same target website as before.

This prompt specifies the programming language, libraries to be used, CSS selectors, and output format.

Example
I have provided a website's raw HTML.

Write a web scraper using Python Requests and BeautifulSoup to extract product names, prices, product links, and image URLs from the HTML file. 

Ensure you remove any character encoding to have a clean dataset. Save the extracted data into a downloadable CSV file. 

CSS selectors:
Product name: .product-name
Price: .product-price
Product Links: li.product > a
Image URLs:.product-image

ChatGPT will reply with a code, following the instructions specified in the prompt.

Click to open the image in full screen

Once you get your reply, copy-paste the ChatGPT code to your preferred editor.

scraper.py
import os
import csv
from bs4 import BeautifulSoup

# load the HTML file
html_file_path = "data/web.html"
with open(html_file_path, "r", encoding="utf-8") as file:
    soup = BeautifulSoup(file, "html.parser")

# extract product data
products = []
for product in soup.select("li.product"):
    name = product.select_one(".product-name")
    price = product.select_one(".product-price")
    link = product.select_one("a")["href"] if product.select_one("a") else ""
    image = product.select_one(".product-image")

    products.append([
        name.text.strip() if name else "",
        price.text.strip() if price else "",
        link.strip(),
        image["src"].strip() if image else ""
    ])

# save to CSV
csv_file_path = "data/products.csv"
with open(csv_file_path, "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["Product Name", "Price", "Product Link", "Image URL"])
    writer.writerows(products)

print(f"CSV file saved: {csv_file_path}")

Remember to set up your project environment before running the code. Install the required libraries using the following command and ensure the HTML file is in your project directory.

Terminal
pip3 install requests beautifulsoup4

Edit the file paths accordingly and run the code.

You'll get the same result as before.

Click to open the image in full screen

Congratulations! You now know another way to leverage ChatGPT for web scraping.

Limitations of ChatGPT for Web Scraping

While ChatGPT can be a valuable automation tool, below are some limitations you must consider.

Limited Pagination Handling

To scrape multiple pages using ChatGPT, you'll need to download each page separately and repeat the process one after the other. This makes it challenging to scale and unsuitable for large-scale scraping tasks.

That said, you can interact directly with the OpenAI API to automate fetching the pages and passing them to GPT to extract data. However, depending on your project needs, this can require a significant amount of coding.

Potential Inaccuracies

ChatGPT is primarily a language model. This means it's prone to mistakes and inaccurate web scraping results. Even the most adequate prompting may not matter, as ChatGPT is less likely to return the correct information if the tool is rare or recent.

Additionally, websites with complex layouts and frequent DOM changes can be challenging for ChatGPT to analyze.

This is because complex websites could include dynamically generated content or nested HTML files, which can make it difficult for ChatGPT to identify and extract data.

When the DOM changes frequently, selectors also change, which can result in the wrong output.

Another limitation to consider is anti-bot solutions. Modern websites employ measures, including CAPTCHAs, to mitigate automated access, and ChatGPT may generate undesired data when trying to scrape them.

Avoid Getting Blocked While Scraping With ChatGPT

Getting blocked is a common web scraping challenge, especially during large-scale scraping projects.

In addition to the tedious need to download and process pages separately, ChatGPT accesses websites using tools like the Python Requests library. Anti-bot solutions often block these tools, preventing ChatGPT from accessing the target HTML.

To scrape websites without getting blocked, consider ZenRows' Universal Scraper API, the easiest way to scrape at any scale.

ZenRows is a web scraping API that provides everything you need to bypass any anti-bot solution. It does this behind the scenes with features such as JavaScript rendering support, premium proxy auto-rotation with flexible geo-targeting, AI-powered CAPTCHA and anti-bot auto-bypass, and everything else you need for reliable web scraping.

What's more, you can combine ZenRows' anti-bot and CAPTCHA bypass features with ChatGPT’s page analysis to automate scraping without limitations.

Even without ChatGPT, ZenRows' auto-parsing feature can make life significantly easier by automatically extracting key data fields and returning the result in a usable format (JSON).

Let's see ZenRows in action against a protected website (the Antibot Challenge page)

To follow along with this example, sign up for your free API key. You'll be redirected to the Request Builder page.

building a scraper with zenrows
Click to open the image in full screen

Input your target URL and activate Premium Proxies and JS Rendering boost mode.

Next, select your preferred language (Python, in this case) and choose the API option. ZenRows works with any language and provides ready-to-use snippets for the most popular ones. We'll use Python for this example.

Copy-paste the generated code to your editor. Your code should look like this:

scraper.py
# pip3 install requests
import requests

url = 'https://www.scrapingcourse.com/antibot-challenge'
apikey = '<YOUR_ZENROWS_API_KEY>'
params = {
    'url': url,
    'apikey': apikey,
    'js_render': 'true',
    'premium_proxy': 'true',
}
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)

This code bypasses the anti-bot challenge and retrieves the HTML, as shown below.

Output
<html lang="en">
<head>
    <!-- ... -->
    <title>Antibot Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Antibot challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>

Congratulations! You've bypassed the anti-bot challenge with ZenRows!

Conclusion

ChatGPT makes web scraping as easy as entering the right prompt in the chat window. However, it isn't scalable, and some limitations can affect the accuracy of your results. The most important is the inability to circumvent anti-bot solutions.

Not to worry, though, ZenRows enables you to efficiently scrape any website at scale without getting blocked.

Sign up now to try ZenRows for free.

Ready to get started?

Up to 1,000 URLs for free are waiting for you