How to Use a Proxy with BeautifulSoup in 2025

Updated: May 23, 2024 · 6 min read

Python offers powerful libraries such as BeautifulSoup for parsing HTML and Requests for fetching pages, but your scraper is likely to get blocked by restrictions such as IP banning and rate limiting. So, in this tutorial, you'll learn how to implement a BeautifulSoup proxy to avoid getting blocked.

Ready? Let's dive in!

First Steps with BeautifulSoup and Python Requests

For this example of scraping with BeautifulSoup and Python Requests, we'll scrape products from ScrapingCourse.com, a demo website with e-commerce features.

ScrapingCourse.com Ecommerce homepage

As a prerequisite, install BeautifulSoup and Requests using the following command:

Terminal
pip install beautifulsoup4 requests
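
Optionally, if you prefer to keep dependencies isolated, you can create and activate a virtual environment first and install the packages inside it (standard Python tooling; skip this step if you already have an environment set up):

Terminal
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
pip install beautifulsoup4 requests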

Then, import the required modules.

scraper.py
import requests
from bs4 import BeautifulSoup

Now, send a GET request to the target URL, retrieve the web server's response, and save it in a variable.

scraper.py
url = "https://www.scrapingcourse.com/ecommerce/"
response = requests.get(url)
content = response.content

Let's print the response so we can see the full HTML we'll extract data from. Here's the complete code:

scraper.py
import requests
from bs4 import BeautifulSoup

url = "https://www.scrapingcourse.com/ecommerce/"
response = requests.get(url)
content = response.content

print(content)

And this is the output:

Output
<!DOCTYPE html>
<html lang="en-US">
<head>
    <!-- ... -->
  
    <title>Ecommerce Test Site to Learn Web Scraping – ScrapingCourse.com</title>
    
  <!-- ... -->
</head>
<body class="home archive ...">
    <p class="woocommerce-result-count">Showing 1–16 of 188 results</p>
    <ul class="products columns-4">

        <!-- ... -->

    </ul>
</body>
</html>

The result above is a fairly complex chunk of HTML that isn't very useful as is. To extract the products, we'll parse the HTML stored in the content variable using BeautifulSoup, which lets us navigate the HTML structure and retrieve the product names.

To continue and extract only the product names, create a BeautifulSoup object to parse the content variable.

scraper.py
soup = BeautifulSoup(content, "html.parser")

Next, open the target URL in a browser and inspect it with DevTools to locate the HTML element that contains the product list. You should see something like this:

scrapingcourse ecommerce homepage inspect first product li

From the image above, each product appears in a list item (<li>) tag with the class product. Right-click the <li> element and copy its selector: .product.
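
For reference, each product card's markup looks roughly like this (attributes and wrapper elements trimmed, so treat it as an approximation rather than the exact HTML):

Output
<li class="product">
    <a href="...">
        <img src="..." alt="Abominable Hoodie">
        <h2>Abominable Hoodie</h2>
        <span class="price">...</span>
    </a>
</li>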

After that, use BeautifulSoup's .select() method, which lets us locate elements by CSS selectors, to grab the specific <li> elements that represent the products within the parsed content. That involves passing your CSS selector as an argument to the select() method.

    scraper.py
    # select the product container
    products = soup.select(".product")
    

To finish, extract the text content of each product's <h2> tag with a list comprehension over the products variable. This retrieves the individual product names listed within the <li> items.

scraper.py
# extract the text content of each <h2> element with a list comprehension
product_names = [product.find("h2").get_text() for product in products]

Then, print the extracted names:

scraper.py
for name in product_names:
    print(name)
    

    Putting everything together, here's the complete code.

    scraper.py
import requests
from bs4 import BeautifulSoup

# define the URL of the website we want to scrape
url = "https://www.scrapingcourse.com/ecommerce/"

# send a GET request to the URL and retrieve the response
response = requests.get(url)

# extract the content of the response
content = response.content

# create a BeautifulSoup object to parse the HTML content
soup = BeautifulSoup(content, "html.parser")

# select the product containers
products = soup.select(".product")

# extract the text content of each <h2> element with a list comprehension
product_names = [product.find("h2").get_text() for product in products]

# print each product name
for name in product_names:
    print(name)
    

    Yielding the following result:

    Output
    Abominable Hoodie
    Adrienne Trek Jacket
    
    # ... other products omitted for brevity
    
    Ariel Roll Sleeve Sweatshirt
    Artemis Running Short
    

    Congrats, you now know how to scrape using BeautifulSoup and Python Requests.
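
As a quick extension, the same pattern works for other fields. Assuming each product card also exposes its price in an element with the class price (typical of WooCommerce-style pages, so verify the selector in DevTools first), you could pair names and prices like this, building on the products variable from the script above:

scraper.py
# pair each product name with its price (the ".price" selector is an assumption)
for product in products:
    name = product.find("h2").get_text()
    price_element = product.select_one(".price")  # hypothetical selector; check DevTools
    price = price_element.get_text(strip=True) if price_element else "n/a"
    print(name, "-", price)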

However, our target was a test website that allows scraping. In practical cases involving modern websites with anti-bot restrictions, our script will get blocked, so let's move on to using proxies with BeautifulSoup.


    How to Use a Proxy with BeautifulSoup and Python Requests

    Proxies allow you to make requests from different IP addresses. As an example, send a request to ident.me using the following code.

    scraper.py
    import requests
    
    url = "http://ident.me/"
    
    response = requests.get(url)
    ip_address = response.text
    
    print("Your IP address is:", ip_address)
    

    Your result should be your IP address.

    Output
    Your IP address is: 190.158.1.38
    

Now, let's make the same request using a proxy. For this example, we'll take a free proxy from FreeProxyList. To implement it, we'll specify the proxy details in the script; that way, you'll be making your request through the specified proxy server.

    Import Requests and set your proxy.

    scraper.py
    import requests
    
proxy = {
    "http": "http://91.25.93.174:3128",
    "https": "http://91.25.93.174:3128"
}
    

Then, pass the proxy variable to the proxies parameter of the requests.get() method and print the response.

    scraper.py
    url = "http://ident.me/"
    
    response = requests.get(url, proxies=proxy)
    ip_address = response.text
    
    print("Your new IP address is:", ip_address)
    

    Putting it all together, you'll have the following complete code.

    scraper.py
    import requests
    
proxy = {
    "http": "http://91.25.93.174:3128",
    "https": "http://91.25.93.174:3128"
}
    
    url = "http://ident.me/"
    
    response = requests.get(url, proxies=proxy)
    ip_address = response.text
    
    print("Your new IP address is:", ip_address)
    

    Here's our result:

    Output
Your new IP address is: 91.25.93.174
    

    Congrats, you've configured your first proxy with BeautifulSoup and Python Requests. The result above is the proxy server's IP address, meaning that the request was successfully routed through the specified proxy.

    However, websites often implement measures like rate limiting and IP banning. Therefore, you must rotate proxies to avoid getting flagged.
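
In practice, these measures usually show up as error status codes such as 403 (Forbidden) or 429 (Too Many Requests) instead of the HTML you expect. Here's a minimal sketch of how you might check for that before parsing (adapt the check to your target site):

scraper.py
import requests
from bs4 import BeautifulSoup

url = "https://www.scrapingcourse.com/ecommerce/"
response = requests.get(url)

# blocked requests typically return 4xx status codes rather than the page HTML
if response.status_code in (403, 429):
    print(f"Request blocked with status {response.status_code}")
else:
    soup = BeautifulSoup(response.content, "html.parser")
    print("Page fetched successfully:", soup.title.get_text())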

    To rotate proxies with BeautifulSoup and Python Requests, start by defining a proxy list. Once more, we've obtained a list from FreeProxyList for this example.

    scraper.py
    import requests
    
    # list of proxies
    proxies = [
        "http://46.16.201.51:3129",
        "http://207.2.120.19:80",
        "http://50.227.121.35:80",
        # add more proxies as needed
    ]
    

    After that, iterate over each proxy in the proxies list, make a GET request using the current proxy, and print the response.

    scraper.py
for proxy in proxies:
    try:
        # make a GET request to the specified URL using the current proxy
        response = requests.get(url, proxies={"http": proxy, "https": proxy})

        # extract the IP address from the response content
        ip_address = response.text

        # print the obtained IP address
        print("Your IP address is:", ip_address)

    except requests.exceptions.RequestException as e:
        print(f"Request failed with proxy {proxy}: {str(e)}")
        continue  # move to the next proxy if the request fails
    

    Putting everything together, here's the complete code.

    scraper.py
import requests

# list of proxies
proxies = [
    "http://46.16.201.51:3129",
    "http://207.2.120.19:80",
    "http://50.227.121.35:80",
    # add more proxies as needed
]

url = "http://ident.me/"

for proxy in proxies:
    try:
        # make a GET request to the specified URL using the current proxy
        response = requests.get(url, proxies={"http": proxy, "https": proxy})

        # extract the IP address from the response content
        ip_address = response.text

        # print the obtained IP address
        print("Your IP address is:", ip_address)

    except requests.exceptions.RequestException as e:
        print(f"Request failed with proxy {proxy}: {str(e)}")
        continue  # move to the next proxy if the request fails
    

    Here's our result:

    Output
    Your IP address is: 46.16.201.51
    Your IP address is: 207.2.120.19
    Your IP address is: 50.227.121.35
    

    Awesome, right? Now you know how to configure a BeautifulSoup proxy and also how to rotate proxies to avoid getting blocked.
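
As a quick recap, here's a minimal sketch that ties everything together: it picks a random proxy from the list, adds a timeout so dead proxies fail fast, and feeds the response into the BeautifulSoup scraper from earlier. The proxy addresses are the same free placeholders as above and will likely be dead by the time you read this, so swap in your own:

scraper.py
import random

import requests
from bs4 import BeautifulSoup

# free proxies copied from the earlier list; placeholders only
proxies = [
    "http://46.16.201.51:3129",
    "http://207.2.120.19:80",
    "http://50.227.121.35:80",
]

url = "https://www.scrapingcourse.com/ecommerce/"

# pick a random proxy for this request and fail fast if it doesn't respond
proxy = random.choice(proxies)

try:
    response = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)

    # parse the HTML and extract the product names as before
    soup = BeautifulSoup(response.content, "html.parser")
    product_names = [product.find("h2").get_text() for product in soup.select(".product")]
    print(product_names)
except requests.exceptions.RequestException as e:
    print(f"Request failed with proxy {proxy}: {e}")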

That said, there's a lot more to this topic. Check out our guide on how to use a proxy with Python Requests to learn more.

    Also, bear in mind free proxies are unreliable and often fail in practical use cases. We only used them in this example to show you the basics. For example, if you replace ident.me with OpenSea, you'll get error messages, as seen below.

Output
    Request failed with proxy http://46.16.201.51:3129: HTTPSConnectionPool(host='opensea.io', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x0000027FCC4BA710>, 'Connection to 46.16.201.51 timed out. (connect timeout=None)'))
    
    Request failed with proxy http://207.2.120.19:80: HTTPSConnectionPool(host='opensea.io', port=443): Max retries exceeded with url: / (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 503 Service Temporarily Unavailable')))
    
    Request failed with proxy http://50.227.121.35:80: HTTPSConnectionPool(host='opensea.io', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x0000027FCC4C0850>, 'Connection to 50.227.121.35 timed out. (connect timeout=None)'))
    

    Fortunately, premium proxies yield better results. Let's use them next.

    Premium Proxy to Avoid Getting Blocked

    Free proxies present significant challenges for web scraping and data collection. Their unpredictable performance, security risks, and low reliability make them impractical for professional use. Most websites quickly detect and block these free proxies, making them unsuitable for sustained scraping operations.

    Premium proxies provide a more robust solution for avoiding blocks. By leveraging high-quality residential IPs with automatic rotation capabilities, premium proxies can effectively mask your scraping requests. Advanced features like geo-targeting significantly boost your scraping success rate.

ZenRows' Residential Proxies stand out as the best solution, offering access to 55M+ residential IPs across 185+ countries. With powerful features like dynamic IP rotation, intelligent proxy selection, and flexible geo-targeting, all backed by 99.9% uptime, they're a great fit for reliable web scraping with the Requests library.

    Let's integrate ZenRows' residential proxies with Python Requests.

    First, sign up and head to the Proxy Generator dashboard. Your proxy credentials will be generated automatically.

    generate residential proxies with zenrows

    Copy your proxy credentials and plug them into this Python code:

    scraper.py
    import requests
    
    proxy = 'http://<ZENROWS_PROXY_USERNAME>:<ZENROWS_PROXY_PASSWORD>@superproxy.zenrows.com:1337'
    
    proxies = { 
       'http': proxy, 
       'https': proxy
    }
    
    url = 'https://httpbin.io/ip'
    response = requests.get(url, proxies=proxies)
    print(response.text)
    

    When you run this code multiple times, you'll see output like this:

    Output
    // request 1
    {
      "origin": "167.71.192.85:44521"
    }
    // request 2
    {
      "origin": "104.248.56.197:51892"
    }
    

Congratulations! The different IP addresses in the output confirm that your requests are successfully routed through ZenRows' residential proxy network. Your scraper is now using premium proxies that significantly reduce the risk of blocks during data collection.
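
From here, plugging the premium proxy into the original product scraper only requires passing the same proxies dictionary to requests.get(). A minimal sketch, reusing the placeholder credentials from above:

scraper.py
import requests
from bs4 import BeautifulSoup

# replace the placeholders with your own ZenRows proxy credentials
proxy = "http://<ZENROWS_PROXY_USERNAME>:<ZENROWS_PROXY_PASSWORD>@superproxy.zenrows.com:1337"
proxies = {"http": proxy, "https": proxy}

url = "https://www.scrapingcourse.com/ecommerce/"

# route the scraper's request through the residential proxy network
response = requests.get(url, proxies=proxies)
soup = BeautifulSoup(response.content, "html.parser")

# extract the product names exactly as in the first section
product_names = [product.find("h2").get_text() for product in soup.select(".product")]
print(product_names)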

    Conclusion

    Proxies act as intermediaries between your web scraper and the target web server. You can access websites from different IP addresses and bypass restrictions by routing your requests through proxies. However, free proxies do not work for real-world cases.

A great option is to use ZenRows' residential proxies with Python Requests and BeautifulSoup for effective and scalable web scraping. Sign up now and enjoy 1,000 free API credits.
