How to Use a Proxy With Selenium in Python

Updated: June 27, 2024 · 12 min read

Table of contents

What is a Selenium proxy?
How to set up a proxy in Selenium
- Selenium Proxy authentication
Best proxy protocols
Use a rotating proxy in Selenium
Add premium proxies
Error 403: Forbidden for proxy in Selenium Grid
Conclusion

Getting flagged as a bot while web scraping with Selenium? Yeah, that happens a lot.

Selenium is great for scraping dynamic websites, but it struggles with sophisticated anti-bot systems on its own. If you want to avoid IP blocks, get around geo-restrictions, and handle rate limits, you need to add proxies to your Selenium scraper.

This guide will show you:

How to set up a proxy in Selenium.
How to rotate proxies in Selenium.
How to use premium proxies.

Let's dive in!

What Is a Selenium Proxy?

A proxy acts as an intermediary between a client and a server. Through it, the client makes requests to other servers anonymously and securely and avoids geographical restrictions.

Like HTTP clients, headless browsers can be configured to route their traffic through a proxy server. A proxy helps hide your IP address and avoid blocks when scraping protected websites, like Amazon, with Selenium.

Using Selenium with proxy is particularly useful for browser automation activities such as testing and web scraping. Keep reading to learn how to set up a proxy in Selenium for web scraping!

How to Set Up a Proxy in Selenium

In this section, you'll learn how to set up a Selenium proxy using Python. We'll use Chrome, as it's the most popular browser for automation.

Let's start by setting up a basic Python script to control Chrome with Selenium.

The snippet below initializes a headless Chrome driver and visits httpbin, a webpage that returns the IP address of the client making the request. Finally, the script prints the response HTML.

                    scraper.py
                
# pip install selenium webdriver-manager
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options

# set Chrome options to run in headless mode
options = Options()
options.add_argument("--headless=new")

# initialize Chrome driver
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()), 
    options=options
)

# navigate to the target webpage
driver.get("https://httpbin.io/ip")

# print the HTML of the target webpage
print(driver.page_source)

# release the resources and close the browser
driver.quit()

The code will print the following HTML:

                    Output
                
<html><head><meta name="color-scheme" content="light dark"><meta charset="utf-8"></head><body><pre>{
  "origin": "50.217.226.40:80"
}
</pre><div class="json-formatter-container"></div></body></html>

Note

If you haven't upgraded to Selenium 4 yet, do so. Since version 4.6, Selenium ships with Selenium Manager, which downloads a matching browser driver automatically, so webdriver-manager becomes optional. You can verify your current version using pip show selenium and upgrade to the newest version with pip install --upgrade selenium.

Awesome! You're now ready to set up your Selenium proxy in Python using the Chrome driver.

To use Selenium proxy, you need to:

Retrieve a valid proxy server.
Specify it in the --proxy-server Chrome option.
Visit your target page.

Let's go over the whole process step-by-step.

First, get a free proxy address from the Free Proxy List website. Configure Selenium with Options to launch Chrome using a proxy. Then, print the body content of the target webpage.

                    scraper.py
                
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# define the proxy address and port
proxy = "20.235.159.154:80"

# set Chrome options to run in headless mode using a proxy
options = Options()
options.add_argument("--headless=new")
options.add_argument(f"--proxy-server={proxy}")

# initialize Chrome driver
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=options
)

# navigate to the target webpage
driver.get("https://httpbin.io/ip")

# print the body content of the target webpage
print(driver.find_element(By.TAG_NAME, "body").text)

# release the resources and close the browser
driver.quit()

The controlled instance of Chrome will now perform all requests through the specified proxy.

Here's what it'll return:

                    Output
                
{
  "origin": "20.235.159.154:80"
}

The site response matches the proxy server address. That means Selenium is visiting pages through the proxy server.
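If you'd rather not eyeball the output, a small helper can do that comparison for you. The snippet below is a minimal sketch: origin_matches_proxy is a hypothetical helper name, and it assumes the page body is JSON in the format shown above. It compares only the IP, since the port reported by the site may differ from the proxy's port.

```python
import json


def origin_matches_proxy(body_text: str, proxy: str) -> bool:
    """Return True if the IP in the "origin" field equals the proxy's IP.

    Hypothetical helper: assumes `body_text` is JSON like
    {"origin": "1.2.3.4:80"} and `proxy` is "host:port",
    optionally prefixed with a scheme such as "http://".
    """
    origin = json.loads(body_text)["origin"]
    origin_ip = origin.split(":")[0]
    # drop an optional scheme before extracting the host part
    proxy_host = proxy.split("://")[-1].split(":")[0]
    return origin_ip == proxy_host


body = '{\n  "origin": "20.235.159.154:80"\n}'
print(origin_matches_proxy(body, "20.235.159.154:80"))  # True
```

In the script above, you would pass driver.find_element(By.TAG_NAME, "body").text as body_text.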

Note

Free proxies are short-lived and unreliable, so the one used in the snippet above will most likely no longer work by the time you read this. We'll see a better alternative later in the tutorial.

Great! You now know the basics of using a Python Selenium proxy.

However, using a single proxy isn't enough. For instance, some websites implement rate limiting, which restricts the number of requests you can make from a single IP within a given time frame. They can also block you if you make several requests within a short timeframe.

To avoid these limitations and reduce the risk of being blocked, you need to implement advanced strategies like proxy rotation and premium proxies. We'll cover these methods later in the tutorial.

Selenium Proxy Authentication

Some proxy servers rely on authentication to restrict access to users without valid credentials. That's usually the case with commercial solutions or premium proxies.

The Selenium syntax to specify a username and password in an authenticated proxy URL looks like this:

                    scraper.py
                
<PROXY_PROTOCOL>://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>

However, passing such a URL to --proxy-server won't work because Chrome ignores the username and password in that flag. That's where a third-party plugin, such as Selenium Wire, comes to the rescue.

Selenium Wire extends Selenium to give you access to the requests made by the browser and change them as desired. Run the command below to install it:

                    Terminal
                
pip install blinker==1.7.0 selenium-wire

Note

Selenium Wire is no longer maintained and depends on blinker==1.7.0. To make sure it runs smoothly, install it together with that pinned blinker version, as in the command above.

Use Selenium Wire for proxy authentication, as shown below:

                    scraper.py
                
from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# configure the proxy
proxy_username = "<YOUR_USERNAME>"
proxy_password = "<YOUR_PASSWORD>"
proxy_address = "20.235.159.154"
proxy_port = "80"

# formulate the proxy url with authentication
proxy_url = f"http://{proxy_username}:{proxy_password}@{proxy_address}:{proxy_port}"

# set selenium-wire options to use the proxy
seleniumwire_options = {
    "proxy": {
        "http": proxy_url,
        "https": proxy_url
    },
}

# set Chrome options to run in headless mode
options = Options()
options.add_argument("--headless=new")

# initialize the Chrome driver with service, selenium-wire options, and chrome options
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    seleniumwire_options=seleniumwire_options,
    options=options
)

# navigate to the target webpage
driver.get("https://httpbin.io/ip")

# print the body content of the target webpage
print(driver.find_element(By.TAG_NAME, "body").text)

# release the resources and close the browser
driver.quit()

Note

This code may result in a [407: Proxy Authentication Required](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/407) error. A proxy server responds with that HTTP status when the credentials are missing or invalid, so make sure the proxy URL uses a valid username and password.

Learn more in our guide to Selenium Wire.
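Hardcoding credentials in the script, as in the example above, is fine for a demo but easy to leak. One option is to read them from environment variables instead. The snippet below is a minimal sketch under that assumption: the variable names (PROXY_USERNAME, PROXY_PASSWORD, PROXY_ADDRESS, PROXY_PORT) and the helper name are made up for illustration.

```python
import os


def build_seleniumwire_options(env=os.environ) -> dict:
    """Build the selenium-wire proxy options from environment variables.

    The variable names used here are assumptions made for this sketch.
    """
    username = env["PROXY_USERNAME"]
    password = env["PROXY_PASSWORD"]
    address = env["PROXY_ADDRESS"]
    port = env.get("PROXY_PORT", "80")  # fall back to port 80 if unset

    proxy_url = f"http://{username}:{password}@{address}:{port}"
    return {"proxy": {"http": proxy_url, "https": proxy_url}}


# demo with an in-memory mapping instead of the real environment
demo_env = {
    "PROXY_USERNAME": "user",
    "PROXY_PASSWORD": "secret",
    "PROXY_ADDRESS": "20.235.159.154",
}
print(build_seleniumwire_options(demo_env)["proxy"]["http"])
# http://user:secret@20.235.159.154:80
```

The returned dictionary has the same shape as the seleniumwire_options used in the script above, so it can be passed to webdriver.Chrome() in the same way.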

Best Protocols for a Proxy in Selenium

When it comes to choosing a protocol for a Selenium proxy, the most common options are HTTP, HTTPS, and SOCKS5.

HTTP proxies forward traffic without encrypting the connection between your client and the proxy, while HTTPS proxies add TLS encryption to that connection. That's why the latter is the more secure choice.

Another useful protocol for Selenium proxies is SOCKS5, the latest version of the SOCKS protocol. It works at a lower level than HTTP and can relay other kinds of traffic, such as email and FTP, which makes it a more versatile protocol.

Overall, HTTP and HTTPS proxies are good for web scraping and crawling, and SOCKS finds applications in tasks that involve non-HTTP traffic.
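In Chrome, the protocol is selected by the scheme prefix in the --proxy-server value. Here's a minimal sketch of a helper that builds that argument. The function name, the allowed-scheme set, and the 127.0.0.1:1080 address are assumptions for illustration, not part of Selenium.

```python
# schemes this sketch accepts; adjust to what your proxy provider offers
ALLOWED_SCHEMES = {"http", "https", "socks4", "socks5"}


def proxy_server_argument(scheme: str, host: str, port: int) -> str:
    """Return a --proxy-server argument for the given proxy protocol."""
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {scheme}")
    return f"--proxy-server={scheme}://{host}:{port}"


print(proxy_server_argument("socks5", "127.0.0.1", 1080))
# --proxy-server=socks5://127.0.0.1:1080
```

You would pass the resulting string to options.add_argument(), just like the plain host:port value used earlier.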

Use a Rotating Proxy in Selenium With Python

If your script makes several requests in a short interval, the server may consider it suspicious and block your IP. Websites can detect and block requests from specific IP addresses, making it difficult for you to scrape data effectively.

However, using a rotating proxy approach can solve this problem. By switching proxies in Selenium after a particular period or number of requests, your end IP will keep changing. This makes you appear as a different user each time, preventing the server from banning you.

Let's learn how to build a proxy rotator in Selenium with selenium-wire.

First, you need to create a pool of proxies. In this example, we'll use some free proxies.

Store them in an array as follows:

                    scraper.py
                
PROXIES = [
    "http://20.235.159.154:80",
    "http://149.169.197.151:80",
    # ...
    "http://212.76.118.242:97"
]

Then, extract a random proxy with random.choice() and use it to initialize a new driver instance. Here's what your final code should look like:

                    scraper.py
                
from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

import random

# the list of proxies to rotate through
PROXIES = [
    "http://20.235.159.154:80",
    "http://149.169.197.151:80",
    # ...
    "http://212.76.118.242:97"
]

# randomly select a proxy
proxy = random.choice(PROXIES)

# set selenium-wire options to use the proxy
seleniumwire_options = {
    "proxy": {
        "http": proxy,
        "https": proxy
    },
}

# set Chrome options to run in headless mode
options = Options()
options.add_argument("--headless=new")

# initialize the Chrome driver with service, selenium-wire options, and chrome options
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    seleniumwire_options=seleniumwire_options,
    options=options
)

# navigate to the target webpage
driver.get("https://httpbin.io/ip")

# print the body content of the target webpage
print(driver.find_element(By.TAG_NAME, "body").text)

# release the resources and close the browser
driver.quit()

The following is the output for manually running this code three times:

                    Output
                
# request 1
{
    "origin": "149.169.197.151:1286"
}

# request 2
{
    "origin": "20.235.159.154:3224"
}

# request 3
{
    "origin": "212.76.118.242:97"
}

Well done! You’ve just built a working Selenium proxy rotator. You can learn more tips and tricks in our definitive guide on how to rotate proxies in Python.
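The script above picks one random proxy per run. If you want to switch proxies after a fixed number of requests instead, a round-robin rotator is one way to do it. The class below is a minimal sketch, not part of Selenium or Selenium Wire: ProxyRotator and requests_per_proxy are hypothetical names.

```python
from itertools import cycle


class ProxyRotator:
    """Hand out proxies in round-robin order, switching every N requests."""

    def __init__(self, proxies, requests_per_proxy=3):
        if not proxies:
            raise ValueError("the proxy pool is empty")
        self._pool = cycle(proxies)
        self._limit = requests_per_proxy
        self._used = 0
        self._current = next(self._pool)

    def next_proxy(self) -> str:
        """Return the proxy to use for the next request."""
        if self._used >= self._limit:
            # the current proxy reached its quota: move to the next one
            self._current = next(self._pool)
            self._used = 0
        self._used += 1
        return self._current


rotator = ProxyRotator(
    ["http://20.235.159.154:80", "http://149.169.197.151:80"],
    requests_per_proxy=2,
)
print([rotator.next_proxy() for _ in range(5)])
```

With two proxies and a quota of two requests, the five calls return the first proxy twice, the second proxy twice, and then the first one again. Each returned value can fill the "http" and "https" entries of seleniumwire_options when you create a new driver.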

However, most requests will fail since free proxies are error-prone. That's why you should add retry logic with random timeouts.
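Here's a minimal sketch of what such retry logic could look like. It's deliberately independent of Selenium: fetch is a hypothetical function you supply that opens the page through the given proxy and raises an exception on failure. The delay bounds and attempt count are arbitrary defaults.

```python
import random
import time


def fetch_with_retries(fetch, proxies, max_attempts=3,
                       min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Call fetch(proxy) with a random proxy until it succeeds.

    `fetch` is a caller-supplied (hypothetical) function that loads the page
    through the given proxy and returns its content, raising on failure.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        proxy = random.choice(proxies)
        try:
            return fetch(proxy)
        except Exception as error:  # broad on purpose: any failure triggers a retry
            last_error = error
            if attempt < max_attempts:
                # wait a random amount of time before trying another proxy
                sleep(random.uniform(min_delay, max_delay))
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

In a Selenium scraper, fetch would typically create a driver with the given proxy, load the page, return the content, and quit the driver in a finally block so failed attempts don't leave browsers running.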

But that's not the only issue. Try to test the IP rotator logic against G2 Reviews, a website protected by anti-bot technologies:

                    scraper.py
                
driver.get("https://www.g2.com/products/asana/reviews")

You'll get the following output:

                    Output
                
<!DOCTYPE html>
<html class="no-js" lang="en-US">
<head>
  <title>Attention Required! | Cloudflare</title>
</head>
<body>
    
    <!-- ... -->

      <div class="cf-wrapper cf-header cf-error-overview">
        <h1 data-translate="block_headline">Sorry, you have been blocked</h1>
      </div>

    <!-- ... -->
    
</body>
</html>

The target server detected the rotating proxy Selenium request as a bot and responded with a 403 Forbidden error.
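In a scraper, you'd want to detect this kind of response automatically rather than parse a block page as if it were real content. The check below is a rough sketch that looks for the two strings visible in the output above. The marker list and the function name are assumptions, and other anti-bot systems will serve different pages.

```python
# strings taken from the Cloudflare block page shown above
BLOCK_MARKERS = (
    "Attention Required! | Cloudflare",
    "Sorry, you have been blocked",
)


def looks_blocked(html: str) -> bool:
    """Return True if the page contains one of the known block-page markers."""
    return any(marker in html for marker in BLOCK_MARKERS)


sample = "<title>Attention Required! | Cloudflare</title>"
print(looks_blocked(sample))  # True
```

You could call it on driver.page_source after each navigation and switch to another proxy when it returns True.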

In fact, free proxies will usually get you blocked. We used them to demonstrate the basics, but you should never rely on them in a real-world project.

The solution? A premium proxy!

Add Premium Proxies to Selenium

As seen above, free proxies are unreliable, and you should prefer premium proxies for web scraping. If you need ideas on where to get them, check our list of the best proxy providers for scraping.

Premium proxies offer seamless anti-bot bypassing with automated residential IP rotation and geolocation capabilities. This allows you to scrape data efficiently without the risk of being rate-limited or blocked, all while maintaining anonymity.

Let's see how to add auto-rotating premium proxies using ZenRows’ proxy service and access the G2 Reviews page that blocked us in the previous section.

Sign up to get started with ZenRows. Once you register, you'll get redirected to the Request Builder page. Paste your target URL, click on the Premium Proxies checkbox, and select the JS Rendering boost mode. Select Python as the language, and click on the Proxy tab. Finally, copy the generated code.

Image: building a scraper with ZenRows

Now, install the requests library:

                    Terminal
                
pip install requests

Then, paste the generated Python code into your script:

                    scraper.py
                
# pip install requests
import requests

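url = "https://www.g2.com/products/asana/reviews"
# proxy URL generated by the Request Builder (copy yours from the dashboard)
proxy = "http://<YOUR_ZENROWS_API_KEY>:js_render=true&[email protected]:8001"
proxies = {"http": proxy, "https": proxy}
# verify=False turns off TLS certificate verification for this request
response = requests.get(url, proxies=proxies, verify=False)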
print(response.text)

Run it, and you'll get the target page's HTML content:

                    Output
                
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
    <link href="https://www.g2.com/images/favicon.ico" rel="shortcut icon" type="image/x-icon" />
    <title>Asana Reviews, Pros + Cons, and Top Rated Features</title>
</head>
<body>
    <!-- other content omitted for brevity -->
</body>

Fantastic! You successfully accessed a protected website using ZenRows premium proxies. Now, you have a proxy scraping solution with Selenium's capabilities.

However, premium proxies aren’t a foolproof solution. If you're looking for a complete anti-bot bypass toolkit, you should use a web-scraping API, such as ZenRows. It includes premium proxies and other essential features like a built-in headless browser, request header management, TLS fingerprints, and more.

Error 403: Forbidden for Proxy in Selenium Grid

Selenium Grid allows you to control remote browsers and run cross-platform scripts in parallel. However, using it may lead to getting an Error 403: Forbidden for Proxy, one of the most common errors you can encounter during web scraping. That happens for two reasons:

Another process is already running on port 4444.
You aren't sending RemoteWebDriver requests to the correct URL.

By default, the Selenium server hub listens on http://localhost:4444. If another process is running on the 4444 port, end it or start Selenium Grid using another port.
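To find out whether something is already listening on that port, you can try opening a TCP connection to it. The helper below is a minimal sketch using only the standard library. is_port_in_use is a hypothetical name, and the check only tells you that some process accepts connections on the port, not which one.

```python
import socket


def is_port_in_use(port: int, host: str = "localhost") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


print(is_port_in_use(4444))
```

If it prints True before you've started Selenium Grid, another process is occupying the port.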

If that doesn't solve the issue, make sure you're connecting the remote driver to the right hub URL, as shown below:

                    scraper.py
                
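from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# ...
# connect to the hub URL, passing browser options as Selenium 4 expects
driver = webdriver.Remote(command_executor="http://localhost:4444/wd/hub", options=Options())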

Perfect! The error should be gone now!

Conclusion

This step-by-step tutorial showed how to set up a proxy in Selenium with Python. You’ve started with the basics of adding a proxy for Selenium web scraping and then moved on to more advanced topics, such as rotating proxies or using premium proxies.

Now you know:

What a Selenium proxy is.
The basics of setting a proxy with Selenium in Python.
How to deal with authenticated proxies in Selenium.
How to implement a rotating proxy and why this approach doesn't work with free proxies.
What a premium proxy is and how to use it.

While proxies are one of the ways to avoid anti-bot detection systems, they don’t work 100% of the time, and require a lot of manual maintenance. To avoid the hassle of finding and configuring proxies and confidently bypass any anti-bot measures, use a web scraping API, such as ZenRows. Try ZenRows for free!
