How to Use a Proxy With Selenium in Python (2024)

June 27, 2024 · 12 min read

Have you been detected as a bot while web scraping with Selenium?

No wonder. Selenium is an excellent tool for scraping dynamic websites, but it can’t bypass complex anti-bot systems on its own. To prevent IP blocks, bypass geolocation restrictions, and manage rate limits, you can add a proxy to your Selenium scraper.

In this article, you’ll learn how to do it. Here’s what we’ll cover:

  • How to set up a proxy in Selenium.
  • How to rotate proxies in Selenium.
  • How to use premium proxies.

Let's dive in!

What Is a Selenium Proxy?

A proxy acts as an intermediary between a client and a server. Routing traffic through one lets the client make requests to other servers without exposing its own IP address, which also helps bypass geographical restrictions.

Like HTTP clients, headless browsers can be configured to route their traffic through proxy servers. A proxy helps protect your IP address and avoid blocks when scraping protected websites, like Amazon, with Selenium.

Proxy-powered Selenium is particularly useful for browser automation activities such as testing and web scraping. Keep reading to learn how to set up a proxy in Selenium for web scraping!

How to Set Up a Proxy in Selenium

In this section, you'll learn how to set up a Selenium proxy using Python. We'll use Chrome, as it's the most popular browser for automation.

If you prefer using another programming language, check out our dedicated Selenium proxy tutorials for other languages.

Let's start by setting up a basic Python script to control Chrome with Selenium.

The snippet below initializes a headless Chrome driver and visits httpbin.io, a service that returns the IP address of the client making the request. Finally, the script prints the response HTML.

scraper.py
# pip install selenium webdriver-manager
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options

# set Chrome options to run in headless mode
options = Options()
options.add_argument("--headless=new")

# initialize Chrome driver
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()), 
    options=options
)

# navigate to the target webpage
driver.get("https://httpbin.io/ip")

# print the HTML of the target webpage
print(driver.page_source)

# release the resources and close the browser
driver.quit()

The code will print the following HTML:

Output
<html><head><meta name="color-scheme" content="light dark"><meta charset="utf-8"></head><body><pre>{
  "origin": "50.217.226.40:80"
}
</pre><div class="json-formatter-container"></div></body></html>

Awesome! You're now ready to set up your Selenium proxy in Python using the Chrome driver.

To set a proxy in Selenium, you need to:

  1. Retrieve a valid proxy server.
  2. Specify it in the --proxy-server Chrome option.
  3. Visit your target page.

Let's go over the whole process step-by-step.

First, get a free proxy address from the Free Proxy List website. Configure Selenium with Options to launch Chrome using a proxy. Then, print the body content of the target webpage.

scraper.py
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# define the proxy address and port
proxy = "20.235.159.154:80"

# set Chrome options to run in headless mode using a proxy
options = Options()
options.add_argument("--headless=new")
options.add_argument(f"--proxy-server={proxy}")

# initialize Chrome driver
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=options
)

# navigate to the target webpage
driver.get("https://httpbin.io/ip")

# print the body content of the target webpage
print(driver.find_element(By.TAG_NAME, "body").text)

# release the resources and close the browser
driver.quit()

The controlled instance of Chrome will now perform all requests through the specified proxy.

Here's what it'll return:

Output
{
  "origin": "20.235.159.154:80"
}

The site response matches the proxy server address. That means Selenium is visiting pages through the proxy server.

Great! You now know the basics of using a Python Selenium proxy.

However, using a single proxy isn't enough. For instance, some websites implement rate limiting, which restricts the number of requests a single IP can make within a given time frame, and may block that IP outright if it exceeds the limit.

To avoid these limitations and reduce the risk of being blocked, you need to implement advanced strategies like proxy rotation and premium proxies. We'll cover these methods later in the tutorial.

Proxy Authentication in Selenium

Some proxy servers use authentication to restrict access to users with valid credentials. That's usually the case with commercial solutions or premium proxies.

The Selenium syntax to specify a username and password in an authenticated proxy URL looks like this:

Example
<PROXY_PROTOCOL>://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>

However, using a URL in --proxy-server won't work because the Chrome driver ignores the username and password by default. That's where a third-party plugin, such as Selenium Wire, comes to the rescue.

Selenium Wire extends Selenium to give you access to the requests made by the browser and change them as desired. Run the command below to install it:

Terminal
pip install blinker==1.7.0 selenium-wire

Use Selenium Wire to deal with proxy authentication, as shown below:

scraper.py
from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# configure the proxy
proxy_username = "<YOUR_USERNAME>"
proxy_password = "<YOUR_PASSWORD>"
proxy_address = "20.235.159.154"
proxy_port = "80"

# formulate the proxy url with authentication
proxy_url = f"http://{proxy_username}:{proxy_password}@{proxy_address}:{proxy_port}"

# set selenium-wire options to use the proxy
seleniumwire_options = {
    "proxy": {
        "http": proxy_url,
        "https": proxy_url
    },
}

# set Chrome options to run in headless mode
options = Options()
options.add_argument("--headless=new")

# initialize the Chrome driver with service, selenium-wire options, and chrome options
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    seleniumwire_options=seleniumwire_options,
    options=options
)

# navigate to the target webpage
driver.get("https://httpbin.io/ip")

# print the body content of the target webpage
print(driver.find_element(By.TAG_NAME, "body").text)

# release the resources and close the browser
driver.quit()

Best Protocols for a Proxy in Selenium

When it comes to choosing a protocol for a Selenium proxy, the most common options are HTTP, HTTPS, and SOCKS5.

HTTP proxies transmit data in plain text, while HTTPS proxies encrypt the traffic, adding an extra security layer. That's why the latter are more popular and secure.

Another useful protocol for Selenium proxies is SOCKS5, also known as SOCKS. It supports a wider range of web traffic, including email and FTP, which makes it a more versatile protocol.

Overall, HTTP and HTTPS proxies are good for web scraping and crawling, and SOCKS finds applications in tasks that involve non-HTTP traffic.

Use a Rotating Proxy in Selenium With Python

If your script makes several requests in a short interval, the server may flag the activity as suspicious and block your IP, making it difficult to scrape data effectively.

However, using a rotating proxy approach can solve this problem. By switching proxies after a particular period or number of requests, your end IP will keep changing. This makes you appear as a different user each time, preventing the server from banning you.

Let's learn how to build a proxy rotator in Selenium with selenium-wire.

First, you need to create a pool of proxies. In this example, we'll use some free proxies.

Store them in an array as follows:

scraper.py
PROXIES = [
    "http://19.151.94.248:88",
    "http://149.169.197.151:80",
    # ...
    "http://212.76.118.242:97"
]

Then, extract a random proxy with random.choice() and use it to initialize a new driver instance. Here's what your final code should look like:

scraper.py
from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

import random

# the list of proxies to rotate through
PROXIES = [
    "http://20.235.159.154:80",
    "http://149.169.197.151:80",
    # ...
    "http://212.76.118.242:97"
]

# randomly select a proxy
proxy = random.choice(PROXIES)

# set selenium-wire options to use the proxy
seleniumwire_options = {
    "proxy": {
        "http": proxy,
        "https": proxy
    },
}

# set Chrome options to run in headless mode
options = Options()
options.add_argument("--headless=new")

# initialize the Chrome driver with service, selenium-wire options, and chrome options
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    seleniumwire_options=seleniumwire_options,
    options=options
)

# navigate to the target webpage
driver.get("https://httpbin.io/ip")

# print the body content of the target webpage
print(driver.find_element(By.TAG_NAME, "body").text)

# release the resources and close the browser
driver.quit()

The following is the output for manually running this code three times:

Output
# request 1
{
    "origin": "149.169.197.151:1286"
}

# request 2
{
    "origin": "20.235.159.154:3224"
}

# request 3
{
    "origin": "212.76.118.242:97"
}

Well done! You’ve just built a working Selenium proxy rotator. You can learn more tips and tricks in our definitive guide on how to rotate proxies in Python.

However, most requests will fail since free proxies are error-prone. That's why you should add retry logic with random timeouts.
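
That retry logic can be sketched as follows. The helper and its parameters are illustrative, not part of Selenium: it picks a random proxy per attempt and sleeps for a random interval between failures:

```python
import random
import time

def fetch_with_retries(make_request, url, proxies, max_retries=3):
    """Call make_request(url, proxy) with a random proxy per attempt,
    backing off for a random interval after each failure."""
    last_error = None
    for _ in range(max_retries):
        proxy = random.choice(proxies)
        try:
            return make_request(url, proxy)
        except Exception as error:
            last_error = error
            # random timeout before retrying with another proxy
            time.sleep(random.uniform(0.5, 1.5))
    raise last_error
```

Here, make_request would build a selenium-wire driver with the chosen proxy, visit the URL, and return the body text, as in the rotator above.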

But that's not the only issue. Try to test the IP rotator logic against G2 Reviews, a website protected by anti-bot technologies:

scraper.py
driver.get("https://www.g2.com/products/asana/reviews")

You'll get the following output:

Output
<!DOCTYPE html>
<html class="no-js" lang="en-US">
<head>
  <title>Attention Required! | Cloudflare</title>
</head>
<body>
    
    <!-- ... -->

      <div class="cf-wrapper cf-header cf-error-overview">
        <h1 data-translate="block_headline">Sorry, you have been blocked</h1>
      </div>

    <!-- ... -->
    
</body>
</html>

The target server detected the rotating proxy Selenium request as bot traffic and responded with a 403 Forbidden error.

In fact, free proxies will usually get you blocked. We used them to demonstrate the basics, but you should never rely on them in a real-world project.

The solution? A premium proxy!

Add Premium Proxies to Selenium

As seen above, free proxies are unreliable, and you should prefer premium proxies for web scraping. If you need ideas on where to get them, check our list of the best proxy providers for scraping.

Premium proxies offer seamless anti-bot bypassing with automated residential IP rotation and geolocation capabilities. This allows you to scrape data efficiently without the risk of being rate-limited or blocked, all while maintaining anonymity.

Let's see how to add auto-rotating premium proxies using ZenRows’ proxy service and access the G2 Reviews page that blocked us in the previous section.

Sign up to get started with ZenRows. Once you register, you'll get redirected to the Request Builder page. Paste your target URL, click on the Premium Proxies checkbox, and select the JS Rendering boost mode. Select Python as the language, and click on the Proxy tab. Finally, copy the generated code.

Building a scraper with ZenRows

Now, install the requests library:

Terminal
pip install requests

Then, paste the generated Python code into your script:

scraper.py
# pip install requests
import requests

url = "https://www.g2.com/products/asana/reviews"
proxy = "http://<YOUR_ZENROWS_API_KEY>:js_render=true&premium_proxy=true@proxy.zenrows.com:8001"
proxies = {"http": proxy, "https": proxy}
response = requests.get(url, proxies=proxies, verify=False)
print(response.text)

Run it, and you'll get the target page's HTML content:

Output
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
    <link href="https://www.g2.com/images/favicon.ico" rel="shortcut icon" type="image/x-icon" />
    <title>Asana Reviews, Pros + Cons, and Top Rated Features</title>
</head>
<body>
    <!-- other content omitted for brevity -->
</body>
</html>

Fantastic! You successfully accessed a protected website using ZenRows premium proxies. Now, you have a proxy scraping solution with Selenium's capabilities.

However, premium proxies aren’t a foolproof solution. If you're looking for a complete anti-bot bypass toolkit, you should use a web-scraping API, such as ZenRows. It includes premium proxies and other essential features like a built-in headless browser, request header management, TLS fingerprints, and more.
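
As a sketch of what switching from proxy mode to API mode could look like, here's how such a request might be assembled with requests. The api.zenrows.com/v1/ endpoint and parameter names mirror the proxy parameters used above and are shown as assumptions, not verified documentation:

```python
import requests

# endpoint and parameter names are assumptions mirroring the proxy
# parameters used earlier in this tutorial
request = requests.Request(
    "GET",
    "https://api.zenrows.com/v1/",
    params={
        "apikey": "<YOUR_ZENROWS_API_KEY>",
        "url": "https://www.g2.com/products/asana/reviews",
        "js_render": "true",
        "premium_proxy": "true",
    },
).prepare()

# inspect the final URL; send it with requests.Session().send(request)
# once a real API key is in place
print(request.url)
```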

Error 403: Forbidden for Proxy in Selenium Grid

Selenium Grid allows you to control remote browsers and run cross-platform scripts in parallel. However, using it may lead to an Error 403: Forbidden for Proxy, one of the most common errors you can encounter during web scraping. It usually happens for one of two reasons:

  1. Another process is already running on port 4444.
  2. You aren't sending RemoteWebDriver requests to the correct URL.

By default, the Selenium server hub listens on http://localhost:4444. If another process is running on the 4444 port, end it or start Selenium Grid using another port.
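
A quick way to check whether something is already listening on port 4444 is a plain socket probe. This is a generic sketch, not part of Selenium:

```python
import socket

def port_in_use(port, host="localhost"):
    """Return True if a process is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1)
        # connect_ex returns 0 when the connection attempt succeeds
        return sock.connect_ex((host, port)) == 0

if port_in_use(4444):
    print("Port 4444 is taken: stop that process or start the Grid on another port")
```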

If that doesn't solve the issue, make sure you're connecting the remote driver to the right hub URL, as shown below:

scraper.py
from selenium import webdriver
# ...
driver = webdriver.Remote(
    command_executor="http://localhost:4444/wd/hub",
    options=webdriver.ChromeOptions()
)

Perfect! The error should be gone now!

Conclusion

This step-by-step tutorial showed how to set up a proxy in Selenium with Python. You’ve started with the basics of adding a proxy to Selenium and then moved on to more advanced topics, such as rotating proxies or using premium proxies.

Now you know:

  • What a Selenium proxy is.
  • The basics of setting a proxy with Selenium in Python.
  • How to deal with authenticated proxies in Selenium.
  • How to implement a rotating proxy and why this approach doesn't work with free proxies.
  • What a premium proxy is and how to use it.

While proxies are one of the ways to avoid anti-bot detection systems, they don’t work 100% of the time and require a lot of manual maintenance. To avoid the hassle of finding and configuring proxies and confidently bypass any anti-bot measures, use a web scraping API, such as ZenRows. Try ZenRows for free!

Ready to get started?

Up to 1,000 URLs for free are waiting for you