How to Set Selenium Headers: Step-by-Step Tutorial

May 2, 2024 · 7 min read

Are you scraping with Selenium and want to customize your request headers? We've got you covered!

This article teaches you how to customize the request headers while scraping with Selenium in Python. You'll also learn how to manage the headers at scale to avoid getting blocked.

Why Are Selenium Headers Important?

HTTP headers provide information about the request source (client) and how the recipient (server) handles the client's request. There are two types of HTTP headers: request and response headers.

Request headers are the most relevant to web scraping because they describe your HTTP client during content extraction.

The common ways to leverage the request headers during web scraping include avoiding anti-bot detection and scraping behind a login using cookie session management. For instance, you can change the Selenium user agent to mimic a regular browser in headless mode.
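To see why the user agent matters, consider how a simple server-side check might flag a headless browser. The function below is an illustrative sketch, not any real anti-bot vendor's logic, that looks for the telltale HeadlessChrome token in the User-Agent string:

```python
# Illustrative sketch: how a naive server-side check might flag a
# headless browser from its User-Agent string. Real anti-bot systems
# use many more signals; this is not any vendor's actual logic.

def looks_headless(user_agent: str) -> bool:
    """Return True if the User-Agent reveals a headless browser."""
    return "HeadlessChrome" in user_agent

# default Selenium headless UA vs. a regular Chrome UA
headless_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/122.0.6261.95 Safari/537.36"
)
regular_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
)

print(looks_headless(headless_ua))  # True: the default headless UA is flagged
print(looks_headless(regular_ua))   # False: a regular Chrome UA passes this check
```

Spoofing the user agent defeats this particular check, which is why it's one of the first headers to customize.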

While you rarely need to set every request header, the referrer, cookies, user agent, encoding, language, and content format are the headers that matter most for web scraping. An automation tool like Selenium often misconfigures or omits some of them.

For instance, the default request headers in Selenium look like the following in headless mode:

Example
{
  "headers": {
    "Accept": [
      "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"
    ],
    "Accept-Encoding": [
      "gzip, deflate, br"
    ],
    "Connection": [
      "keep-alive"
    ],
    "Host": [
      "httpbin.io"
    ],
    "Sec-Ch-Ua": [
      "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"HeadlessChrome\";v=\"122\""
    ],
    "Sec-Ch-Ua-Mobile": [
      "?0"
    ],
    "Sec-Ch-Ua-Platform": [
      "\"Windows\""
    ],
    "Sec-Fetch-Dest": [
      "document"
    ],
    "Sec-Fetch-Mode": [
      "navigate"
    ],
    "Sec-Fetch-Site": [
      "none"
    ],
    "Sec-Fetch-User": [
      "?1"
    ],
    "Upgrade-Insecure-Requests": [
      "1"
    ],
    "User-Agent": [
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/122.0.6261.95 Safari/537.36"
    ]
  }
}

Feel free to view yours by sending a request to https://httpbin.io/headers with the following code:

scraper.py
# import the required library
from selenium import webdriver

# create a webdriver option
chrome_options = webdriver.ChromeOptions()

# run the browser in headless mode
chrome_options.add_argument("--headless")

# start a Chrome instance
driver = webdriver.Chrome(options=chrome_options)

# open the target web page
driver.get("https://httpbin.io/headers")

# print the page source to view your request headers
print(driver.page_source)

# quit the driver
driver.quit()

Opening the same website (https://httpbin.io/headers) via a regular browser like Chrome shows that some information is missing or misconfigured in Selenium's default headers.

Notably, the Accept-Language and Referer headers are missing in Selenium. The Sec-Fetch-Site header value also says "none". You'll have to replace that with "cross-site" since you'll add a Referer source.

The Accept-Encoding value also omits zstd compression, while the User-Agent and Sec-Ch-Ua headers reveal headless Chrome, indicating that your request comes from automated software.
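One way to spot these deviations is to diff the two header sets programmatically. The sketch below compares abbreviated versions of the header captures shown in this article (values shortened for readability):

```python
# Compare Selenium's default headers against a regular Chrome capture
# to find missing and mismatched entries. Values are abbreviated
# versions of the httpbin.io captures shown in this article.

selenium_headers = {
    "Accept-Encoding": "gzip, deflate, br",
    "Sec-Fetch-Site": "none",
    "User-Agent": "HeadlessChrome/122.0.6261.95",
}
chrome_headers = {
    "Accept-Encoding": "gzip, deflate, br, zstd",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://www.google.com/",
    "Sec-Fetch-Site": "cross-site",
    "User-Agent": "Chrome/122.0.0.0",
}

# headers a real browser sends that Selenium omits
missing = sorted(set(chrome_headers) - set(selenium_headers))

# headers both send, but with different values
mismatched = sorted(
    k for k in selenium_headers
    if k in chrome_headers and selenium_headers[k] != chrome_headers[k]
)

print("missing:", missing)        # ['Accept-Language', 'Referer']
print("mismatched:", mismatched)  # ['Accept-Encoding', 'Sec-Fetch-Site', 'User-Agent']
```

These two lists are exactly the headers you'll add and edit in the next section.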

See the result for Chrome below to compare the deviations:

Example
{
  "headers": {
    "Accept": [
      "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"
    ],
    "Accept-Encoding": [
      "gzip, deflate, br, zstd"
    ],
    "Accept-Language": [
      "en-US,en;q=0.9"
    ],
    "Connection": [
      "keep-alive"
    ],
    "Host": [
      "httpbin.io"
    ],
    "Referer": [
      "https://www.google.com/"
    ],
    "Sec-Ch-Ua": [
      "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"Google Chrome\";v=\"122\""
    ],
    "Sec-Ch-Ua-Mobile": [
      "?0"
    ],
    "Sec-Ch-Ua-Platform": [
      "\"Windows\""
    ],
    "Sec-Fetch-Dest": [
      "document"
    ],
    "Sec-Fetch-Mode": [
      "navigate"
    ],
    "Sec-Fetch-Site": [
      "cross-site"
    ],
    "Upgrade-Insecure-Requests": [
      "1"
    ],
    "User-Agent": [
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
    ]
  }
}

These omissions in Selenium's default headers are enough to flag your web scraper as a bot.

To fix the headers and reduce the likelihood of anti-bot detection, you'll need to add the missing ones and edit the misconfigured values. You'll see how to achieve that in the next section.


Set Up Custom Headers With Selenium Wire

There is no straightforward method for setting custom headers in Selenium. However, you can still achieve this with an extension library called Selenium Wire.

Selenium Wire includes all the functionalities of the standard Selenium WebDriver but extends it with extra features. In this section, you'll use Selenium Wire to customize the request headers.

First, install selenium-wire using pip:

Terminal
pip install selenium-wire

Next, you'll customize the request headers with the missing ones identified earlier and edit the misconfigured ones.

To begin, import Selenium Wire into your script and define a function to add the missing headers (accepted language and Referer):

scraper.py
# import the required library
from seleniumwire import webdriver

# define the request interceptor to configure custom headers
def interceptor(request):

    # add the missing headers
    request.headers["Accept-Language"] = "en-US,en;q=0.9"
    request.headers["Referer"] = "https://www.google.com/"

The next step is to extend that function to edit the misconfigured headers: Accept-Encoding, User-Agent, Sec-Ch-Ua, and Sec-Fetch-Site.

To edit the misconfigured headers, start by deleting them from Selenium's default ones:

scraper.py
# define the request interceptor to configure custom headers
def interceptor(request):
  
    # ...
    
    # delete the existing misconfigured default headers values
    del request.headers["User-Agent"]
    del request.headers["Sec-Ch-Ua"]
    del request.headers["Sec-Fetch-Site"]
    del request.headers["Accept-Encoding"]

Then, replace the deleted headers with the correct values:

scraper.py
# define the request interceptor to configure custom headers
def interceptor(request):
  
    # ...     
    
    # replace the deleted headers with edited values
    request.headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
    request.headers["Sec-Ch-Ua"] = "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"Google Chrome\";v=\"122\""
    request.headers["Sec-Fetch-Site"] = "cross-site"
    request.headers["Accept-Encoding"] = "gzip, deflate, br, zstd"
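Because the interceptor is a plain Python function, you can sanity-check its logic without launching a browser by passing it a stand-in object. The FakeRequest class below is a hypothetical substitute for Selenium Wire's request object, whose headers attribute behaves like a dictionary:

```python
# Sanity-check the interceptor logic without a browser. FakeRequest is
# a hypothetical stand-in for Selenium Wire's request object; its
# `headers` attribute behaves like a dictionary here.

class FakeRequest:
    def __init__(self, headers):
        self.headers = dict(headers)

def interceptor(request):
    # add the missing headers
    request.headers["Accept-Language"] = "en-US,en;q=0.9"
    request.headers["Referer"] = "https://www.google.com/"

    # delete the misconfigured default, then set the corrected value
    # (mirrors the delete-then-replace pattern used in the article)
    del request.headers["Sec-Fetch-Site"]
    request.headers["Sec-Fetch-Site"] = "cross-site"

# simulate Selenium's default headers (abbreviated)
request = FakeRequest({"Sec-Fetch-Site": "none"})
interceptor(request)

print(request.headers["Sec-Fetch-Site"])   # cross-site
print(request.headers["Accept-Language"])  # en-US,en;q=0.9
```

Testing the function in isolation like this makes it easier to debug header logic before wiring it to the driver.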

Next, start a browser instance in headless mode and point the driver to the interceptor function. This tells the driver to use the instructions in that function:

scraper.py
# ...

# create a webdriver option
chrome_options = webdriver.ChromeOptions()

# run the browser in headless mode
chrome_options.add_argument("--headless")

# start a Chrome instance
driver = webdriver.Chrome(options=chrome_options)

# add the interceptor
driver.request_interceptor = interceptor

Now, visit the target website and print the page source to view the request headers:

scraper.py
# ...

# open the target web page
driver.get("https://httpbin.io/headers")

# print the page source to view your request headers
print(driver.page_source)

# quit the driver
driver.quit()

Here's the complete code:

scraper.py
# import the required library
from seleniumwire import webdriver

# define the request interceptor to configure custom headers
def interceptor(request):

    # add the missing headers
    request.headers["Accept-Language"] = "en-US,en;q=0.9"
    request.headers["Referer"] = "https://www.google.com/"

    # delete the existing misconfigured default headers values
    del request.headers["User-Agent"]
    del request.headers["Sec-Ch-Ua"]
    del request.headers["Sec-Fetch-Site"]
    del request.headers["Accept-Encoding"]
    
    # replace the deleted headers with edited values
    request.headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
    request.headers["Sec-Ch-Ua"] = "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"Google Chrome\";v=\"122\""
    request.headers["Sec-Fetch-Site"] = "cross-site"
    request.headers["Accept-Encoding"] = "gzip, deflate, br, zstd"

# create a webdriver option
chrome_options = webdriver.ChromeOptions()

# run the browser in headless mode
chrome_options.add_argument("--headless")

# start a Chrome instance
driver = webdriver.Chrome(options=chrome_options)

# add the interceptor
driver.request_interceptor = interceptor

# open the target web page
driver.get("https://httpbin.io/headers")

# print the page source to view your request headers
print(driver.page_source)

# quit the driver
driver.quit()

The code adds the missing headers and edits the defective ones, as shown below:

Output
{
  "headers": {
    "Accept": [
      "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"
    ],
    "Accept-Encoding": [
      "gzip, deflate, br, zstd"
    ],
    "Accept-Language": [
      "en-US,en;q=0.9"
    ],
    "Connection": [
      "keep-alive"
    ],
    "Host": [
      "httpbin.io"
    ],
    "Referer": [
      "https://www.google.com/"
    ],
    "Sec-Ch-Ua": [
      "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"Google Chrome\";v=\"122\""
    ],
    "Sec-Ch-Ua-Mobile": [
      "?0"
    ],
    "Sec-Ch-Ua-Platform": [
      "\"Windows\""
    ],
    "Sec-Fetch-Dest": [
      "document"
    ],
    "Sec-Fetch-Mode": [
      "navigate"
    ],
    "Sec-Fetch-Site": [
      "cross-site"
    ],
    "Sec-Fetch-User": [
      "?1"
    ],
    "Upgrade-Insecure-Requests": [
      "1"
    ],
    "User-Agent": [
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
    ]
  }
}

Nice job! You just set up custom request headers for your Selenium scraper. However, you need more than that to bypass advanced anti-bot systems.

Considerations for Headers in Selenium Scraping

Selenium is a powerful web scraping library. But, as you've seen, it doesn't handle request header customization directly but relies on an external plugin.

Besides, maintaining the request headers can be challenging. So, anti-bot protection can block you even after setting the full request headers.

For example, a protected web page like G2 Reviews will block Selenium despite the complete header set.

Try it out with the following code that takes a screenshot of the page and extracts its HTML:

scraper.py
# import the required library
from seleniumwire import webdriver

# define the request interceptor to configure custom headers
def interceptor(request):

    # add the missing headers
    request.headers["Accept-Language"] = "en-US,en;q=0.9"
    request.headers["Referer"] = "https://www.google.com/"

    # delete the existing misconfigured default headers values
    del request.headers["User-Agent"]
    del request.headers["Sec-Ch-Ua"]
    del request.headers["Sec-Fetch-Site"]
    del request.headers["Accept-Encoding"]
    
    # replace the deleted headers with edited values
    request.headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
    request.headers["Sec-Ch-Ua"] = "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"Google Chrome\";v=\"122\""
    request.headers["Sec-Fetch-Site"] = "cross-site"
    request.headers["Accept-Encoding"] = "gzip, deflate, br, zstd"

# create a webdriver option
chrome_options = webdriver.ChromeOptions()

# run the browser in headless mode
chrome_options.add_argument("--headless")

# start a Chrome instance
driver = webdriver.Chrome(options=chrome_options)

# add the interceptor
driver.request_interceptor = interceptor

# open the target web page
driver.get("https://www.g2.com/products/asana/reviews")

# extract the page's HTML
print(driver.page_source)

# take a screenshot and save it as 'screenshot.png'
driver.save_screenshot("g2-reviews-page.png")

# quit the driver
driver.quit()

Selenium got blocked by Cloudflare Turnstile, as shown in the following HTML:

Output
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
  
    <!--  ...    -->
  
    <title>Attention Required! | Cloudflare</title>
</head>

Here's the screenshot of the anti-bot protection message:

G2 Page Blocked

How can you bypass this block and scrape all you want?

We recommend using a web scraping API like ZenRows. It helps you manage your request headers, configure premium proxies, and bypass all forms of CAPTCHAs and advanced anti-bot systems.

Want to try it out? Let's scrape the G2 website that got you blocked earlier with ZenRows.

Sign up to launch the Request Builder. Paste the target URL in the link box, set the Boost mode to JS Rendering, and activate Premium Proxies. Select Python as your programming language. Then, copy and paste the generated code into your script.

ZenRows Request Builder Page

Your scraper will now call the ZenRows API through the Requests library as the HTTP client, so make sure Requests is installed:

Terminal
pip install requests

The generated code should look like this in your script:

scraper.py
# pip install requests
import requests

params = {
	"url": "https://www.g2.com/products/asana/reviews",
	"apikey": "<YOUR_ZENROWS_API_KEY>",
	"js_render": "true",
	"premium_proxy": "true"
}
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)
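Under the hood, Requests encodes those params into the API URL's query string. A quick sketch with the standard library shows the shape of the final request URL (the API key placeholder is kept as-is):

```python
from urllib.parse import urlencode

# Build the query string that Requests would append to the
# ZenRows API endpoint from the params dictionary.
params = {
    "url": "https://www.g2.com/products/asana/reviews",
    "apikey": "<YOUR_ZENROWS_API_KEY>",
    "js_render": "true",
    "premium_proxy": "true",
}
full_url = "https://api.zenrows.com/v1/?" + urlencode(params)
print(full_url)
```

Note that the target URL is percent-encoded inside the query string, which is why you pass it via params rather than concatenating it yourself.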

That code scrapes the protected website successfully, showing the correct page title:

Output
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
    <link href="https://www.g2.com/images/favicon.ico" rel="shortcut icon" type="image/x-icon" />
    <title>Asana Reviews 2024</title>
</head>
<body>
    <!-- other content omitted for brevity -->
</body>

Congratulations! You just scraped a protected website using ZenRows and are now ready to extract content at scale without limitations.

Let's see how ZenRows optimizes the request headers with the relevant set. Simply replace the target URL in the generated code with https://httpbin.io/headers.

Here's the result, including the most relevant headers:

Output
{
  "headers": {
    "Accept": [
      "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"
    ],
    "Accept-Encoding": [
      "gzip, deflate, br"
    ],
    "Accept-Language": [
      "en-US,en;q=0.9"
    ],
    "Connection": [
      "keep-alive"
    ],
    "Host": [
      "httpbin.io"
    ],
    "Referer": [
      "https://httpbin.io/headers"
    ],
    "Sec-Ch-Ua": [
      "\"Chromium\";v=\"118\", \"Google Chrome\";v=\"122\", \"Not=A?Brand\";v=\"99\""
    ],
    "Sec-Ch-Ua-Mobile": [
      "?0"
    ],
    "Sec-Ch-Ua-Platform": [
      "\"Windows\""
    ],
    "Sec-Fetch-Dest": [
      "document"
    ],
    "Sec-Fetch-Mode": [
      "navigate"
    ],
    "Sec-Fetch-Site": [
      "none"
    ],
    "Sec-Fetch-User": [
      "?1"
    ],
    "Upgrade-Insecure-Requests": [
      "1"
    ],
    "User-Agent": [
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
    ]
  }
}

That's it! ZenRows now manages your request headers per website requirements.

ZenRows also acts as a headless browser with its JavaScript instructions. This means you can completely replace Selenium with ZenRows, scrape any dynamic website at scale, and forget about complex headers configuration.

Conclusion

In this tutorial, you've seen how to set custom request headers in Selenium with Python. Here's a quick recap of all you've learned:

  • The importance of request headers while scraping with Selenium.
  • How to add new custom headers and edit existing ones using Selenium Wire.
  • Managing the request headers at scale with a web scraping solution.

Remember that anti-bot systems can still detect and block your scraper regardless of your request header settings. Bypass them all with ZenRows and scrape any website without getting blocked. Try ZenRows for free!
