Do you want to customize your urllib request headers while scraping with Python? You’ve come to the right place!
In this article, you'll learn how to configure custom request headers in urllib.
Why Are Headers Important for Urllib?
Headers carry metadata about the request source (the client) and tell the recipient (the server) how to handle the response during an HTTP interaction. The client is usually a browser or an HTTP client library like Python's Requests or urllib.
There are two types of HTTP headers: request headers and response headers. Request headers matter most during web scraping because they convey information about the HTTP client.
Customizing the request headers while scraping with urllib lets you mimic a legitimate browser and avoid anti-bot detection. Another use case is manipulating session cookies to authenticate your scraper and extract data behind a login, as sketched below.
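For example, here's a minimal sketch of the cookie use case, assuming a placeholder sessionid value that you'd obtain after logging in:
# import the required library
from urllib import request

# attach a placeholder session cookie to authenticate the scraper
authenticated_request = request.Request(
    url="https://httpbin.io/headers",
    headers={"Cookie": "sessionid=YOUR_SESSION_COOKIE"}
)

# send the request and print the echoed headers
with request.urlopen(authenticated_request) as response:
    print(response.read().decode("utf-8"))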
Let's check the default urllib headers by requesting https://httpbin.io/headers, a web page that returns your current HTTP request headers:
# import the required libraries
from urllib import request, error

# catch HTTP errors
try:
    # send the request and obtain a response object
    response = request.urlopen("https://httpbin.io/headers").read()
    # print a decoded format to view your request headers
    print(response.decode("utf-8"))
except error.HTTPError as e:
    print(e)
The urllib library sends the following incomplete request headers, making your scraper vulnerable to anti-bot detection:
{
  "headers": {
    "Accept-Encoding": [
      "identity"
    ],
    "Connection": [
      "close"
    ],
    "Host": [
      "httpbin.io"
    ],
    "User-Agent": [
      "Python-urllib/3.12"
    ]
  }
}
Open the same URL (https://httpbin.io/headers) in a legitimate browser like Chrome, and you'll see detailed request headers like these:
{
  "headers": {
    "Accept": [
      "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"
    ],
    "Accept-Encoding": [
      "gzip, deflate, br, zstd"
    ],
    "Accept-Language": [
      "en-US,en;q=0.9"
    ],
    "Connection": [
      "keep-alive"
    ],
    "Host": [
      "httpbin.io"
    ],
    "Referer": [
      "https://www.google.com/"
    ],
    "Sec-Ch-Ua": [
      "\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\""
    ],
    "Sec-Ch-Ua-Mobile": [
      "?0"
    ],
    "Sec-Ch-Ua-Platform": [
      "\"Windows\""
    ],
    "Sec-Fetch-Dest": [
      "document"
    ],
    "Sec-Fetch-Mode": [
      "navigate"
    ],
    "Sec-Fetch-Site": [
      "cross-site"
    ],
    "Upgrade-Insecure-Requests": [
      "1"
    ],
    "User-Agent": [
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
    ]
  }
}
Compare this result with urllib's request headers, and you'll see that urllib is missing several important ones.
First, urllib's User-Agent header identifies your scraper as a Python bot, while its Accept-Encoding value (identity) doesn't match what a real browser sends. The Referer, Accept, Sec-Ch-Ua (client hint user agent), Sec-Ch-Ua-Platform, and Accept-Language headers are missing entirely.
These gaps make your urllib web scraper vulnerable to anti-bot detection. You'll learn how to fix them in the next section.
How to Set Up Custom Headers With Urllib
As you've seen, some urllib request headers are missing while others are inaccurate. In this section, you'll learn three strategies for setting custom request headers while scraping with urllib. In each case, you'll request https://httpbin.io/headers to view your current request headers.
Add Headers With Urllib
Setting custom request headers in urllib requires passing a dictionary of header strings to a Request object. Let's add the missing headers and correct the misconfigured ones to see how it works.
First, define the new headers in a dictionary. Note that the Sec-Fetch-Mode and Sec-Fetch-Site headers added below complement the Referer: they inform the server that the client is navigating from another website (Google):
# import the required libraries
from urllib import request, error

# define new request headers
missing_headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Encoding": "gzip, deflate, br, zstd",
    "Accept-Language": "en-US,en;q=0.9",
    "Sec-Ch-Ua": "\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\"",
    "Referer": "https://www.google.com/",
    "Sec-Ch-Ua-Platform": "\"Windows\"",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "cross-site",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
}
Next, pass the new headers to a Request object and open the target website to view your HTTP request headers:
# ...

# catch HTTP errors
try:
    # create a Request object with the missing request headers
    request_params = request.Request(
        url="https://httpbin.io/headers",
        headers=missing_headers
    )
    # send the request with the parameters and obtain a response object
    response = request.urlopen(request_params).read()
    # print a decoded format to view your request headers
    print(response.decode("utf-8"))
except error.HTTPError as e:
    print(e)
Combine both snippets, and you'll get the following complete code:
# import the required libraries
from urllib import request, error

# define new request headers
missing_headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Encoding": "gzip, deflate, br, zstd",
    "Accept-Language": "en-US,en;q=0.9",
    "Sec-Ch-Ua": "\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\"",
    "Referer": "https://www.google.com/",
    "Sec-Ch-Ua-Platform": "\"Windows\"",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "cross-site",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
}

# catch HTTP errors
try:
    # create a Request object with the request headers
    request_params = request.Request(
        url="https://httpbin.io/headers",
        headers=missing_headers
    )
    # send the request with the parameters and obtain a response object
    response = request.urlopen(request_params).read()
    # print a decoded format to view your request headers
    print(response.decode("utf-8"))
except error.HTTPError as e:
    print(e)
The code overrides urllib's default request headers and adds the missing ones, as shown:
{
  "headers": {
    "Accept": [
      "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"
    ],
    "Accept-Encoding": [
      "gzip, deflate, br, zstd"
    ],
    "Accept-Language": [
      "en-US,en;q=0.9"
    ],
    "Connection": [
      "close"
    ],
    "Host": [
      "httpbin.io"
    ],
    "Referer": [
      "https://www.google.com/"
    ],
    "Sec-Ch-Ua": [
      "\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\""
    ],
    "Sec-Ch-Ua-Platform": [
      "\"Windows\""
    ],
    "Sec-Fetch-Mode": [
      "navigate"
    ],
    "Sec-Fetch-Site": [
      "cross-site"
    ],
    "User-Agent": [
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
    ]
  }
}
Your scraper now sends the new headers when requesting your target web page. In the next section, you'll see how to edit some of these headers.
Edit a Header's Values
Editing a header in urllib means updating the values on an existing Request object before resending it. It's handy when you want to use different header values for different web pages. The Request class also provides a few helper methods for working with headers, shown in the sketch below.
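Here's a quick sketch of those helpers with a placeholder user agent value; note that urllib normalizes header names, keeping only the first letter uppercase:
# import the required library
from urllib import request

# create a Request object with an initial header
req = request.Request(
    url="https://httpbin.io/headers",
    headers={"User-Agent": "example-agent"}
)

# urllib stores the name as "User-agent" (only the first letter capitalized)
print(req.has_header("User-agent"))  # True
print(req.get_header("User-agent"))  # example-agent

# add_header() overwrites any existing value for the same header
req.add_header("User-Agent", "another-agent")
print(req.get_header("User-agent"))  # another-agent

# remove the header entirely
req.remove_header("User-agent")
print(req.header_items())  # []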
Assume you're scraping two web pages and want to use a Windows user agent for the first and a Linux one for the second. Let's request the target website twice and print the header values of each request to see how to achieve that.
For the first request, you'll retain the previous headers dictionary containing a Windows user agent:
# import the required libraries
from urllib import request, error

# define new request headers
missing_headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Encoding": "gzip, deflate, br, zstd",
    "Accept-Language": "en-US,en;q=0.9",
    "Sec-Ch-Ua": "\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\"",
    "Referer": "https://www.google.com/",
    "Sec-Ch-Ua-Platform": "\"Windows\"",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "cross-site",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
}

# catch HTTP errors
try:
    # create a Request object with the request headers
    request_params = request.Request(
        url="https://httpbin.io/headers",
        headers=missing_headers
    )
    # send the first request with the existing Windows user agent
    response_1 = request.urlopen(request_params).read()
    # print a decoded format to view the first request headers
    print(f"First header used: {response_1.decode('utf-8')}")
except error.HTTPError as e:
    print(e)
Next, edit the Request object's headers to switch the user agent string to Linux. You'll also need to change the client hint platform header (Sec-Ch-Ua-Platform) to Linux to stay consistent with the new user agent:
# ...

# catch HTTP errors
try:
    # ...

    # use a Linux user agent and platform for the second request
    request_params.add_header("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36")
    request_params.add_header("Sec-Ch-Ua-Platform", "\"Linux\"")
    # send the second request with the edited headers
    response_2 = request.urlopen(request_params).read()
    # print a decoded format to view your request headers
    print(f"Second header used: {response_2.decode('utf-8')}")
except error.HTTPError as e:
    print(e)
Combine the two snippets, and you'll get the following complete code:
# import the required libraries
from urllib import request, error

# define new request headers
missing_headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Encoding": "gzip, deflate, br, zstd",
    "Accept-Language": "en-US,en;q=0.9",
    "Sec-Ch-Ua": "\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\"",
    "Referer": "https://www.google.com/",
    "Sec-Ch-Ua-Platform": "\"Windows\"",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "cross-site",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
}

# catch HTTP errors
try:
    # create a Request object with the request headers
    request_params = request.Request(
        url="https://httpbin.io/headers",
        headers=missing_headers
    )
    # send the first request with the existing Windows user agent
    response_1 = request.urlopen(request_params).read()
    # print a decoded format to view the first request headers
    print(f"First header used: {response_1.decode('utf-8')}")

    # use a Linux user agent and platform for the second request
    request_params.add_header("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36")
    request_params.add_header("Sec-Ch-Ua-Platform", "\"Linux\"")
    # send the second request with the edited headers
    response_2 = request.urlopen(request_params).read()
    # print a decoded format to view your request headers
    print(f"Second header used: {response_2.decode('utf-8')}")
except error.HTTPError as e:
    print(e)
The code outputs the headers for both requests. Note the difference in the User-Agent and Sec-Ch-Ua-Platform values between the two outputs:
First header used: {
  "headers": {
    "Accept": [
      "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"
    ],
    "Accept-Encoding": [
      "gzip, deflate, br, zstd"
    ],
    "Accept-Language": [
      "en-US,en;q=0.9"
    ],
    "Connection": [
      "close"
    ],
    "Host": [
      "httpbin.io"
    ],
    "Referer": [
      "https://www.google.com/"
    ],
    "Sec-Ch-Ua": [
      "\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\""
    ],
    "Sec-Ch-Ua-Platform": [
      "\"Windows\""
    ],
    "Sec-Fetch-Mode": [
      "navigate"
    ],
    "Sec-Fetch-Site": [
      "cross-site"
    ],
    "User-Agent": [
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
    ]
  }
}
Second header used: {
  "headers": {
    "Accept": [
      "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"
    ],
    "Accept-Encoding": [
      "gzip, deflate, br, zstd"
    ],
    "Accept-Language": [
      "en-US,en;q=0.9"
    ],
    "Connection": [
      "close"
    ],
    "Host": [
      "httpbin.io"
    ],
    "Referer": [
      "https://www.google.com/"
    ],
    "Sec-Ch-Ua": [
      "\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\""
    ],
    "Sec-Ch-Ua-Platform": [
      "\"Linux\""
    ],
    "Sec-Fetch-Mode": [
      "navigate"
    ],
    "Sec-Fetch-Site": [
      "cross-site"
    ],
    "User-Agent": [
      "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
    ]
  }
}
You just edited the request headers to use a different user agent and platform for two requests. Congratulations!
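If you need more than two variations, the same technique scales with a loop. Here's a minimal sketch, assuming the request_params object and imports from the previous snippet and a list of example user agent strings:
# example user agents to rotate through
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
]

# catch HTTP errors
try:
    for user_agent in user_agents:
        # add_header() overwrites the previous user agent value
        request_params.add_header("User-Agent", user_agent)
        response = request.urlopen(request_params).read()
        print(response.decode("utf-8"))
except error.HTTPError as e:
    print(e)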
Set the Order of the Headers
The order of the header set as a whole doesn't significantly affect your scraper's chances of detection. However, a strict ordering rule applies to request header fields with comma-separated values.
Browsers like Chrome arrange comma-separated request header values in a specific way. Using the same arrangement strengthens your scraper's ability to mimic a legitimate browser.
Headers with comma-separated values include the encoding type (Accept-Encoding), language type (Accept-Language), and client hint user agent (Sec-Ch-Ua) fields. Here's how Chrome arranges the values of each of these fields:
{
  "Accept-Encoding": [
    "gzip, deflate, br, zstd"
  ],
  "Accept-Language": [
    "en-US,en;q=0.9"
  ],
  "Sec-Ch-Ua": [
    "\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\""
  ]
}
Changing the value order within each field, as shown below, can get you blocked:
{
  "Accept-Encoding": [
    "br, deflate, gzip, zstd"
  ],
  "Accept-Language": [
    "en;q=0.9,en-US"
  ],
  "Sec-Ch-Ua": [
    "\"Not:A-Brand\";v=\"8\", \"Google Chrome\";v=\"123\", \"Chromium\";v=\"123\""
  ]
}
A nifty way to mimic a typical browser's arrangement is to inspect the request headers in your browser's Network tab.
To do that, open your target website in a browser like Chrome and go to the Network tab. Reload the web page and click a request name in the request table. Scroll to the Request Headers section, then copy the comma-separated fields into your headers dictionary.
Once done, the comma-separated fields in your request headers should look like this:
# import the required libraries
from urllib import request, error

# define new request headers
missing_headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Encoding": "gzip, deflate, br, zstd",
    "Accept-Language": "en-US,en;q=0.9",
    "Sec-Ch-Ua": "\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\"",
    "Referer": "https://www.google.com/",
    "Sec-Ch-Ua-Platform": "\"Windows\"",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "cross-site",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
}

# catch HTTP errors
try:
    # create a Request object with the request headers
    request_params = request.Request(
        url="https://httpbin.io/headers",
        headers=missing_headers
    )
    # send the request with the parameters and obtain a response object
    response = request.urlopen(request_params).read()
    # print a decoded format to view your request headers
    print(response.decode("utf-8"))
except error.HTTPError as e:
    print(e)
You now know how to set the right header order for your urllib scraper to mimic a real browser. That’s great!
Conclusion
In this tutorial, you've seen how to customize the request headers while scraping with urllib. Here's a recap of what you've learned:
- Adding missing request headers to your urllib web scraper.
- Intercepting the urllib request to edit existing HTTP request headers.
- The strategy for setting the appropriate request headers order.
However, bypassing advanced anti-bot systems requires more than setting custom request headers. Plus, header management gets complicated at scale, increasing your chances of getting blocked. We recommend using ZenRows with your web scraper to manage your headers, auto-rotate proxies, and solve all anti-bot-related problems. Try ZenRows for free!