How to Set Up a Proxy in AIOHTTP

July 11, 2024 · 7 min read

AIOHTTP is an asynchronous HTTP client/server framework built on top of Python's asyncio library. While efficient for asynchronous web scraping in Python, it can still get blocked by websites with anti-bot measures.

In this tutorial, you'll learn how to set up proxies to avoid that. We’ll go through a simple proxy setup and then learn how to build a proxy rotator and use premium proxies for maximum protection against blocks and bans. Let's go!

Set Up a Single Proxy in AIOHTTP

To get started, install the AIOHTTP Python library:

Terminal
pip install aiohttp

Next, import the aiohttp and asyncio modules into your script.

scraper.py
import aiohttp
import asyncio

Now, define an asynchronous function that encloses the main request logic. Inside it, create a ClientSession object to manage the connection to the web server. Then, make a GET request to HTTPBin, a service that returns the client's IP address. Finally, print the response to the console.

scraper.py
# ...

# async function to perform HTTP GET request
async def main():
    async with aiohttp.ClientSession() as session:
        # perform an HTTP GET request
        async with session.get("http://httpbin.org/ip") as resp:
            print(await resp.text())

Execute your asynchronous function using asyncio.run():

scraper.py
# ...

# run the main async function
asyncio.run(main())

Your complete script should look like this:

scraper.py
import aiohttp
import asyncio

# async function to perform HTTP GET request
async def main():
    async with aiohttp.ClientSession() as session:
        # perform an HTTP GET request
        async with session.get("http://httpbin.org/ip") as resp:
            print(await resp.text())

# run the main async function
asyncio.run(main())

Running the above code will print your machine's IP address:

Output
{
  "origin": "210.212.39.138"
}

However, if you make too many requests from the same IP to one website's server, your activity may be flagged as suspicious, resulting in blocks or even permanent bans.

Let's learn how to integrate proxies into the code to avoid that.

First, grab a free proxy from the Free Proxy List website. Next, define a proxy variable that stores the proxy server address. Finally, pass this variable to the session.get() method via the proxy parameter.

Here's the updated code implementing a single proxy in AIOHTTP:

scraper.py
import aiohttp
import asyncio

# async function to perform HTTP GET request
async def main():
    async with aiohttp.ClientSession() as session:
        # define a proxy server address
        proxy = "http://8.219.97.248:80"

        # perform an HTTP GET request
        async with session.get("http://httpbin.org/ip", proxy=proxy) as resp:
            print(await resp.text())

# run the main async function
asyncio.run(main())

You'll get the following response:

Output
{
  "origin": "8.219.64.236"
}

Congratulations! You successfully masked your real IP address using a proxy.

Proxy Authentication

Some proxy servers require authentication to ensure that only users with valid credentials can access them. This is typically the case with commercial solutions and premium proxies.

Define the proxy authentication credentials using aiohttp.BasicAuth(). Then, pass the resulting object to the session.get() method via the proxy_auth parameter. Here's the updated code:

scraper.py
import aiohttp
import asyncio

# async function to perform HTTP GET request
async def main():
    async with aiohttp.ClientSession() as session:
        # define a proxy server address
        proxy = "http://8.219.97.248:80"

        # define proxy authentication credentials
        proxy_auth = aiohttp.BasicAuth("<YOUR_USERNAME>", "<YOUR_PASSWORD>")

        # perform an HTTP GET request
        async with session.get("http://httpbin.org/ip", proxy=proxy, proxy_auth=proxy_auth) as resp:
            print(await resp.text())

# run the main async function
asyncio.run(main())

You'll get the following IP address as output:

Output
{
  "origin": "8.219.64.236"
}

AIOHTTP also allows you to specify the authentication credentials (username and password) in the proxy URL:

scraper.py
import aiohttp
import asyncio

# async function to perform HTTP GET request
async def main():
    async with aiohttp.ClientSession() as session:
        # define authentication credentials in proxy URL
        proxy = "http://<YOUR_USERNAME>:<YOUR_PASSWORD>@8.219.97.248:80"

        # perform an HTTP GET request
        async with session.get("http://httpbin.org/ip", proxy=proxy) as resp:
            print(await resp.text())

# run the main async function
asyncio.run(main())

You'll get the same output as before:

Output
{
  "origin": "8.219.64.236"
}

Best Proxy Protocol: HTTP, HTTPS, SOCKS

HTTP, HTTPS, and SOCKS are the most common proxy protocols. Each has its own strengths and use cases.

Both HTTP and HTTPS proxies are useful for web scraping. HTTP proxies are suitable for accessing plain HTTP websites, while HTTPS proxies encrypt the traffic between your client and the proxy server, ensuring secure communication.

Since HTTPS proxies can handle both HTTP and HTTPS requests, they're generally the better choice for web scraping.

SOCKS is a versatile proxy protocol that can tunnel arbitrary TCP connections (and, with SOCKS5, UDP traffic), making it suitable for non-HTTP network traffic.

If you want to use SOCKS proxies (including SOCKS5) with AIOHTTP, you first need to install the aiohttp-socks package, since plain AIOHTTP has no native support for SOCKS proxies.
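
Here's a minimal sketch of how a SOCKS5 proxy could be plugged in via aiohttp-socks. The proxy address 127.0.0.1:1080 is a hypothetical placeholder; replace it with a real SOCKS5 server:

scraper.py
import aiohttp
import asyncio
from aiohttp_socks import ProxyConnector  # pip install aiohttp-socks

# async function to perform HTTP GET request through a SOCKS5 proxy
async def main():
    # build a connector that tunnels traffic through the SOCKS5 proxy
    # (hypothetical address; replace with a real SOCKS5 server)
    connector = ProxyConnector.from_url("socks5://127.0.0.1:1080")

    async with aiohttp.ClientSession(connector=connector) as session:
        # perform an HTTP GET request
        async with session.get("http://httpbin.org/ip") as resp:
            print(await resp.text())

# run the main async function
asyncio.run(main())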

Use Rotating Proxies With AIOHTTP

As mentioned above, when your script sends multiple requests from the same IP in a short time, websites may flag this as suspicious activity and block your access.

If you rotate IPs, your scraper becomes much more effective and harder to detect.

Let's implement this functionality!

First, get a few free proxies from the Free Proxy List website.

Example
proxy_list = [
    "http://8.219.97.248:80",
    "http://3.77.153.38:80",
    "http://3.70.179.165:80",
    "http://20.235.159.154:80"
]

Next, define a function that randomly selects a proxy from proxy_list and returns it. You can use Python's random.choice() function for this.

Here's the modified code with rotating proxies functionality:

scraper.py
import aiohttp
import asyncio
import random

# function to randomly select and return a proxy
def rotate_proxy():
    proxy_list = [
        "http://8.219.97.248:80",
        "http://3.77.153.38:80",
        "http://3.70.179.165:80",
        "http://20.235.159.154:80"
    ]
    return random.choice(proxy_list)

# async function to perform HTTP GET request
async def main():
    async with aiohttp.ClientSession() as session:
        # choose a random proxy
        proxy = rotate_proxy()

        # perform an HTTP GET request
        async with session.get("http://httpbin.org/ip", proxy=proxy) as resp:
            print(await resp.text())

# run the main async function
asyncio.run(main())

Each time you run this code, you'll get the IP address of a randomly selected proxy as output.

Here's the result for three runs:

Terminal
# request 1
{
  "origin": "3.70.179.165"
}

# request 2
{
  "origin": "8.219.64.236"
}

# request 3
{
  "origin": "20.235.159.154"
}

The above output confirms the code is successfully rotating the proxies. Good job!
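
Note that this script selects one proxy per run. In a real scraper, you'd pick a fresh proxy for each request. Here's a minimal sketch that fires three concurrent requests, each through its own randomly chosen proxy, reusing the rotate_proxy() function and the same free proxies from above (which may no longer be live):

scraper.py
import aiohttp
import asyncio
import random

# function to randomly select and return a proxy
def rotate_proxy():
    proxy_list = [
        "http://8.219.97.248:80",
        "http://3.77.153.38:80",
        "http://3.70.179.165:80",
        "http://20.235.159.154:80"
    ]
    return random.choice(proxy_list)

# fetch a URL through a freshly selected proxy
async def fetch(session, url):
    proxy = rotate_proxy()
    async with session.get(url, proxy=proxy) as resp:
        return await resp.text()

# run several requests concurrently, each with its own proxy
async def main():
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            *(fetch(session, "http://httpbin.org/ip") for _ in range(3))
        )
        for result in results:
            print(result)

# run the main async function
asyncio.run(main())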

However, even rotating proxies may not be enough against strong anti-bot systems. Let's see what happens if we try to access a protected website like G2 Reviews.

Replace the old HTTPBin URL with G2 Reviews. You'll likely get errors, so handle them using try...except blocks.

scraper.py
import aiohttp
import asyncio
import random

# function to randomly select and return a proxy
def rotate_proxy():
    proxy_list = [
        "http://8.219.97.248:80",
        "http://3.77.153.38:80",
        "http://3.70.179.165:80",
        "http://20.235.159.154:80",
        "http://35.185.196.38:3128",
    ]
    return random.choice(proxy_list)

# async function to perform HTTP GET request
async def main():
    async with aiohttp.ClientSession() as session:
        # choose a random proxy
        proxy = rotate_proxy()

        try:
            # perform an HTTP GET request
            async with session.get("https://www.g2.com/products/asana/reviews", proxy=proxy) as resp:
                print(resp.status)
        except aiohttp.ClientError as e:
            # print error
            print(f"An error occurred: {e}")


# If encountering the "Event loop is closed" RuntimeError on Windows machines,
# uncomment the following line of code to resolve the issue:
# asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

# run the main async function
asyncio.run(main())

You'll get the 403 status code as output:

Output
403

Error 403 means the target server denied access, most likely because its anti-bot system flagged the request.
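
With free proxies, a common mitigation is to retry a failed request through a different proxy. Here's a minimal sketch of that idea; the fetch_with_retries() helper is our own (not part of AIOHTTP), and the free proxies reused from above may no longer work. As the next section shows, even retries rarely get past strong anti-bot systems:

scraper.py
import aiohttp
import asyncio
import random

proxy_list = [
    "http://8.219.97.248:80",
    "http://3.77.153.38:80",
    "http://3.70.179.165:80",
    "http://20.235.159.154:80"
]

# retry a request through a different random proxy on each attempt
async def fetch_with_retries(session, url, retries=3):
    for attempt in range(1, retries + 1):
        proxy = random.choice(proxy_list)
        try:
            async with session.get(url, proxy=proxy) as resp:
                if resp.status == 200:
                    return await resp.text()
                print(f"attempt {attempt}: status {resp.status} via {proxy}")
        except aiohttp.ClientError as e:
            print(f"attempt {attempt}: {e}")
    # all attempts failed
    return None

# async function to perform the request with retries
async def main():
    async with aiohttp.ClientSession() as session:
        html = await fetch_with_retries(session, "https://www.g2.com/products/asana/reviews")
        print("succeeded" if html else "all attempts failed")

# run the main async function
asyncio.run(main())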

Use Premium Proxies

As you saw, free proxies are unreliable. Premium proxies are a better option, as they offer a more reliable, automated way to bypass anti-bot systems. If you're unsure where to get them, check out our list of the best web scraping proxy services.

Let's use ZenRows, the most effective and reliable premium proxy provider, to see how premium proxies work. We'll target the same protected page that blocked you in the previous section.

Sign up for free, and you'll get redirected to the Request Builder page.

Paste the G2 Reviews URL in the URL to Scrape box. Enable JS Rendering and click on the Premium Proxies check box. Select Python as your language and click on the Proxy tab. Finally, copy the generated premium proxy.

ZenRows Request Builder

Now, update your single-proxy code to integrate ZenRows. Here's the code to access the protected G2 Reviews page using ZenRows premium proxies and AIOHTTP:

scraper.py
import aiohttp
import asyncio

# async function to perform HTTP GET request
async def main():
    async with aiohttp.ClientSession() as session:
        # define ZenRows premium proxy
        proxy = "http://<YOUR_ZENROWS_API_KEY>:js_render=true&[email protected]:8001"
        
        # perform an HTTP GET request
        async with session.get("https://www.g2.com/products/asana/reviews", proxy=proxy, ssl=False) as resp:
            print(resp.status)


# If encountering the "Event loop is closed" RuntimeError on Windows machines,
# uncomment the following line of code to resolve the issue:
# asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

# run the main async function
asyncio.run(main())

When you run this script, you'll get a 200 status code as output.

Output
200

Amazing! You've just integrated ZenRows premium proxies with AIOHTTP to bypass anti-bot protection.

Conclusion

This tutorial showed how to set up a proxy in AIOHTTP with Python. You started with a single proxy configuration and then moved on to more robust methods, including rotating and premium proxies.

Avoid the hassle of finding and configuring proxies. Use ZenRows, a reliable solution that bypasses any anti-bot protection. Try ZenRows for free!
