How To Rotate Proxies in Python

By Ander · November 1, 2022 · 11 min read
Ander is a web developer who has worked at startups for 12+ years. He began scraping social media even before influencers were a thing. Geek to the core.

A proxy can hide your real IP address, but what happens when that gets banned? You'd need a new IP. Or you could maintain a list of them and rotate proxies using Python for each request. The final option would be to use Smart Rotating Proxies, more on that later.

For now, we'll focus on building our custom proxy rotator in Python. We'll start from a list of regular proxies, check them to mark the working ones and provide simple monitoring to remove the failing ones from the working proxy list. The provided examples in the tutorial use Python, but the idea will work in any language you use for your scraping project.

Let's dive in!

How do I rotate my IP?

When building a crawler for URL and data extraction, the simplest way for a defensive system to prevent access is to ban IPs. If a high number of requests from the same IP hits the server in a short time, the system will flag that IP address and block it.

To avoid that, the easiest way is to use different IP addresses. But you can't easily change your machine's IP, and it's almost impossible in the case of servers. So to rotate IPs, you need to perform your requests via a proxy server. A proxy forwards your original requests unmodified, but the target server will see the proxy's IP, not yours.

But not all proxies are the same.

What is the difference between static and rotating proxies?

Static proxies are those that use the same exit IP. It means that you can change your IP address via that proxy, but the final IP will always be the same.

To avoid sending every request from the same IP, static proxies usually come in lists so you can spread traffic across them all: replacing them after some time or rotating them continuously, for example.

Rotating proxies, on the other hand, will automatically change the exit IP address. Some rotating proxies switch IPs after each request. Others change it after some time, for example, 5 minutes. This kind of proxy gives you the desired IP rotation by default, but the frequency may not be enough.

We'll now see how to rotate proxies in Python using static proxies. Let's go!

How do you rotate a proxy in Python?

For the code to work, you'll need python3 installed. Some systems have it pre-installed. After that, install all the necessary libraries by running pip install. We'll use requests throughout the tutorial; aiohttp is only needed for the async variant mentioned later.

pip install requests aiohttp

1. Get a proxy List

You might not have a proxy provider with a list of IP+port pairs. Don't worry; we'll see how to get one.

There are several lists of free proxies online. For the demo, grab one of those and save its content (just the URLs) in a text file (rotating_proxies_list.txt). Or use the ones below. They don't require authentication.

Export Free Proxies
Export and copy proxies from the site. If you want to automate the process, that site also offers a proxy API to download the list.

Free proxies aren't reliable, and the ones below probably won't work for you. They're usually short-lived. For production scraping projects, we recommend using datacenter or residential proxies.

167.71.230.124:8080 
192.155.107.211:1080 
77.238.79.111:8080 
167.71.5.83:3128 
195.189.123.213:3128 
8.210.83.33:80 
80.48.119.28:8080 
152.0.209.175:8080 
187.217.54.84:80 
169.57.1.85:8123

Then, we'll read that file and create an array with all the proxies: read the file, strip whitespace, and split it into one line per proxy. Be careful when saving the file since we won't perform any sanity checks for valid IP:port strings. We'll keep it simple.

proxies_list = open("rotating_proxies_list.txt", "r").read().strip().split("\n")

2. Check the Proxies

Let's assume that we want to run the scrapers at scale. The demo is simplified, but the idea would be to store the proxies and their "health state" in a reliable medium like a database. We'll use in-memory data structures that disappear after each run, but you get the idea. That would be our proxy pool.

First, let's write a simple function to check that the proxy works. For that, call ident.me, a webpage that returns the caller's IP in the response body. It is a simple page that fits our use case. We'll use the Python requests library, "a simple, yet elegant, HTTP library".

For the moment, it takes an item from the proxies list and calls the provided URL. Most of the code is boilerplate that will soon prove useful. There are two possible results:
  • 😊 If everything goes OK, it prints the response's content and status code (i.e., 200), which will probably be the proxy's IP.
  • 😔 An error gets printed due to timeout or some other reason. It usually means that the proxy isn't available or cannot process the request. Many of these will appear when using free proxies.
import requests 
 
proxies_list = open("rotating_proxies_list.txt", "r").read().strip().split("\n") 
 
def get(url, proxy): 
	try: 
		# Send proxy requests to the final URL 
		response = requests.get(url, proxies={'http': f"http://{proxy}"}, timeout=30) 
		print(response.status_code, response.text) 
	except Exception as e: 
		print(e) 
 
def check_proxies(): 
	proxy = proxies_list.pop() 
	get("http://ident.me/", proxy) 
 
check_proxies()

We intentionally use HTTP instead of HTTPS because many free proxies don't support SSL.

Save this content in a file, for example, rotate-proxies-python.py. Then run it from your command line:

python rotate-proxies-python.py

And the output should look like this:

200 167.71.230.124 ### status code and the proxy's IP

3. Add More Checks To Validate the Results

An exception means that the request failed, but there are other things that we should check, such as status codes. We'll consider only specific codes valid and mark the rest as errors. The list isn't an exhaustive one; adjust it to your needs. You might think, for example, that 404 "Not Found" isn't valid and mark it to retry after some time.

We could also add other checks, like validating that the response contains an IP address.

VALID_STATUSES = [200, 301, 302, 307, 404] 
 
def get(url, proxy): 
	try: 
		response = requests.get(url, proxies={'http': f"http://{proxy}"}, timeout=30) 
		if response.status_code in VALID_STATUSES: # valid proxy 
			print(response.status_code, response.text) 
	except Exception as e: 
		print("Exception: ", type(e)) 
 
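The extra check mentioned above - validating that the response body actually contains an IP address - could be sketched like this. The `looks_like_ip` helper is a hypothetical name, not part of the original script:

```python
import re

# Matches four dot-separated groups of 1-3 digits, e.g. "167.71.230.124"
IP_PATTERN = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")

def looks_like_ip(text):
    # True when the response body is just an IP address, as ident.me returns
    return bool(IP_PATTERN.match(text.strip()))
```

Inside get, you could then require both a valid status code and looks_like_ip(response.text) before counting the proxy as working.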

4. Iterate Over All the Proxies

Great! We now need to run the checks for each proxy in the array. We'll loop over the list of proxies calling get just as before. We'll do it sequentially for simplicity, but we could use aiohttp and asyncio.gather to launch all the requests and wait for them to finish. Async makes the code more complicated, but it speeds up web scraping.

The list is sliced to a maximum of 10 items as a safety measure, to avoid firing hundreds of involuntary requests.

session = requests.Session() 
# ... 
		response = session.get(url, proxies={'http': f"http://{proxy}"}, timeout=30) 
# ... 
def check_proxies(): 
	proxies = proxies_list[0:10] # limited to 10 to avoid too many requests 
	for proxy in proxies: 
		get("http://ident.me/", proxy) 

5. Separate Working Proxies From the Failed Ones

Examining an output log is far from ideal, isn't it? We should keep an internal state for the proxy pool. We'll separate them into three groups:
  • unchecked: unknown state, to be checked.
  • working: the last call using this proxy server was successful.
  • not working: the last request failed.

It is easier to add or remove items from sets than arrays, and they come with the advantage of avoiding duplicates. We can move proxies between lists without worrying about having the same one twice. If it's present, it just won't be added. That will simplify our code: remove an item from a set and add it to another. To achieve that, we need to modify the proxy storage slightly.

Three sets will exist, one for each group seen above. The initial one, unchecked, will contain the proxies from the file. A set can be initialized from an array, making it easy for us to create it.

proxies_list = open("rotating_proxies_list.txt", "r").read().strip().split("\n") 
unchecked = set(proxies_list[0:10]) # limited to 10 to avoid too many requests 
# unchecked = set(proxies_list) 
working = set() 
not_working = set() 
 
# ... 
def check_proxies(): 
	for proxy in list(unchecked): 
		get("http://ident.me/", proxy) 
#...

Now, write helper functions to move proxies between states, one helper for each state. They'll add the proxy to a set and remove it - if present - from the other two. Here is where sets come in handy since we don't need to worry about checking if the proxy is present or looping over the arrays. Calling "discard" removes the item if present and is silently ignored otherwise; it never raises an exception.

For example, we'll call set_working when a request is successful. And that function will remove the proxy from the unchecked or not working sets while adding it to the working set.

def reset_proxy(proxy): 
	unchecked.add(proxy) 
	working.discard(proxy) 
	not_working.discard(proxy) 
 
def set_working(proxy): 
	unchecked.discard(proxy) 
	working.add(proxy) 
	not_working.discard(proxy) 
 
def set_not_working(proxy): 
	unchecked.discard(proxy) 
	working.discard(proxy) 
	not_working.add(proxy)

We're missing the crucial part! We need to edit get to call these functions after each request. set_working for the successful ones and set_not_working for the rest.

def get(url, proxy): 
	try: 
		response = session.get(url, proxies={'http': f"http://{proxy}"}, timeout=30) 
		if response.status_code in VALID_STATUSES: 
			set_working(proxy) 
		else: 
			set_not_working(proxy) 
	except Exception as e: 
		set_not_working(proxy)

For the moment, add some traces at the end of the script to see if it's working well. The unchecked set should be empty since we checked all the items, and those items will now populate the other two sets. Hopefully, working isn't empty 😅 - it might happen when we use proxies from a public, free list.

#... 
check_proxies() 
 
print("unchecked ->", unchecked) # unchecked -> set() 
print("working ->", working) # working -> {"152.0.209.175:8080", ...} 
print("not_working ->", not_working) # not_working -> {"167.71.5.83:3128", ...}

6. Use of Working Proxies

That was a straightforward way to check proxies, but it isn't truly useful yet. We now need a way to get the working proxies and use them for the real goal: scraping actual content. We'll create a function that selects a random proxy.

We included both working and unchecked proxies in our example; feel free to use only the working ones if it fits your needs. We'll see later why the unchecked ones are present too.

random.choice doesn't work with sets, so we'll convert their union into a tuple.

And finally, here's a simple version of a proxy rotator in Python using random picking from a working proxy pool:

import random 
 
def get_random_proxy(): 
	# create a tuple from unchecked and working sets 
	available_proxies = tuple(unchecked.union(working)) 
	if not available_proxies: 
		raise Exception("no proxies available") 
	return random.choice(available_proxies)

Next, we can edit the get function to use a random proxy if none is present. The proxy parameter is now optional. We'll use that param to check the initial proxies, as we were doing before. But after that, we can forget about the proxy list and call get without it. A random one will be used and added to the not_working set in case of failure.

Since we'll now want to get actual content, we need to return the response or raise the exception. Find here the final version:

def get(url, proxy = None): 
	if not proxy: 
		proxy = get_random_proxy() 
 
	try: 
		response = session.get(url, proxies={'http': f"http://{proxy}"}, timeout=30) 
		if response.status_code in VALID_STATUSES: 
			set_working(proxy) 
		else: 
			set_not_working(proxy) 
 
		return response 
	except Exception as e: 
		set_not_working(proxy) 
		raise e # raise exception

Below the script, include the content you want to scrape. For the demo, we'll merely call the same test URL once again. You could create a custom parser with Beautiful Soup for your target webpage.

The idea is to use this backbone as a starting point for a real-world scraper. And to scale it, store the items in persistent storage, such as a database (e.g., Redis).
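As one possible sketch of that persistence idea - using SQLite from the standard library here instead of Redis, with illustrative table and function names:

```python
import sqlite3

# Hypothetical persistent pool; use a file path instead of :memory: for real persistence
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS proxies (address TEXT PRIMARY KEY, state TEXT)")

def save_state(proxy, state):
    # Insert the proxy or update its state if it already exists (upsert)
    conn.execute(
        "INSERT INTO proxies (address, state) VALUES (?, ?) "
        "ON CONFLICT(address) DO UPDATE SET state = excluded.state",
        (proxy, state),
    )
    conn.commit()

def load_proxies(state):
    # Rebuild one of the sets (unchecked / working / not_working) from the database
    rows = conn.execute("SELECT address FROM proxies WHERE state = ?", (state,))
    return {row[0] for row in rows}
```

A Redis-backed version would look similar, with the three groups stored as Redis sets.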

Here you can see the final script on how to rotate proxies in Python in action. Save and run the file. The output should be similar to the one in the comments.

# rest of the rotating proxy script 
 
check_proxies() 
 
# real scraping part comes here 
def main(): 
	result = get("http://ident.me/") 
	print(result.status_code) # 200 
	print(result.text) # 152.0.209.175 
 
main()

Also take into account that the IP is only one of the factors defensive systems check. Another one that goes hand in hand with it is HTTP headers, especially the User-Agent. We wrote a list with these and other tips on Scraping Without Getting Blocked.
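To illustrate pairing both factors, here's a hedged sketch that rotates the User-Agent header along with the proxy on every request. The USER_AGENTS values are just examples, not a curated list:

```python
import random
import requests

# Example User-Agent strings; in a real project, use a maintained, up-to-date list
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
]

def get_with_headers(url, proxy):
    # Pick a random User-Agent for each proxied request
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return requests.get(url, proxies={"http": f"http://{proxy}"},
                        headers=headers, timeout=30)
```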

Rotating Proxies using Selenium

If you were to scrape dynamic content, you'd need to rotate proxies in Selenium or Puppeteer, for instance. We already wrote a guide on web scraping with Selenium using Python. There you can see how to start the browser with the proxy settings. Combine both articles and you'll have a proxy rotator in Selenium.

from selenium import webdriver 
 
# ... 
proxy = "152.0.209.175:8080" # free proxy 
options = webdriver.ChromeOptions() 
options.add_argument("--proxy-server=%s" % proxy) 
with webdriver.Chrome(options=options) as driver: 
	# ...

What happens with false negatives or one-time errors? Once we send a proxy to the not_working set, it'll remain there forever. There's no way back.

7. Re-Checking Not Working Proxies

We should re-check the failed proxies from time to time. There are many reasons: the failure was due to networking issues, a bug, or the proxy provider fixed it.

In any case, Python allows us to set Timers, "an action that should be run only after a certain amount of time has passed". There are different ways to achieve the same end, but this one is simple enough to need just a few lines.

Remember the reset_proxy function? We didn't use it at all until now. We'll set a Timer to run that function for every proxy marked as not working. Twenty seconds is a small number for a real-world case but enough for our test. We exclude a failing proxy and move it back to unchecked after some time.

And this is the reason to use both working and unchecked sets in get_random_proxy. Modify that function to use only working proxies for a more robust use case. And then, you can run check_proxies periodically, which will loop over the unchecked elements - in this case, failed proxies that remained some time in the sin bin.

from threading import Timer 
 
def set_not_working(proxy): 
	unchecked.discard(proxy) 
	working.discard(proxy) 
	not_working.add(proxy) 
 
	# move to unchecked after a certain time (20s in the example) 
	Timer(20.0, reset_proxy, [proxy]).start()

There is a final option for even more robust systems, but we'll leave the implementation up to you. Store analytics and usage for each proxy, for example, the number of times it failed and when was the last one. Using that info, adjust the time to re-check - longer times for proxies that failed several times. Or even set some alerts if the number of working proxies goes below a threshold.
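One way to sketch that idea - with hypothetical names (record_failure, backoff_seconds) that aren't part of the original script:

```python
import time

# Per-proxy stats: how many times it failed and when it failed last
proxy_stats = {}

def record_failure(proxy):
    stats = proxy_stats.setdefault(proxy, {"failures": 0, "last_failure": None})
    stats["failures"] += 1
    stats["last_failure"] = time.time()

def backoff_seconds(proxy, base=20.0, cap=600.0):
    # Exponential backoff: 20s after the first failure, then 40s, 80s... capped at 10 min
    failures = proxy_stats.get(proxy, {}).get("failures", 0)
    return min(base * (2 ** max(failures - 1, 0)), cap)
```

set_not_working could then schedule Timer(backoff_seconds(proxy), reset_proxy, [proxy]) instead of a fixed 20 seconds.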

For the final version, check out our GitHub gist. The final URL to test is hardcoded in line 61; change it to any address, endpoint, or API you want to test. And since all the calls go through the requests library, it'll handle the content type, be it HTML, JSON, or anything else a browser can receive.

Conclusion

Building a Python proxy rotator might seem doable for small scraping scripts, but it can grow painful. But, hey, you did it!! 👏

As a note of caution, don't rotate IP addresses when scraping logged-in pages or anything else that relies on sessions or cookies, since the target may invalidate the session when the IP changes.

How do I get a rotating proxy?

  1. Store the proxy list as plain text.
  2. Import from the file as an array.
  3. Check each of them.
  4. Separate the working ones.
  5. Check for failures while scraping and remove them from the working list.
  6. Re-check not working proxies from time to time.
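The steps above can be sketched end-to-end in a small class. The ProxyRotator name and shape are illustrative, not the gist's exact code:

```python
import random
import requests

class ProxyRotator:
    """Minimal sketch tying the steps together."""

    def __init__(self, filename):
        # Steps 1-2: load the plain-text list into the unchecked pool
        with open(filename) as f:
            self.unchecked = set(f.read().strip().split("\n"))
        self.working = set()
        self.not_working = set()

    def mark(self, proxy, ok):
        # Steps 4-5: move the proxy to working or not_working
        self.unchecked.discard(proxy)
        (self.working if ok else self.not_working).add(proxy)
        (self.not_working if ok else self.working).discard(proxy)

    def get(self, url, timeout=30):
        # Step 3 onwards: pick a random proxy, request, record the outcome
        available = tuple(self.unchecked | self.working)
        if not available:
            raise Exception("no proxies available")
        proxy = random.choice(available)
        try:
            response = requests.get(url, proxies={"http": f"http://{proxy}"},
                                    timeout=timeout)
            self.mark(proxy, response.status_code == 200)
            return response
        except Exception:
            self.mark(proxy, False)
            raise
```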

If you don't want to worry about rotating proxies manually, you can always use ZenRows, a Web Scraping API that includes Smart Rotating Proxies. It works as a regular proxy - with a single URL that includes your API key - but provides different IPs for each request. It also offers Residential and Premium Proxies, which have a higher success rate and work faster.



Want to keep learning?

We will be sharing all the insights we have learned through the years in the following blog posts. If you don't want to miss a piece and keep learning, we'd be thrilled to have you in our newsletter.

No spam guaranteed. You can unsubscribe at any time.