How to Set Up a Proxy With MechanicalSoup

July 3, 2024 · 8 min read

MechanicalSoup is a Python library for automating website interactions. It's built on top of BeautifulSoup and Requests, popular tools in the web scraping community.

While MechanicalSoup is effective for web scraping with Python, it doesn't prevent your scraper from getting blocked by websites' anti-bot systems.

Fortunately, there are a few ways to boost MechanicalSoup's anti-detection capabilities. In this tutorial, you'll learn, step by step, how to set up proxies in MechanicalSoup.

Set Up a Single Proxy With MechanicalSoup

As a prerequisite, install the MechanicalSoup library if you haven't already:

Terminal
pip install MechanicalSoup

Before setting up the proxy, let's make a simple HTTP GET request to https://httpbin.io/ip. This website returns the IP address of the client making the request.

Import the mechanicalsoup module into your code and create a browser object. Then, send a GET request to the target URL and print the response text.

scraper.py
import mechanicalsoup

# create a browser object
browser = mechanicalsoup.StatefulBrowser()

# send a GET request through the browser's underlying requests session
response = browser.session.request("get", "https://httpbin.io/ip")

print(response.text)

The code will print your machine's IP address:

Output
{
  "origin": "50.173.55.144:30127"
}

Exposing your IP address is not a good idea, as websites may block it for scraping activity.

Let's set up a proxy to reduce the chances of being detected and blocked.

Start by grabbing a free proxy from the Free Proxy List website.

Next, define a proxies dictionary in your code pointing to the proxy's IP address and port number (e.g., http://8.219.97.248:80). This ensures that both HTTP and HTTPS requests are routed through the proxy.

scraper.py
# define proxies using this syntax:
# <PROXY_PROTOCOL>://<PROXY_IP_ADDRESS>:<PROXY_PORT>
proxies = {
    "https": "http://8.219.97.248:80",
    "http": "http://8.219.97.248:80",
}

Finally, pass the proxies dictionary to the request you make through the browser's session. Here's what your code should look like:

scraper.py
import mechanicalsoup

# define proxies using this syntax:
# <PROXY_PROTOCOL>://<PROXY_IP_ADDRESS>:<PROXY_PORT>
proxies = {
    "https": "http://8.219.97.248:80",
    "http": "http://8.219.97.248:80",
}

# create a browser object
browser = mechanicalsoup.StatefulBrowser()

# send a GET request through the proxy
response = browser.session.request("get", "https://httpbin.io/ip", proxies=proxies)

print(response.text)

The code will output the IP address of the used proxy server:

Output
{
  "origin": "8.219.64.236:60924"
}

Congrats! You've just changed the IP address of your MechanicalSoup scraper.
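
One handy variation: instead of passing proxies to each request, you can set them once on the browser's underlying requests session. Every request made through that browser, including browser.open(), will then be routed through the proxy. Here's a minimal sketch of this approach, reusing the same example proxy:

scraper.py
import mechanicalsoup

# define proxies
proxies = {
    "https": "http://8.219.97.248:80",
    "http": "http://8.219.97.248:80",
}

# create a browser object
browser = mechanicalsoup.StatefulBrowser()

# route every request made by this browser through the proxies
browser.session.proxies = proxies

# browser.open() now also uses the proxy
response = browser.open("https://httpbin.io/ip")

print(response.text)

Let's now move on to more advanced concepts.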

Proxy Authentication

Some proxy servers require authentication, granting access only to users with valid credentials. This is usually the case with commercial or premium proxies.

Here's the syntax to specify credentials (username and password) for an authenticated proxy:

Example
<PROXY_PROTOCOL>://<USERNAME>:<PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>

This is what your updated code with proxy authentication should look like:

scraper.py
import mechanicalsoup

# define proxies
proxies = {
    "https": "http://<YOUR_USERNAME>:<YOUR_PASSWORD>@72.10.160.173:3985",
    "http": "http://<YOUR_USERNAME>:<YOUR_PASSWORD>@72.10.160.173:3985",
}

# create a browser object
browser = mechanicalsoup.StatefulBrowser()

# send a GET request
response = browser.session.request("get", "https://httpbin.io/ip", proxies=proxies)

print(response.text)
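
One caveat: if your username or password contains characters that have special meaning in URLs (such as @, :, or /), percent-encode them first. Here's a quick sketch using urllib.parse.quote() with hypothetical credentials:

Example
from urllib.parse import quote

# hypothetical credentials containing special characters
username = quote("scraper@company", safe="")
password = quote("p@ss:word", safe="")

# percent-encoded credentials are safe to embed in the proxy URL
proxies = {
    "https": f"http://{username}:{password}@72.10.160.173:3985",
    "http": f"http://{username}:{password}@72.10.160.173:3985",
}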

Add Rotating and Premium Proxies to MechanicalSoup

If you make multiple requests in a short period using a single proxy, the websites you're trying to access can detect this behavior and block you.

To avoid getting blocked, you can use a rotating proxy. This means changing proxies after a certain amount of time or number of requests, making you appear as a different user each time.

Create a list of proxies using the same Free Proxy List website:

Example
# create a list of proxies
PROXIES = [
    "http://8.219.97.248:80",
    "http://148.72.140.24:30127",
    # ...
    "http://77.238.235.219:8080"
]

Next, create a function that randomly selects proxies from the list and returns them as a dictionary. You can use the random.choice() function for this.

scraper.py
# ...
import random

# ...

# function to randomly select and return proxies
def rotate_proxy():
    https_proxy = random.choice(PROXIES)
    http_proxy = random.choice(PROXIES)

    return {
        "https": https_proxy,
        "http": http_proxy,
    }

# ...

# rotate proxies
proxies = rotate_proxy()

# ...

Here's your final rotating proxy code:

scraper.py
import mechanicalsoup
import random

# create a list of proxies
PROXIES = [
    "http://8.219.97.248:80",
    "http://148.72.140.24:30127",
    # ...
    "http://77.238.235.219:8080"
]

# function to randomly select and return proxies
def rotate_proxy():
    https_proxy = random.choice(PROXIES)
    http_proxy = random.choice(PROXIES)

    return {
        "https": https_proxy,
        "http": http_proxy,
    }

# create a browser object
browser = mechanicalsoup.StatefulBrowser()

# rotate proxies
proxies = rotate_proxy()

# send a GET request
response = browser.session.request("get", "https://httpbin.io/ip", proxies=proxies)

print(response.text)

Each time you run this code, the script randomly picks a proxy from the list.

Output
# request 1
{
  "origin": "8.219.64.236:64632"
}

# request 2
{
  "origin": "77.238.235.219:8080"
}

# request 3
{
  "origin": "148.72.140.24:30127"
}

Congratulations! You've successfully implemented the rotating proxies functionality.
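
Keep in mind that free proxies fail frequently, so real-world code should handle connection errors and retry with a different proxy. Here's a minimal sketch of that idea, reusing the same proxy list and assuming an arbitrary 10-second timeout:

scraper.py
import random

import mechanicalsoup
import requests

# create a list of proxies
PROXIES = [
    "http://8.219.97.248:80",
    "http://148.72.140.24:30127",
    "http://77.238.235.219:8080",
]

# create a browser object
browser = mechanicalsoup.StatefulBrowser()

# try the proxies in random order until one of them responds
for proxy in random.sample(PROXIES, len(PROXIES)):
    try:
        response = browser.session.request(
            "get",
            "https://httpbin.io/ip",
            proxies={"https": proxy, "http": proxy},
            timeout=10,
        )
        response.raise_for_status()
        print(response.text)
        break
    except requests.exceptions.RequestException:
        print(f"Proxy {proxy} failed, trying another one...")
else:
    print("All proxies failed.")

random.sample() shuffles the whole list, so each run tries the proxies in a different order instead of hammering the same one first.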

Premium Proxy to Avoid Getting Blocked

Free proxies create significant challenges for web scraping. Their poor performance, security concerns, and frequent blocking patterns make them unreliable for professional scraping tasks. Most websites detect and block these free proxies, disrupting your data collection efforts.

Premium proxies provide a more reliable solution for avoiding detection. With high-quality IPs and advanced rotation capabilities, premium proxies can effectively handle scraping at any scale. Features like smart routing and geo-location targeting substantially improve your scraping success rate.

ZenRows' Residential Proxies stand out as a premium solution, offering access to 55M+ residential IPs across 185+ countries. With features like dynamic IP rotation, intelligent proxy selection, and flexible geo-targeting, all backed by 99.9% uptime, the service is a great fit for reliable web scraping with MechanicalSoup.

Let's integrate ZenRows' Residential Proxies with MechanicalSoup.

Sign up and visit the Proxy Generator dashboard. Your proxy credentials will be generated automatically.

Generate residential proxies with ZenRows

Copy your proxy credentials and use them in this Python code:

scraper.py
import mechanicalsoup

# define proxies
proxies = {
    "http": "http://<ZENROWS_PROXY_USERNAME>:<ZENROWS_PROXY_PASSWORD>@superproxy.zenrows.com:1337",
    "https": "https://<ZENROWS_PROXY_USERNAME>:<ZENROWS_PROXY_PASSWORD>@superproxy.zenrows.com:1338",
}

# create a browser object
browser = mechanicalsoup.StatefulBrowser()

# send a GET request
response = browser.session.request("get", "https://httpbin.io/ip", proxies=proxies)

print(response.text)

Here's the output after running this script two times:

Output
# request 1
{
  "origin": "185.123.101.84:51432"
}

# request 2
{
  "origin": "79.110.52.96:36721"
}

Congratulations! The different IP addresses confirm that your MechanicalSoup requests are successfully routed through ZenRows' residential proxy network. Your code is now equipped with premium proxies that significantly reduce the risk of detection during web scraping.

Conclusion

This step-by-step tutorial showed how to set up a proxy in MechanicalSoup.

Now you know:

  • The basics of setting up a proxy with MechanicalSoup in Python.
  • How to deal with proxy authentication.
  • How to use a rotating proxy.
  • How to implement a premium proxy and bypass anti-bot systems.

Using ZenRows, you can bypass any anti-bot protection and increase the reliability of your scraper. Try ZenRows for free!
