Routing HTTP requests through different IP addresses is an essential method to avoid getting blocked while web scraping. For that reason, let's learn how to implement a Pyppeteer proxy in this tutorial!
Prerequisites
Ensure you have Python 3.6 or later installed on your local machine.
Then, install Pyppeteer from PyPI using pip by running the command below.
pip install pyppeteer
How to Use a Proxy with Pyppeteer
To get started, create a scraper.py script that makes a request to ident.me so you can see your current IP.
import asyncio
from pyppeteer import launch
async def main():
    # Create a new headless browser instance
    browser = await launch()
    # Create a new page
    page = await browser.newPage()
    # Navigate to target website
    await page.goto('https://ident.me')
    # Select the body element
    body = await page.querySelector('body')
    # Get the text content of the selected element
    content = await page.evaluate('(element) => element.textContent', body)
    # Dump the result
    print(content)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())
Run the script to get the content of the target page's body.
python scraper.py
Note: The first time you launch Pyppeteer, it'll automatically download the latest version of Chromium.
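If you'd rather not wait for that download on the first run (for example, in a CI environment), pyppeteer also ships a small helper command that fetches Chromium ahead of time. Its exact name may vary between releases, so treat the line below as a hint to check against your installed version rather than a guarantee:
pyppeteer-install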
Now, it's time to implement a Pyppeteer proxy in your script. For that, go grab a free proxy from FreeProxyList (the one we used might not work for you).
The launch() method used in the scraper.py script creates a new browser instance and lets you specify some options. One of them is args, a list of additional arguments to pass to the browser process. Set the --proxy-server argument to instruct the browser to route Pyppeteer's requests through a proxy.
# ...
async def main():
    # Create a new headless browser instance
    browser = await launch(args=['--proxy-server=http://20.219.108.109:8080'])
    # Create a new page
    page = await browser.newPage()
# ...
Here's the full code:
import asyncio
from pyppeteer import launch
async def main():
    # Create a new headless browser instance
    browser = await launch(args=['--proxy-server=http://20.219.108.109:8080'])
    # Create a new page
    page = await browser.newPage()
    # Navigate to target website
    await page.goto('https://ident.me')
    # Select the body element
    body = await page.querySelector('body')
    # Get the text content of the selected element
    content = await page.evaluate('(element) => element.textContent', body)
    # Dump the result
    print(content)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())
Run the script again with the python scraper.py command, and this time you should get your proxy's IP printed on the screen.
20.219.108.109
Well done, you just used a proxy with Pyppeteer!
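Keep in mind that free proxies tend to be slow or already offline, so page.goto() can hang or fail. The sketch below is one way to guard against that, assuming you're fine with a broad exception handler and an arbitrary 15-second navigation timeout (both are choices made for this example, not something Pyppeteer requires):
import asyncio
from pyppeteer import launch

async def main():
    # Create a new headless browser instance routed through the proxy
    browser = await launch(args=['--proxy-server=http://20.219.108.109:8080'])
    page = await browser.newPage()
    try:
        # Cap navigation at 15 seconds so a dead proxy fails fast
        await page.goto('https://ident.me', {'timeout': 15000})
        body = await page.querySelector('body')
        content = await page.evaluate('(element) => element.textContent', body)
        print(content)
    except Exception as error:
        # A dead or misbehaving proxy ends up here instead of crashing the script
        print(f'Request through the proxy failed: {error}')
    finally:
        await browser.close()

asyncio.get_event_loop().run_until_complete(main())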
Proxy Authentication with Pyppeteer
If you use a premium proxy, you'll need to authenticate with a username and password. For that, use the --proxy-auth argument:
# ...
# Create a new headless browser instance
browser = await launch(args=[
    '--proxy-server=http://20.219.108.109:8080',
    '--proxy-auth=<YOUR_USERNAME>:<YOUR_PASSWORD>'
])
# ...
Alternatively, you can authenticate through the page API, as shown below:
# ...
# Create a new page
page = await browser.newPage()
await page.authenticate({ 'username': '<YOUR_USERNAME>', 'password': '<YOUR_PASSWORD>' })
# ...
Note: Remember to update <YOUR_USERNAME> and <YOUR_PASSWORD> with your credentials.
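Hard-coding credentials in scraper.py is risky if the file ever ends up in version control. As an illustration only, here's a variant that reads them from two hypothetical environment variables, PROXY_USERNAME and PROXY_PASSWORD (the names are made up for this example):
import asyncio
import os
from pyppeteer import launch

async def main():
    # Hypothetical environment variables; name them however you prefer
    proxy_username = os.environ['PROXY_USERNAME']
    proxy_password = os.environ['PROXY_PASSWORD']
    # Create a new headless browser instance routed through the proxy
    browser = await launch(args=['--proxy-server=http://20.219.108.109:8080'])
    page = await browser.newPage()
    # Authenticate against the proxy with the credentials from the environment
    await page.authenticate({'username': proxy_username, 'password': proxy_password})
    await page.goto('https://ident.me')
    body = await page.querySelector('body')
    content = await page.evaluate('(element) => element.textContent', body)
    print(content)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())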
Set a Dynamic Proxy with Pyppeteer
Rather than the static proxy used before, you'll want to rotate through several proxies when web scraping to avoid getting banned. You can do that in Pyppeteer by creating multiple browser instances, each with its own proxy configuration.
Start by grabbing a few more free proxies and creating a list of them:
# ...
import random
proxies = [
    'http://20.219.108.109:8080',
    'http://210.22.77.94:9002',
    'http://103.150.18.218:80',
]
# ...
Then, create an asynchronous function that takes a proxy as an argument and makes a Pyppeteer request to ident.me through it:
# ...
async def init_pyppeteer_proxy_request(url):
    # Create a new headless browser instance
    browser = await launch(args=[
        f'--proxy-server={url}',
    ])
    # Create a new page
    page = await browser.newPage()
    # Navigate to target website
    await page.goto('https://ident.me')
    # Select the body element
    body = await page.querySelector('body')
    # Get the text content of the selected element
    content = await page.evaluate('(element) => element.textContent', body)
    # Dump the result
    print(content)
    await browser.close()
# ...
Now, update the main() function to call the new function with a randomly selected proxy:
# ...
async def main():
    for i in range(3):
        await init_pyppeteer_proxy_request(random.choice(proxies))
# ...
Your code should look like this right now:
import asyncio
from pyppeteer import launch
import random
proxies = [
    'http://20.219.108.109:8080',
    'http://210.22.77.94:9002',
    'http://103.150.18.218:80',
]

async def init_pyppeteer_proxy_request(url):
    # Create a new headless browser instance
    browser = await launch(args=[
        f'--proxy-server={url}',
    ])
    # Create a new page
    page = await browser.newPage()
    # Navigate to target website
    await page.goto('https://ident.me')
    # Select the body element
    body = await page.querySelector('body')
    # Get the text content of the selected element
    content = await page.evaluate('(element) => element.textContent', body)
    # Dump the result
    print(content)
    await browser.close()

async def main():
    for i in range(3):
        await init_pyppeteer_proxy_request(random.choice(proxies))

asyncio.get_event_loop().run_until_complete(main())
Run the script, and you should get a random result for each request, like the ones below.
20.219.108.109
103.150.18.218
103.150.18.218
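Since each call launches its own browser, the loop above runs the three requests one after another. If you'd rather fire them concurrently, one option is asyncio.gather(), sketched below; keep in mind that running several Chromium instances at once uses noticeably more memory, and this isn't something the original loop requires:
# ...
async def main():
    # Launch the three proxied requests concurrently instead of sequentially
    await asyncio.gather(
        *(init_pyppeteer_proxy_request(random.choice(proxies)) for _ in range(3))
    )
# ...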
Conclusion
Using a proxy with Pyppeteer can significantly improve your web scraping success, and you've learned how to make requests with both static and dynamic proxies.
If you need to scrape on a large scale without worrying about infrastructure and want better guarantees of getting the data you need, ZenRows's web scraping API can be your ally.