Do you want to take screenshots while testing or scraping a website with Pyppeteer?
In this article, you'll learn three ways to take screenshots with Pyppeteer, the Python port of Puppeteer:
- Option 1: Generate a screenshot of the visible part of the screen.
- Option 2: Capture a full-page screenshot.
- Option 3: Create a screenshot of a specific element.
How to Take a Screenshot With Pyppeteer
Pyppeteer inherits all of Puppeteer's capabilities, so it's an excellent tool for screenshotting different web page parts while scraping with Python.
Using scrapingcourse.com as a demo website, you'll learn how to use Pyppeteer to capture the visible part of a web page, generate a full-page screenshot, and get a screenshot of a specific element.
Option 1: Generate a Screenshot for the Visible Part of the Screen
Screenshotting the screen's visible part captures the web page portion that the user can see (the viewport).
Here's the expected viewport screenshot of the target web page:
Let's grab that screenshot!
Import the required library and define an asynchronous scraper function. Launch a browser instance and create a new page inside that function:
# import the required libraries
import asyncio
from pyppeteer import launch
# start an asynchronous function
async def scraper():
    # launch a browser instance and create a new page
    browser = await launch()
    page = await browser.newPage()
Open the target web page and call the screenshot method. Then, close the browser and execute the asynchronous function with asyncio:
# start an asynchronous function
async def scraper():
    # ...
    # open the target website
    await page.goto("https://scrapingcourse.com/ecommerce/product/abominable-hoodie/")
    # grab a viewport screenshot of the web page
    await page.screenshot(path="above-the-fold-screenshot.png")
    # close the browser instance
    await browser.close()
# run the asynchronous function
asyncio.run(scraper())
Combine both snippets, and you'll get the following code:
# import the required libraries
import asyncio
from pyppeteer import launch
# start an asynchronous function
async def scraper():
    # launch a browser instance and create a new page
    browser = await launch()
    page = await browser.newPage()
    # open the target website
    await page.goto("https://scrapingcourse.com/ecommerce/product/abominable-hoodie/")
    # grab a viewport screenshot of the web page
    await page.screenshot(path="above-the-fold-screenshot.png")
    # close the browser instance
    await browser.close()
# run the asynchronous function
asyncio.run(scraper())
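The size of a viewport screenshot follows the page's viewport, which Pyppeteer sets to 800x600 by default. If you need a different size, one option is to resize the viewport before capturing. The sketch below assumes a `page` that was created with `browser.newPage()` and has already loaded the target URL; the helper name `screenshot_viewport` and the 1280x720 default are illustrative choices, not part of Pyppeteer.

```python
async def screenshot_viewport(page, path, width=1280, height=720):
    """Resize the viewport of an open Pyppeteer page, then screenshot it.

    `page` is assumed to be a Pyppeteer page that has already navigated
    to the target URL. The 1280x720 default is arbitrary.
    """
    viewport = {"width": width, "height": height}
    # setViewport takes a dict with the width and height in pixels
    await page.setViewport(viewport)
    # without fullPage, only the resized viewport is captured
    await page.screenshot(path=path)
    return viewport

# usage inside scraper(), after page.goto(...):
#     await screenshot_viewport(page, "above-the-fold-screenshot.png")
```

Resizing before the capture keeps the output dimensions predictable across machines, which helps when you compare screenshots between runs.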
Good job! Now, let's tweak the code to grab a full-page screenshot.
Option 2: Capture a Full-Page Screenshot
Pyppeteer lets you take full-page screenshots, which capture the entire web page, including the parts that lie outside the viewport and only appear when you scroll.
See the full-page screenshot demo of the target product page below:
To grab a screenshot like this, add a fullPage argument to the previous viewport screenshot method:
# import the required libraries
import asyncio
from pyppeteer import launch
# start an asynchronous function
async def scraper():
    # launch a browser instance and create a new page
    browser = await launch()
    page = await browser.newPage()
    # open the target website
    await page.goto("https://scrapingcourse.com/ecommerce/product/abominable-hoodie/")
    # grab a full-page screenshot of the web page
    await page.screenshot(path="full-page-screenshot.png", fullPage=True)
    # close the browser instance
    await browser.close()
# run the asynchronous function
asyncio.run(scraper())
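Full-page PNGs of long pages can grow large. If file size matters, a possible variation is to save a JPEG with a quality setting instead. The helper below is a hypothetical convenience function that only assembles the keyword arguments; the keys it returns (`path`, `fullPage`, `type`, `quality`) are Pyppeteer screenshot options.

```python
def build_screenshot_options(path, full_page=False, quality=None):
    """Build keyword arguments for Pyppeteer's page.screenshot().

    The helper and its parameter names are illustrative assumptions.
    """
    options = {"path": path, "fullPage": full_page}
    if quality is not None:
        if not 0 <= quality <= 100:
            raise ValueError("quality must be between 0 and 100")
        # quality only applies to JPEG images, so request that type explicitly
        options["type"] = "jpeg"
        options["quality"] = quality
    return options

# usage inside scraper(), after page.goto(...):
#     options = build_screenshot_options("full-page-screenshot.jpg", full_page=True, quality=80)
#     await page.screenshot(**options)
```

Keeping the options in a plain dictionary makes it easy to reuse the same settings for several pages.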
You now know how to capture a full-page screenshot in Pyppeteer!
In the next section, you'll see how to capture a screenshot of a specific element.
Option 3: Create a Screenshot of a Specific Element
To grab a screenshot of a specific element, you need to select that element from the target web page and screenshot it directly.
Let's screenshot the product summary section of the target web page to see how it works. The expected screenshot of the target element looks like this:
Let's go see how to achieve that. First, define an asynchronous scraper function. Launch the browser and open the target product page:
# import the required libraries
import asyncio
from pyppeteer import launch
# start an asynchronous function
async def scraper():
    # launch a browser instance and create a new page
    browser = await launch()
    page = await browser.newPage()
    # open the target website
    await page.goto("https://scrapingcourse.com/ecommerce/product/abominable-hoodie/")
Obtain the target element using its CSS class selector (.summary.entry-summary) and call the screenshot method on that element. Finally, close the browser and run the scraper function:
# start an asynchronous function
async def scraper():
    # ...
    # obtain the target element using querySelector
    element = await page.querySelector(".summary.entry-summary")
    # grab a screenshot of the target element
    await element.screenshot(path="specific-element-screenshot.png")
    # close the browser instance
    await browser.close()
# run the asynchronous function
asyncio.run(scraper())
You'll get the following complete code after combining both snippets:
# import the required libraries
import asyncio
from pyppeteer import launch
# start an asynchronous function
async def scraper():
    # launch a browser instance and create a new page
    browser = await launch()
    page = await browser.newPage()
    # open the target website
    await page.goto("https://scrapingcourse.com/ecommerce/product/abominable-hoodie/")
    # obtain the target element using querySelector
    element = await page.querySelector(".summary.entry-summary")
    # grab a screenshot of the target element
    await element.screenshot(path="specific-element-screenshot.png")
    # close the browser instance
    await browser.close()
# run the asynchronous function
asyncio.run(scraper())
Your Pyppeteer scraper now screenshots a specific element. Awesome!
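One caveat with element screenshots: `querySelector` returns `None` when nothing matches the selector, and calling `screenshot` on `None` fails with an `AttributeError`. A minimal guard, assuming a `page` that has already loaded the target URL, could look like this (the function name `screenshot_element` is hypothetical):

```python
async def screenshot_element(page, selector, path):
    """Screenshot the first element matching `selector` on an open page.

    Returns True if a screenshot was written and False if no element
    matched. `page` is assumed to be an already-loaded Pyppeteer page.
    """
    # querySelector resolves to None when the selector matches nothing
    element = await page.querySelector(selector)
    if element is None:
        return False
    await element.screenshot(path=path)
    return True

# usage inside scraper(), after page.goto(...):
#     saved = await screenshot_element(
#         page, ".summary.entry-summary", "specific-element-screenshot.png"
#     )
```

Returning a boolean lets the caller decide whether a missing element should stop the scraper or just be logged.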
However, the methods above won't save you from anti-bot blocks that you're bound to encounter while scraping. The next section will tell you how to avoid them.
Avoid Getting Blocked While Taking Screenshots With Pyppeteer
Website protection systems can block you from taking screenshots, especially when you scrape at scale. You need a way to bypass them to freely scrape without getting blocked.
Let's take the screenshot code above and try to access a protected page like G2 Reviews.
# import the required libraries
import asyncio
from pyppeteer import launch
# start an asynchronous function
async def scraper():
    # launch a browser instance and create a new page
    browser = await launch()
    page = await browser.newPage()
    # open the target website
    await page.goto("https://www.g2.com/products/azure-sql-database/reviews")
    # grab a full-page screenshot of the web page
    await page.screenshot(path="full-page-screenshot.png", fullPage=True)
    # close the browser instance
    await browser.close()
# run the asynchronous function
asyncio.run(scraper())
The code gets blocked by Cloudflare Turnstile:
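A blocked run like this still writes an image file, so the script alone won't tell you that the capture shows a challenge page. One rough signal is the HTTP status of the navigation response, since challenge pages often, though not always, come back with an error status such as 403. The sketch below relies on `page.goto` resolving to a response object with a `status` property; the function names and the "400 and above" threshold are assumptions for illustration.

```python
def looks_blocked(status):
    """Return True for HTTP error statuses (400 and above).

    The threshold is an illustrative assumption. A block page can also
    arrive with a 200 status, so treat the result as a hint only.
    """
    return status is not None and status >= 400


async def goto_and_check(page, url):
    """Open `url` on a Pyppeteer page and return (status, blocked)."""
    # page.goto resolves to the main response, which may be None
    response = await page.goto(url)
    status = response.status if response is not None else None
    return status, looks_blocked(status)

# usage inside scraper(), replacing the bare page.goto(...) call:
#     status, blocked = await goto_and_check(page, "https://www.g2.com/products/azure-sql-database/reviews")
#     if blocked:
#         print(f"Possibly blocked, status {status}")
```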
The best way to avoid this block and screenshot the page you want is using a web scraping API like ZenRows. It handles the request headers, auto-rotates premium proxies, and bypasses CAPTCHAs and other anti-bot systems.
ZenRows also supports full-page screenshots and features JavaScript instructions for extracting dynamic content, allowing you to replace Pyppeteer with ZenRows.
To try it out, sign up to open the Request Builder. Paste the target URL in the link box, toggle on JS Rendering, and activate Premium Proxies. Select Python as your chosen language and click the API request mode. Then, copy and paste the generated code into your script:
The generated code uses Python's Requests library as the HTTP client. Install it using pip:
pip install requests
Modify the generated code to get the returned data as a stream and save it to your project directory. Pay attention to the extra `return_screenshot` parameter in the request parameters:
# import the required library
import requests
# specify the query parameters with a screenshot option
params = {
    "url": "https://www.g2.com/products/azure-sql-database/reviews",
    "apikey": "<YOUR_ZENROWS_API_KEY>",
    "js_render": "true",
    "premium_proxy": "true",
    "return_screenshot": "true"
}
# send the request and set the response type to stream
response = requests.get("https://api.zenrows.com/v1/", params=params, stream=True)
# check if the request was successful
if response.status_code == 200:
    # save the response content as a screenshot
    with open("g2-page-screenshot.png", "wb") as f:
        f.write(response.content)
    print("Screenshot saved successfully.")
else:
    print(f"Error: {response.status_code}")
The code captures a full-page screenshot of the protected page.
Congratulations! You've just taken a screenshot of a Cloudflare-protected web page with ZenRows.
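The request above sets `stream=True` but then reads `response.content`, which loads the whole image into memory before writing it. For large screenshots, a possible alternative is to write the body chunk by chunk with Requests' `iter_content`. The helper name `save_streamed_response` and the 8192-byte chunk size below are illustrative assumptions.

```python
def save_streamed_response(response, path, chunk_size=8192):
    """Write a streamed requests response body to `path` in chunks.

    Returns the number of bytes written, or 0 (without creating the
    file) when the status code is not 200.
    """
    if response.status_code != 200:
        return 0
    written = 0
    with open(path, "wb") as f:
        # iter_content yields the body piece by piece when stream=True
        for chunk in response.iter_content(chunk_size=chunk_size):
            written += f.write(chunk)
    return written

# usage with the streamed response from the script above:
#     size = save_streamed_response(response, "g2-page-screenshot.png")
#     print(f"Saved {size} bytes" if size else f"Error: {response.status_code}")
```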
Conclusion
In this tutorial, you've learned the three methods of taking a screenshot while scraping with Pyppeteer in Python. Here's a recap of what you now know:
- Taking a screenshot of the visible part of the web page.
- Getting a full-page screenshot, including the parts outside the viewport.
- Screenshotting a specific web element.
Remember that many websites implement various anti-bot mechanisms to block you from taking screenshots during web scraping. Integrate ZenRows, an all-in-one web scraping tool, into your web scraper, and take screenshots of any protected website at scale without getting blocked. Try ZenRows for free!