Puppeteer in PHP for Web Scraping: Step-by-Step Tutorial

June 18, 2024 · 12 min read

Puppeteer is a powerful tool for testing and web scraping via browser automation. While it's a JavaScript library, its popularity prompted the developer community to create unofficial ports for other languages. One of them is the Puppeteer PHP library, which allows PHP users to reap the benefits of Puppeteer without switching to JavaScript.

In this guide, you'll explore the basics of Puppeteer for PHP and then move on to more complex browser interactions. You'll learn:

  • How to set up PuPHPeteer and scrape your first dynamic page.
  • How to simulate user interactions such as scrolling, waiting, clicking, and taking screenshots.
  • How to avoid getting blocked while scraping.

Let's roll!

Why Use Puppeteer With PHP?

Puppeteer is a developer-favorite JavaScript headless browser library thanks to its intuitive and rich API. It simplifies cross-platform browser automation, supporting both testing and web scraping activities.

There's an unofficial PHP port for Puppeteer called PuPHPeteer. As of Jan 1, 2023, the original project is no longer under maintenance. However, there are a few up-to-date forks, zoonru/puphpeteer being the most popular one.

How to Scrape With Puppeteer in PHP?

Let’s go through the first steps with Puppeteer for PHP. You'll build an automated script that extracts data from this infinite scrolling demo:

Infinite scrolling page

This webpage dynamically loads new products via AJAX as you scroll down. Interacting with it requires a browser automation tool that can execute JavaScript, such as Puppeteer.

Follow the steps below to build a data scraper that targets this dynamic content page!


Step 1: Download PuPHPeteer

Before getting started, make sure you have PHP 8+, Composer, and Node.js installed on your machine. If any of them is missing, follow the official installation instructions to set it up.
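You can verify the installations from the terminal:

Terminal
php -v
composer -V
node -v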

You now have everything you need to initialize a PHP Composer project. Create a php-puppeteer-project folder and enter it in the terminal:

Terminal
mkdir php-puppeteer-project
cd php-puppeteer-project

Run the init command to create a new Composer project inside the folder. Follow the wizard and answer the questions as required; the default answers will work fine for this tutorial:

Terminal
composer init

php-puppeteer-project now contains the Composer project.
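For reference, the generated composer.json will look something like this (the exact content depends on your answers, and the package name below is just a placeholder):

composer.json
{
    "name": "your-username/php-puppeteer-project",
    "type": "project",
    "require": {}
}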

Use the command below to add zoon/puphpeteer to your project's dependencies:

Terminal
composer require zoon/puphpeteer

This command may fail with the following error:

Output
- clue/socket-raw[v1.2.0, ..., v1.6.0] require ext-sockets * -> it is missing from your system. Install or enable PHP's sockets extension.

In this case, you need to install and enable the ext-sockets PHP extension. Then, relaunch the above composer require command.
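On most setups, enabling the extension means uncommenting (or adding) the following line in your php.ini file, whose location depends on your PHP installation:

php.ini
extension=sockets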

Next, install the github:zoonru/puphpeteer npm package:

Terminal
npm install github:zoonru/puphpeteer

This will take a while, so be patient.

Load the project folder in your favorite PHP IDE, such as Visual Studio Code with the PHP extension. Create a scraper.php file in the /src folder and import the Puppeteer PHP package:

scraper.php
<?php

require_once ('vendor/autoload.php');

use Nesk\Puphpeteer\Puppeteer;

// scraping logic...

You can run the above PHP Puppeteer script with this command in the project's root folder:

Terminal
php src/scraper.php

Your PHP setup is ready!

Step 2: Get the Source HTML With Puppeteer

First, initialize a Puppeteer instance and launch it to open a controllable Chromium window:

scraper.php
$puppeteer = new Puppeteer();
$browser = $puppeteer->launch([
  'headless' => true, // set to false while developing locally
]);

You can now initialize a new page and call the goto() method to visit the target page:

scraper.php
$page = $browser->newPage();
$page->goto('https://scrapingclub.com/exercise/list_infinite_scroll/');

Then, use the content() method to retrieve the HTML source code of the page. Print it in the terminal with echo:

scraper.php
$html = $page->content();
echo $html;

Don't forget to release the browser resources with this line at the end of your script:

scraper.php
$browser->close();

This is what scraper.php should contain at this point:

scraper.php
<?php

require_once ('vendor/autoload.php');

use Nesk\Puphpeteer\Puppeteer;

// open a new Chromium browser window
$puppeteer = new Puppeteer();
$browser = $puppeteer->launch([
  'headless' => true, // set to false while developing locally
]);

// open a new page in the browser
$page = $browser->newPage();
// visit the target page
$page->goto('https://scrapingclub.com/exercise/list_infinite_scroll/');

// retrieve the source HTML code of the page and
// print it
$html = $page->content();
echo $html;

// release the browser resources
$browser->close();

Run the script in headed mode (set 'headless' to false in the launch options). The Puppeteer PHP library will open a Chromium window and visit the infinite scrolling demo page:

Infinite Scroll Demo

The PHP script will also print the following HTML in the terminal:

Output
<html class="h-full"><head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <meta name="description" content="Learn to scrape infinite scrolling pages"><title>Scraping Infinite Scrolling Pages (Ajax) | ScrapingClub</title>
  <link rel="icon" href="/static/img/icon.611132651e39.png" type="image/png">
  <!-- Omitted for brevity... -->

Here we go! That's the HTML code of the target page. In the next step, you'll see how to extract data from it.

Step 3: Extract the Data You Want

Puppeteer parses the HTML content of the page and provides an API to extract data from it.

Let's assume the goal of your PHP scraping script is to collect the name and price of each product element on the page. Here's what you need to do:

  1. Select the products by applying an effective HTML node selection strategy.
  2. Extract the desired information from each of them.
  3. Store the scraped data in a PHP array.

PuPHPeteer supports both XPath expressions and CSS selectors, the two most popular strategies for selecting nodes in the DOM. CSS selectors are intuitive and easy to use, while XPath expressions are more flexible but also more complex.

For a complete comparison, read our guide on CSS Selector vs XPath.
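As a quick taste of both approaches, here are two equivalent ways to select the product nodes on this page (note: PuPHPeteer exposes Puppeteer's $$() and $x() shortcuts as querySelectorAll() and querySelectorXPath()):

Example
// select the product nodes with a CSS selector
$product_elements = $page->querySelectorAll('.post');

// the equivalent selection with an XPath expression
$product_elements = $page->querySelectorXPath('//div[contains(@class, "post")]');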

To keep things simple, let's pick CSS selectors. To define the right ones, you need to review the HTML code of a product node. So, open the target site in your browser and inspect it with the DevTools:

Inspect Element

Expand the HTML code and notice how each product (abridged in the snippet after this list):

  • Is a <div> element with a post class.
  • Contains the name in an <h4> and the price in an <h5>.
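Putting that together, an abridged product node looks something like this (most attributes and wrapper markup omitted):

Example
<div class="... post ...">
  <!-- ... -->
  <h4><a href="...">Short Dress</a></h4>
  <h5>$24.99</h5>
</div>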

Since the page contains many products, initialize a $products array for the scraped data:

scraper.php
$products = [];

Next, use the querySelectorAll() method to select the HTML product nodes. It applies a CSS selector on the page:

scraper.php
$product_elements = $page->querySelectorAll('.post');

Iterate over them and apply the data extraction logic. Retrieve the data of interest, create new objects, and use them to populate $products:

scraper.php
foreach ($product_elements as $product_element) {
  // select the name and price elements
  $name_element = $product_element->querySelector('h4');
  $price_element = $product_element->querySelector('h5');

  // retrieve the data of interest
  $name = $name_element->evaluate(JsFunction::createWithParameters(['node'])->body('return node.innerText;'));
  $price = $price_element->evaluate(JsFunction::createWithParameters(['node'])->body('return node.innerText;'));

  // create a new product object and add it to the list
  $product = ['name' => $name, 'price' => $price];
  $products[] = $product;
}

To extract data from a node with the Puppeteer PHP package, you must write a custom JsFunction and pass it to the evaluate() method, which applies it to the given node.
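Note that JsFunction comes from Rialto, the PHP-to-Node bridge that powers PuPHPeteer, so it needs its own import at the top of scraper.php. Broken down, the extraction pattern looks like this:

scraper.php
use Nesk\Rialto\Data\JsFunction;

// build a JS function with a single parameter ("node")
// whose body returns the node's rendered text
$get_text = JsFunction::createWithParameters(['node'])
    ->body('return node.innerText;');

// run it against the selected element
$name = $name_element->evaluate($get_text);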

Log all scraped products with:

scraper.php
print_r($products);

Your PHP Puppeteer scraper.php script will now contain:

scraper.php
<?php

require_once ('vendor/autoload.php');

use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;

// open a new Chromium browser window
$puppeteer = new Puppeteer();
$browser = $puppeteer->launch([
  'headless' => true, // set to false while developing locally
]);

// open a new page in the browser
$page = $browser->newPage();
// visit the target page
$page->goto('https://scrapingclub.com/exercise/list_infinite_scroll/');

// where to store the scraped data
$products = [];

// select all product nodes on the page
$product_elements = $page->querySelectorAll('.post');

// iterate over the product elements and
// apply the scraping logic
foreach ($product_elements as $product_element) {
  // select the name and price elements
  $name_element = $product_element->querySelector('h4');
  $price_element = $product_element->querySelector('h5');

  // retrieve the data of interest
  $name = $name_element->evaluate(JsFunction::createWithParameters(['node'])->body('return node.innerText;'));
  $price = $price_element->evaluate(JsFunction::createWithParameters(['node'])->body('return node.innerText;'));

  // create a new product object and add it to the list
  $product = ['name' => $name, 'price' => $price];
  $products[] = $product;
}

// print the scraped data
print_r($products);

// release the browser resources
$browser->close();

Execute the PHP script, and it'll produce the output below in the terminal:

Output
Array
(
    [0] => Array
        (
            [name] => Short Dress
            [price] => $24.99
        )

    // omitted for brevity...

    [9] => Array
        (
            [name] => Fitted Dress
            [price] => $34.99
        )

)

Awesome! The PHP parsing logic works as intended. All that remains is to export the collected data to a better format, such as CSV.

Step 4: Export Data as CSV

The PHP standard library provides everything you need to export the scraped data to an output CSV file. Use fopen() to create a products.csv file, and then populate it with fputcsv(). This function converts each product array to a CSV record and appends it to the file.

scraper.php
// open the output CSV file
$csvFilePath = 'products.csv';
$csvFile = fopen($csvFilePath, 'w');

// write the header row
$header = ['name', 'price'];
fputcsv($csvFile, $header);

// add each product to the CSV file
foreach ($products as $product) {
    fputcsv($csvFile, $product);
}

// close the CSV file
fclose($csvFile);

Integrate the above logic into scraper.php, and you'll get:

scraper.php
<?php

require_once ('vendor/autoload.php');

use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;

// open a new Chromium browser window
$puppeteer = new Puppeteer();
$browser = $puppeteer->launch([
  'headless' => true, // set to false while developing locally
]);

// open a new page in the browser
$page = $browser->newPage();
// visit the target page
$page->goto('https://scrapingclub.com/exercise/list_infinite_scroll/');

// where to store the scraped data
$products = [];

// select all product nodes on the page
$product_elements = $page->querySelectorAll('.post');

// iterate over the product elements and
// apply the scraping logic
foreach ($product_elements as $product_element) {
  // select the name and price elements
  $name_element = $product_element->querySelector('h4');
  $price_element = $product_element->querySelector('h5');

  // retrieve the data of interest
  $name = $name_element->evaluate(JsFunction::createWithParameters(['node'])->body('return node.innerText;'));
  $price = $price_element->evaluate(JsFunction::createWithParameters(['node'])->body('return node.innerText;'));

  // create a new product object and add it to the list
  $product = ['name' => $name, 'price' => $price];
  $products[] = $product;
}

// open the output CSV file
$csvFilePath = 'products.csv';
$csvFile = fopen($csvFilePath, 'w');

// write the header row
$header = ['name', 'price'];
fputcsv($csvFile, $header);

// add each product to the CSV file
foreach ($products as $product) {
    fputcsv($csvFile, $product);
}

// close the CSV file
fclose($csvFile);

// release the browser resources
$browser->close();

Launch the Puppeteer PHP scraping script:

Terminal
php src/scraper.php

At the end of the script execution, a products.csv file will appear in the root folder of your project. Open it, and you'll see these records:

Products CSV File

Wonderful! You now know the basics of using Puppeteer with PHP.

As you can see, the current output includes only ten records. That's because the initial view of the page only has a few products and relies on infinite scrolling to load more.

In the next section, you'll learn how to deal with infinite scrolling and extract data from all products on the site.

Interactions With Web Pages via Browser Automation

PuPHPeteer can simulate many web interactions, including waits, clicks, and more. You need them to interact with dynamic content web pages like a human user. Mimicking human behavior will also help your script avoid anti-bots.

The interactions supported by the Puppeteer PHP library include:

  • Clicking elements.
  • Moving the mouse cursor.
  • Waiting for elements on the page to be present, visible, and hidden.
  • Typing characters in the input fields.
  • Submitting forms.
  • Taking screenshots.

Most of those operations are available via the library's built-in methods. For other interactions, use evaluate() to execute JavaScript code directly on the page. These two approaches cover any user interaction.
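For example, this minimal snippet runs custom JavaScript on the page and returns the result to PHP (it reuses the JsFunction import introduced earlier):

scraper.php
// execute JS in the page context and get the result back in PHP
$title = $page->evaluate((new JsFunction())->body('return document.title;'));
echo $title; // "Scraping Infinite Scrolling Pages (Ajax) | ScrapingClub"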

Time to learn how to scrape all product data from the infinite scroll demo page, and then simulate other popular interactions!

Scrolling

The target page contains only ten products after the first load and loads more as the user reaches the end of the viewport.

Puppeteer doesn't come with a built-in scrolling method. You need a custom JavaScript script to simulate the scrolling interaction and load new products.

This JavaScript snippet tells the browser to scroll down the page ten times at an interval of 0.5 seconds each:

scraper.php
const scrolls = 10
let scrollCount = 0

// scroll down and then wait for 0.5s
const scrollInterval = setInterval(() => {
  window.scrollTo(0, document.body.scrollHeight)
  scrollCount++

  if (scrollCount === scrolls) {
    clearInterval(scrollInterval)
  }
}, 500)

Store the above script in a multi-line string variable. Then, use it to initialize a JsFunction and feed it to the evaluate() method of $page:

scraper.php
$scrolling_script = <<<EOD
const scrolls = 10
let scrollCount = 0

// scroll down and then wait for 0.5s
const scrollInterval = setInterval(() => {
  window.scrollTo(0, document.body.scrollHeight)
  scrollCount++

  if (scrollCount === scrolls) {
    clearInterval(scrollInterval)
  }
}, 500)
EOD;

$scrolling_js_function = (new JsFunction())->body($scrolling_script);
$page->evaluate($scrolling_js_function);

Puppeteer will now instruct Chromium to scroll down the page. However, retrieving and rendering the new products takes time. Wait for the scrolling and data loading to finish with a sleep() instruction that stops the script execution for 10 seconds:

scraper.php
sleep(10);

Here's what the complete code looks like:

scraper.php
<?php

require_once ('vendor/autoload.php');

use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;

// open a new Chromium browser window
$puppeteer = new Puppeteer();
$browser = $puppeteer->launch([
  'headless' => true, // set to false while developing locally
]);

// open a new page in the browser
$page = $browser->newPage();
// visit the target page
$page->goto('https://scrapingclub.com/exercise/list_infinite_scroll/');

// JS script to simulate the infinite scrolling interaction
$scrolling_script = <<<EOD
const scrolls = 10
let scrollCount = 0

// scroll down and then wait for 0.5s
const scrollInterval = setInterval(() => {
  window.scrollTo(0, document.body.scrollHeight)
  scrollCount++

  if (scrollCount === scrolls) {
    clearInterval(scrollInterval)
  }
}, 500)
EOD;

// execute the JS script on the page
$scrolling_js_function = (new JsFunction())->body($scrolling_script);
$page->evaluate($scrolling_js_function);

// wait 10 seconds for the product nodes to load
sleep(10);

// where to store the scraped data
$products = [];

// select all product nodes on the page
$product_elements = $page->querySelectorAll('.post');

// iterate over the product elements and
// apply the scraping logic
foreach ($product_elements as $product_element) {
  // select the name and price elements
  $name_element = $product_element->querySelector('h4');
  $price_element = $product_element->querySelector('h5');

  // retrieve the data of interest
  $name = $name_element->evaluate(JsFunction::createWithParameters(['node'])->body('return node.innerText;'));
  $price = $price_element->evaluate(JsFunction::createWithParameters(['node'])->body('return node.innerText;'));

  // create a new product object and add it to the list
  $product = ['name' => $name, 'price' => $price];
  $products[] = $product;
}

// open the output CSV file
$csvFilePath = 'products.csv';
$csvFile = fopen($csvFilePath, 'w');

// write the header row
$header = ['name', 'price'];
fputcsv($csvFile, $header);

// add each product to the CSV file
foreach ($products as $product) {
    fputcsv($csvFile, $product);
}

// close the CSV file
fclose($csvFile);

// release the browser resources
$browser->close();

Execute the script to verify that it retrieves all 60 products:

Terminal
php src/scraper.php

The products.csv file will now contain all 60 records:

Updated CSV File

Mission complete! You've just scraped all products from the page using the Puppeteer PHP package.

Wait for Element

Currently, the PHP Puppeteer script depends on a hard wait to let all the new products appear on the page. However, this approach is suboptimal for two reasons:

  1. It introduces flakiness into your scraping logic, making it vulnerable to network or browser slowdowns.
  2. It slows down your script by stopping the execution for a fixed number of seconds.

You should opt for smart waits instead, e.g., waiting for a specific element to be present in the DOM. That's the best practice for building robust, consistent, and reliable browser automation scrapers.

PuPHPeteer provides the waitForSelector() method to wait for a node to appear on the page. Use it to wait up to 10 seconds for the 60th product to appear:

scraper.php
$page->waitForSelector('.post:nth-child(60)', ['timeout' => 10000]);

Replace the sleep() instruction with this line. The script will now wait for the product nodes to be rendered after the AJAX calls triggered by the scrolls.

The final scraping logic goes as follows:

scraper.php
<?php
require_once ('vendor/autoload.php');

use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;

// open a new Chromium browser window
$puppeteer = new Puppeteer();
$browser = $puppeteer->launch([
  'headless' => true, // set to false while developing locally
]);

// open a new page in the browser
$page = $browser->newPage();
// visit the target page
$page->goto('https://scrapingclub.com/exercise/list_infinite_scroll/');

// JS script to simulate the infinite scrolling interaction
$scrolling_script = <<<EOD
const scrolls = 10
let scrollCount = 0

// scroll down and then wait for 0.5s
const scrollInterval = setInterval(() => {
  window.scrollTo(0, document.body.scrollHeight)
  scrollCount++

  if (scrollCount === scrolls) {
    clearInterval(scrollInterval)
  }
}, 500)
EOD;

// execute the JS script on the page
$scrolling_js_function = (new JsFunction())->body($scrolling_script);
$page->evaluate($scrolling_js_function);

// wait up to 10 seconds for the 60th product to be on the page
$page->waitForSelector('.post:nth-child(60)', ['timeout' => 10000]);

// where to store the scraped data
$products = [];

// select all product nodes on the page
$product_elements = $page->querySelectorAll('.post');

// iterate over the product elements and
// apply the scraping logic
foreach ($product_elements as $product_element) {
  // select the name and price elements
  $name_element = $product_element->querySelector('h4');
  $price_element = $product_element->querySelector('h5');

  // retrieve the data of interest
  $name = $name_element->evaluate(JsFunction::createWithParameters(['node'])->body('return node.innerText;'));
  $price = $price_element->evaluate(JsFunction::createWithParameters(['node'])->body('return node.innerText;'));

  // create a new product object and add it to the list
  $product = ['name' => $name, 'price' => $price];
  $products[] = $product;
}

// open the output CSV file
$csvFilePath = 'products.csv';
$csvFile = fopen($csvFilePath, 'w');

// write the header row
$header = ['name', 'price'];
fputcsv($csvFile, $header);

// add each product to the CSV file
foreach ($products as $product) {
    fputcsv($csvFile, $product);
}

// close the CSV file
fclose($csvFile);

// release the browser resources
$browser->close();

Run it. You'll get the same results as before, but much faster. That's because the script now waits only as long as needed, reducing the idle period.

Wait for Page to Load

By default, the goto() method waits for the page to fire the load event in the browser. To change that behavior, you can pass an options array to goto():

scraper.php
$page->goto('https://scrapingclub.com/exercise/list_infinite_scroll/', ['waitUntil' => 'load']);

The possible values for waitUntil are as follows (a usage example comes after the list):

  • 'load': Waits for the load event.
  • 'domcontentloaded': Waits for the DOMContentLoaded event.
  • 'networkidle0': Waits until there are no network connections for at least 500 ms.
  • 'networkidle2': Waits until there are no more than two network connections for at least 500 ms.
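For example, to consider navigation complete only once the network has gone quiet:

scraper.php
// wait until there are no network connections for at least 500 ms
$page->goto('https://scrapingclub.com/exercise/list_infinite_scroll/', ['waitUntil' => 'networkidle0']);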

If you want to wait for the page to navigate to a new URL or to reload, use the waitForNavigation() method. It's useful when your interaction logic triggers a page change:

scraper.php
$page->waitForNavigation();

The problem is that modern web pages are extremely dynamic, so it's hard to tell when a page has fully loaded. To deal with more complex scenarios, use the waitForSelector() method. It accepts an optional options array with the following attributes (see the example after the list):

  • hidden: Wait for the selected element to be absent from the DOM or hidden. Default value: false.
  • signal: A signal object that lets you cancel the waitForSelector() call.
  • timeout: Maximum wait time in milliseconds. Default value: 30,000 (30 seconds).
  • visible: Wait for the selected element to be present in the DOM and visible. Default value: false.
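For instance, the line below waits up to 10 seconds for the first product node to be both present in the DOM and visible:

scraper.php
$page->waitForSelector('.post', ['visible' => true, 'timeout' => 10000]);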

Click Elements

The element objects from the Puppeteer PHP library expose the click() method. Call it to simulate a click interaction on a given element:

scraper.php
$element->click();

This method instructs Chromium to click the specified node. The browser dispatches a mouse click event and triggers the element's onclick() callback.

When the click() call triggers a page change (like in the snippet below), you have to wait for the new page to load. Then, write your parsing logic against the new DOM structure:

scraper.php
// select a product element and click it
$product_element = $page->querySelector('.post');
$product_element->click();

// wait for the new page to load
$page->waitForNavigation();

// you are now on the detail product page...
    
// new scraping logic...

// $page->querySelectorAll(...);

Take a Screenshot

Scraping textual data isn't the only way to get useful information from a site. Screenshots of specific pages or DOM elements often come in handy, for example, for competitor research or testing purposes.

PHP Puppeteer includes the screenshot() method to take a screenshot of the current viewport:

scraper.php
// take a screenshot of the current viewport
$page->screenshot(['path' => 'screenshot.png']);

It’ll generate a screenshot.png file in your project's root folder.
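Upstream Puppeteer also supports a fullPage option to capture the entire scrollable page rather than just the current viewport; since PuPHPeteer forwards the options array as-is, the call should look like this:

scraper.php
// capture the whole scrollable page, not just the viewport
$page->screenshot(['path' => 'full_page.png', 'fullPage' => true]);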

Similarly, you can call the screenshot() method on a single element:

scraper.php
$product_elements[0]->screenshot(['path' => 'product.png']);

It’ll produce a product.png file with a screenshot of the selected element.

Good job! You're now a master of user interactions in Puppeteer for PHP.

Avoid Getting Blocked When Scraping With Puppeteer in PHP

Getting blocked by anti-bot solutions is the biggest challenge to web scraping with Puppeteer. The protection systems are able to tell whether incoming requests are made by a human user or a bot, such as your script.

To avoid blocks, you must make your requests seem more natural to the target server. Two useful techniques to achieve that goal are:

  1. Setting a real-world User Agent header.
  2. Using a proxy to change your exit IP.

If you’d like to explore other approaches, read our guide on web scraping without getting blocked.

Let’s start with the User Agent. Customize it in PHP Puppeteer via Chromium's --user-agent flag:

scraper.php
$custom_user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36';
$puppeteer = new Puppeteer();
$browser = $puppeteer->launch([
  'args' => ["--user-agent=$custom_user_agent"],
  // other options...
]);

Find out more about this method in our guide on User Agents for web scraping.

Setting a proxy follows a similar pattern and relies on the --proxy-server flag. Get the URL of a free proxy from sites like Free Proxy List, and then pass it to Chromium as follows:

scraper.php
$proxy_url = '233.67.4.11:6879';
$puppeteer = new Puppeteer();
$browser = $puppeteer->launch([
  'args' => ["--proxy-server=$proxy_url"],
  // other options...
]);

Unfortunately, these approaches are just baby steps toward eluding anti-bot solutions. Advanced systems like Cloudflare will still be able to detect your script as a bot. Here's the result of trying to scrape a heavily protected website, such as G2:

G2 human verification page

To deal with advanced anti-bot solutions, you need a more advanced toolkit, such as a web scraping API. An example of such software is ZenRows. As a next-generation scraping API, it integrates with Puppeteer PHP to rotate your User Agent, add IP rotation capabilities, and extend it with the best anti-bot solutions so you'll never get blocked again.

Additionally, ZenRows provides user interaction capabilities, so it can fully replace Puppeteer!

But first, let's see how ZenRows integrates with Puppeteer. Sign up for free to redeem up to 1,000 free URLs and reach the Request Builder page.

Building a Scraper With ZenRows

Assuming you want to scrape the Cloudflare-protected page seen earlier, follow these steps:

  1. Paste your target URL (https://www.g2.com/products/airtable/reviews) in the "URL to Scrape" field.
  2. Enable JS Rendering (User Agent rotation and the anti-bot bypass tools are included by default).
  3. Enable the rotating IPs by checking the "Premium Proxy" option.
  4. On the right side of the screen, press the "cURL" button and select the "API" option.

Copy the generated URL and pass it to the goto() method:

scraper.php
<?php
require_once ('vendor/autoload.php');

use Nesk\Puphpeteer\Puppeteer;

// open a new Chromium browser window
$puppeteer = new Puppeteer();
$browser = $puppeteer->launch([
  'headless' => true, // set to false while developing locally
]);

// open a new page in the browser
$page = $browser->newPage();
// visit the target page
$page->goto('https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.g2.com%2Fproducts%2Fairtable%2Freviews&js_render=true&premium_proxy=true');

// retrieve the source HTML code of the page and
// print it
$html = $page->content();
echo $html;

// release the browser resources
$browser->close();

Execute the script, and it'll print the source HTML code of the G2.com page:

Output
<!DOCTYPE html>
<head>
  <meta charset="utf-8" />
  <link href="https://www.g2.com/assets/favicon-fdacc4208a68e8ae57a80bf869d155829f2400fa7dd128b9c9e60f07795c4915.ico" rel="shortcut icon" type="image/x-icon" />
  <title>Airtable Reviews 2024: Details, Pricing, &amp; Features | G2</title>
  <!-- omitted for brevity ... -->

Congratulations! You just integrated ZenRows into the PHP Puppeteer library.

Conclusion

In this tutorial, you've learned the fundamentals of controlling Chromium in PHP. You explored the basics of Puppeteer and then dived into more advanced techniques. You've become a PHP browser automation expert!

But no matter how complex your browser automation is, anti-bot measures can still stop it. Avoid them with ZenRows, a full-featured web scraping API with browser automation capabilities, IP rotation, and the most powerful anti-scraping bypass toolkit. Scraping dynamic content web pages has never been easier. Try ZenRows for free!
