
Selenium in PHP for Web Scraping: Tutorial 2024

February 7, 2024 · 8 min read

Selenium is the favorite browser automation tool for web scraping and testing. While Selenium for PHP isn't officially supported, the community created the php-webdriver port.

In this guide, you'll learn the basics and then explore more complex interactions.

  • How to use PHP Selenium.
  • Interact with web pages in a browser.
  • Avoid getting blocked.

Let's dive in!

Why You Should Use Selenium in PHP

Selenium is one of the most popular headless browser libraries due to its consistent API, which opens the doors to multi-platform and multi-language browser automation. That makes it an ideal tool for testing and web scraping tasks.

The tool is so popular that Facebook started a PHP port called php-webdriver. The project is now maintained by the open-source community, which works hard to keep it up to date.

How to Use Selenium in PHP

Take your first steps with Selenium in PHP by learning how to scrape this infinite scrolling demo page:

demo page

This page uses JavaScript to dynamically load new products as the user scrolls down. That’s a great example of a dynamic page that needs browser automation for scraping, because you couldn't interact with it without a tool like the Selenium PHP library.

Time to retrieve some data from it!

Step 1: Install Selenium in PHP

Before getting started, you need PHP and Composer installed on your computer. Follow the two links to get instructions on how to set up the requirements.

You’re ready to initialize a PHP Composer project. Create a php-selenium-project folder and enter it in the terminal:

Terminal
mkdir php-selenium-project
cd php-selenium-project

Next, execute the init command to create a new Composer project inside it. Follow the wizard and answer the questions as required:

Terminal
composer init

Add php-webdriver to the project's dependencies:

Terminal
composer require php-webdriver/webdriver

This will take a while, so be patient.

To work, the package requires the Selenium standalone server to be running in the background. Make sure you have Java 8+ installed on your PC, download the Selenium Grid executable, and launch it:

Terminal
java -jar selenium-server-<version>.jar standalone --selenium-manager true

Replace <version> with the version of the Selenium Grid .jar file you just downloaded.

The above command should produce something similar to the output below. The last message informs you that the Selenium server is running locally on port 4444:

Output
INFO [LoggingOptions.configureLogEncoding] - Using the system default encoding
INFO [OpenTelemetryTracer.createTracer] - Using OpenTelemetry for tracing
INFO [NodeOptions.getSessionFactories] - Detected 8 available processors
INFO [NodeOptions.report] - Adding Firefox for {"browserName": "firefox","platformName": "Windows 11"} 8 times
INFO [NodeOptions.report] - Adding Chrome for {"browserName": "chrome","platformName": "Windows 11"} 8 times
INFO [NodeOptions.report] - Adding Edge for {"browserName": "MicrosoftEdge","platformName": "Windows 11"} 8 times
INFO [NodeOptions.report] - Adding Internet Explorer for {"browserName": "internet explorer","platformName": "Windows 11"} 1 times
INFO [Node.<init>] - Binding additional locator mechanisms: relative
INFO [GridModel.setAvailability] - Switching Node 8562f610-78a1-49b3-946c-688f53b66fe9 (uri: http://192.168.1.30:4444) from DOWN to UP
INFO [LocalDistributor.add] - Added node 8562f610-78a1-49b3-946c-688f53b66fe9 at http://192.168.1.30:4444. Health check every 120s
INFO [Standalone.execute] - Started Selenium Standalone 4.16.1 (revision 9b4c83354e): http://192.168.1.30:4444

Perfect! You now have everything in place to build a Selenium script in PHP.
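Before moving on, you can optionally verify that the server is up by querying the standard WebDriver /status endpoint. This is a minimal sketch assuming the server listens on the default port 4444:

```php
<?php

// query the standard WebDriver /status endpoint of the local server
$response = file_get_contents('http://localhost:4444/status');
$status = json_decode($response, true);

// the response contains a "ready" flag per the W3C WebDriver spec
if ($status['value']['ready']) {
    echo "Selenium server is ready to accept new sessions\n";
} else {
    echo "Selenium server is up but busy\n";
}
```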

Create a scraper.php file in the /src folder of the Composer project and initialize it with the code below. The first lines import the classes needed to use Selenium with PHP. Then, require_once() loads the Composer autoloader that makes php-webdriver available:

scraper.php
<?php

namespace Facebook\WebDriver;

use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Chrome\ChromeOptions;

require_once('vendor/autoload.php');

// scraping logic

You can run the PHP Selenium script with this command:

Terminal
php src/scraper.php

Awesome, your Selenium PHP setup is ready!

Step 2: Scrape with Selenium in PHP

Use the lines below to initialize a Chrome driver to control a local instance of Chrome:

scraper.php
// the URL to the local Selenium Server
$host = 'http://localhost:4444/';

// to control a Chrome instance
$capabilities = DesiredCapabilities::chrome();

// define the browser options
$chromeOptions = new ChromeOptions();
// to run Chrome in headless mode
$chromeOptions->addArguments(['--headless']); // <- comment out for testing

// register the Chrome options
$capabilities->setCapability(ChromeOptions::CAPABILITY_W3C, $chromeOptions);

// initialize a driver to control a Chrome instance
$driver = RemoteWebDriver::create($host, $capabilities);

// maximize the window to avoid responsive rendering
$driver->manage()->window()->maximize();

Don't forget to release the web driver resources by adding this line to the end of the script:

scraper.php
$driver->close();

Next, use the get() method from $driver to connect to the target page:

scraper.php
$driver->get('https://scrapingclub.com/exercise/list_infinite_scroll/');

Then, retrieve the raw HTML from the page and print it. Use the getPageSource() method of the PHP WebDriver object to get the current page's source. Log it in the terminal with echo:

scraper.php
$html = $driver->getPageSource();
echo $html;

Here’s what scraper.php contains so far:

scraper.php
<?php

namespace Facebook\WebDriver;

use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Chrome\ChromeOptions;

require_once('vendor/autoload.php');

// the URL to the local Selenium Server
$host = 'http://localhost:4444/';

// to control a Chrome instance
$capabilities = DesiredCapabilities::chrome();

// define the browser options
$chromeOptions = new ChromeOptions();
// to run Chrome in headless mode
$chromeOptions->addArguments(['--headless']); // <- comment out for testing

// register the Chrome options
$capabilities->setCapability(ChromeOptions::CAPABILITY_W3C, $chromeOptions);

// initialize a driver to control a Chrome instance
$driver = RemoteWebDriver::create($host, $capabilities);

// maximize the window to avoid responsive rendering
$driver->manage()->window()->maximize();

// open the target page in a new tab
$driver->get('https://scrapingclub.com/exercise/list_infinite_scroll/');

// extract the HTML page source and print it
$html = $driver->getPageSource();
echo $html;

// close the driver and release its resources
$driver->close();

Execute the PHP script in headed mode (comment out the --headless option to see the browser). The PHP Selenium package will open Chrome and visit the infinite scrolling demo page:

demo page from selenium package

The script will also print the HTML below in the terminal:

Output
<html class="h-full"><head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <meta name="description" content="Learn to scrape infinite scrolling pages"><title>Scraping Infinite Scrolling Pages (Ajax) | ScrapingClub</title>
  <link rel="icon" href="/static/img/icon.611132651e39.png" type="image/png">
  <!-- Omitted for brevity... -->

Great! That's exactly the HTML code of the target page!

Step 3: Parse the Data You Want

Selenium enables you to parse the HTML content of the page to extract specific data from it. Now, suppose the goal of your PHP scraper is to get the name and price of each product on the page. To achieve that, you have to:

  1. Select the products on the page by applying an effective node selection strategy.
  2. Collect the desired data from each of them.
  3. Store the scraped data in a PHP array.

A node selection strategy usually relies on XPath expressions or CSS Selectors. php-webdriver supports both, giving you more options to find elements in the DOM. However, CSS selectors are intuitive, while XPath expressions may appear a bit more complex. Find out more in our guide on CSS Selector vs XPath.
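For a quick taste of both strategies, here's how the same product name node (the .post / <h4> structure inspected below) could be selected either way. The XPath expression is one possible equivalent, not the only one:

```php
// select the first product name with a CSS selector
$name_css = $driver->findElement(
  WebDriverBy::cssSelector('.post h4')
);

// the equivalent selection with an XPath expression
$name_xpath = $driver->findElement(
  WebDriverBy::xpath('//*[contains(@class, "post")]//h4')
);
```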

Let's keep things simple and opt for CSS selectors. To figure out how to define the right ones for your goal, you need to analyze a product HTML node. Open the target site in your browser, right-click on a product element, and inspect it with the DevTools:

demo page inspection with chrome dev tools

Each product has a post class and contains the name in an <h4> and the price in an <h5>.

Follow the instructions below and see how to extract the name and price from the products on the page.

Initialize a $products array to keep track of the scraped data:

scraper.php
$products = [];

Use the findElements() method to select the HTML product nodes. WebDriverBy::cssSelector() defines a CSS selector strategy for PHP Selenium:

scraper.php
$product_elements = $driver->findElements(WebDriverBy::cssSelector('.post'));

After selecting the product nodes, iterate over them and apply the data extraction logic:

scraper.php
foreach ($product_elements as $product_element) {
  // select the name and price elements
  $name_element = $product_element->findElement(WebDriverBy::cssSelector('h4'));
  $price_element = $product_element->findElement(WebDriverBy::cssSelector('h5'));

  // retrieve the data of interest
  $name = $name_element->getText();
  $price = $price_element->getText();

  // create a new product array and add it to the list
  $product = ['name' => $name, 'price' => $price];
  $products[] = $product;
}

To extract the price and name from each product, use the getText() method. That’ll return the inner text of the selected element.

This is what your scraper.php file should now contain:

scraper.php
<?php

namespace Facebook\WebDriver;

use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Chrome\ChromeOptions;
use Facebook\WebDriver\WebDriverBy;

require_once('vendor/autoload.php');

// the URL to the local Selenium Server
$host = 'http://localhost:4444/';

// to control a Chrome instance
$capabilities = DesiredCapabilities::chrome();

// define the browser options
$chromeOptions = new ChromeOptions();
// to run Chrome in headless mode
$chromeOptions->addArguments(['--headless']); // <- comment out for testing

// register the Chrome options
$capabilities->setCapability(ChromeOptions::CAPABILITY_W3C, $chromeOptions);

// initialize a driver to control a Chrome instance
$driver = RemoteWebDriver::create($host, $capabilities);

// maximize the window to avoid responsive rendering
$driver->manage()->window()->maximize();

// open the target page in a new tab
$driver->get('https://scrapingclub.com/exercise/list_infinite_scroll/');

// to keep track of the scraped products
$products = [];

// select the product elements
$product_elements = $driver->findElements(WebDriverBy::cssSelector('.post'));

// iterate over the product nodes
// and extract data from them
foreach ($product_elements as $product_element) {
  // select the name and price elements
  $name_element = $product_element->findElement(WebDriverBy::cssSelector('h4'));
  $price_element = $product_element->findElement(WebDriverBy::cssSelector('h5'));

  // retrieve the data of interest
  $name = $name_element->getText();
  $price = $price_element->getText();

  // create a new product array and add it to the list
  $product = ['name' => $name, 'price' => $price];
  $products[] = $product;
}

// print all products
print_r($products);

// close the driver and release its resources
$driver->close();

Launch the above Selenium PHP script, and it'll print the output below in the terminal:

Output
Array
(
    [0] => Array
        (
            [name] => Short Dress
            [price] => $24.99
        )

    // omitted for brevity...

    [9] => Array
        (
            [name] => Fitted Dress
            [price] => $34.99
        )

)

Great! The PHP parsing logic works like a charm.

Step 4: Export Data as CSV

Use the logic below to export the scraped data to an output CSV file. Use fopen() from the PHP standard library to create a products.csv file, then populate it with fputcsv(), which converts each product array to a CSV record and appends it to the file.

scraper.php
// create the output CSV file
$csvFilePath = 'products.csv';
$csvFile = fopen($csvFilePath, 'w');

// write the header row
$header = ['name', 'price'];
fputcsv($csvFile, $header);

// add each product to the CSV file
foreach ($products as $product) {
    fputcsv($csvFile, $product);
}

// close the CSV file
fclose($csvFile);
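If you'd rather have JSON output, the same array can be serialized with the PHP standard library alone. A minimal sketch, using sample data shaped like the scraped $products array:

```php
<?php

// sample data shaped like the scraped $products array
$products = [
  ['name' => 'Short Dress', 'price' => '$24.99'],
  ['name' => 'Fitted Dress', 'price' => '$34.99'],
];

// serialize the array to pretty-printed JSON
$json = json_encode($products, JSON_PRETTY_PRINT);

// write it to the output file
file_put_contents('products.json', $json);
```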

See your final Selenium PHP scraping script:

scraper.php
<?php

namespace Facebook\WebDriver;

use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Chrome\ChromeOptions;
use Facebook\WebDriver\WebDriverBy;

require_once('vendor/autoload.php');

// the URL to the local Selenium Server
$host = 'http://localhost:4444/';

// to control a Chrome instance
$capabilities = DesiredCapabilities::chrome();

// define the browser options
$chromeOptions = new ChromeOptions();
// to run Chrome in headless mode
$chromeOptions->addArguments(['--headless']); // <- comment out for testing

// register the Chrome options
$capabilities->setCapability(ChromeOptions::CAPABILITY_W3C, $chromeOptions);

// initialize a driver to control a Chrome instance
$driver = RemoteWebDriver::create($host, $capabilities);

// maximize the window to avoid responsive rendering
$driver->manage()->window()->maximize();

// open the target page in a new tab
$driver->get('https://scrapingclub.com/exercise/list_infinite_scroll/');

// to keep track of the scraped products
$products = [];

// select the product elements
$product_elements = $driver->findElements(WebDriverBy::cssSelector('.post'));

// iterate over the product nodes
// and extract data from them
foreach ($product_elements as $product_element) {
  // select the name and price elements
  $name_element = $product_element->findElement(WebDriverBy::cssSelector('h4'));
  $price_element = $product_element->findElement(WebDriverBy::cssSelector('h5'));

  // retrieve the data of interest
  $name = $name_element->getText();
  $price = $price_element->getText();

  // create a new product array and add it to the list
  $product = ['name' => $name, 'price' => $price];
  $products[] = $product;
}

// create the output CSV file
$csvFilePath = 'products.csv';
$csvFile = fopen($csvFilePath, 'w');

// write the header row
$header = ['name', 'price'];
fputcsv($csvFile, $header);

// add each product to the CSV file
foreach ($products as $product) {
    fputcsv($csvFile, $product);
}

// close the CSV file
fclose($csvFile);

// close the driver and release its resources
$driver->close();

And launch it:

Terminal
php src/scraper.php

After execution is complete, a products.csv file will appear in the root folder of your project. Open it, and you'll see that it contains the following data:

Products CSV File.

Wonderful! You now know the basics of Selenium with PHP!

However, note that the current output includes only ten items. Why? Because the page initially contains only those products and relies on infinite scrolling to load more. Read the next section to learn how to extract data from all products on the site with Selenium in PHP.

Interacting with Web Pages in a Browser with PHP WebDriver

The php-webdriver library can simulate several web interactions, including scrolls, waits, mouse movements, and more. That’s key to interacting with dynamic content pages like a human user would. Browser automation also helps your script avoid triggering anti-bot measures.

The interactions supported by the Selenium PHP WebDriver library include:

  • Click elements and move the mouse.
  • Wait for elements on the page to be present, visible, clickable, etc.
  • Fill out and empty input fields.
  • Scroll up and down the page.
  • Submit forms.
  • Take screenshots.
  • Drag and drop elements.

You can perform most of those operations with the methods offered by the library. Otherwise, use the executeScript() method to run a JavaScript script directly on the page. With either tool, any user interaction in the browser becomes possible.
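Beyond fire-and-forget scripts, executeScript() also returns values from the page to your PHP code. A short sketch, assuming a loaded page with .post product nodes like the demo page used in this tutorial:

```php
// read a value computed in the browser back into PHP
$title = $driver->executeScript('return document.title;');
echo $title . "\n";

// count the product nodes currently rendered on the page
$count = $driver->executeScript(
  'return document.querySelectorAll(".post").length;'
);
echo "Products on the page: $count\n";
```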

Let's learn how to scrape all product data from the infinite scroll demo page and then see other popular PHP Selenium interactions!

Scrolling

Initially, the target page has only ten products and uses infinite scrolling to load new ones. Bear in mind that Selenium doesn't come with a built-in method for scrolling. Thus, you need custom JavaScript logic to simulate the scrolling interaction.

This JavaScript snippet tells the browser to scroll down the page 10 times at an interval of 0.5 seconds each:

scraper.js
// scroll down the page 10 times
const scrolls = 10
let scrollCount = 0

// scroll down and then wait for 0.5s
const scrollInterval = setInterval(() => {
  window.scrollTo(0, document.body.scrollHeight)
  scrollCount++

  if (scrollCount === scrolls) {
    clearInterval(scrollInterval)
  }
}, 500)

Store the above script in a variable and feed it to the executeScript() method as below:

scraper.php
$scrolling_script = <<<EOD
// scroll down the page 10 times
const scrolls = 10
let scrollCount = 0

// scroll down and then wait for 0.5s
const scrollInterval = setInterval(() => {
  window.scrollTo(0, document.body.scrollHeight)
  scrollCount++

  if (scrollCount === scrolls) {
    clearInterval(scrollInterval)
  }
}, 500)
EOD;

$driver->executeScript($scrolling_script);

Selenium will now scroll down the page, but that’s not enough. You also need to wait for the scrolling and data loading operation to end. To do so, use sleep() to stop the script execution for 10 seconds:

scraper.php
sleep(10);

Here's your new complete code:

scraper.php
<?php

namespace Facebook\WebDriver;

use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Chrome\ChromeOptions;
use Facebook\WebDriver\WebDriverBy;

require_once('vendor/autoload.php');

// the URL to the local Selenium Server
$host = 'http://localhost:4444/';

// to control a Chrome instance
$capabilities = DesiredCapabilities::chrome();

// define the browser options
$chromeOptions = new ChromeOptions();
// to run Chrome in headless mode
$chromeOptions->addArguments(['--headless']); // <- comment out for testing

// register the Chrome options
$capabilities->setCapability(ChromeOptions::CAPABILITY_W3C, $chromeOptions);

// initialize a driver to control a Chrome instance
$driver = RemoteWebDriver::create($host, $capabilities);

// maximize the window to avoid responsive rendering
$driver->manage()->window()->maximize();

// open the target page in a new tab
$driver->get('https://scrapingclub.com/exercise/list_infinite_scroll/');

// simulate the infinite scrolling interaction
$scrolling_script = <<<EOD
// scroll down the page 10 times
const scrolls = 10
let scrollCount = 0

// scroll down and then wait for 0.5s
const scrollInterval = setInterval(() => {
  window.scrollTo(0, document.body.scrollHeight)
  scrollCount++

  if (scrollCount === scrolls) {
    clearInterval(scrollInterval)
  }
}, 500)
EOD;

$driver->executeScript($scrolling_script);

// wait 10 seconds for the new products to load
sleep(10);

// to keep track of the scraped products
$products = [];

// select the product elements
$product_elements = $driver->findElements(WebDriverBy::cssSelector('.post'));

// iterate over the product nodes
// and extract data from them
foreach ($product_elements as $product_element) {
  // select the name and price elements
  $name_element = $product_element->findElement(WebDriverBy::cssSelector('h4'));
  $price_element = $product_element->findElement(WebDriverBy::cssSelector('h5'));

  // retrieve the data of interest
  $name = $name_element->getText();
  $price = $price_element->getText();

  // create a new product array and add it to the list
  $product = ['name' => $name, 'price' => $price];
  $products[] = $product;
}

// create the output CSV file
$csvFilePath = 'products.csv';
$csvFile = fopen($csvFilePath, 'w');

// write the header row
$header = ['name', 'price'];
fputcsv($csvFile, $header);

// add each product to the CSV file
foreach ($products as $product) {
    fputcsv($csvFile, $product);
}

// close the CSV file
fclose($csvFile);

// close the driver and release its resources
$driver->close();

Launch the script to verify that it now stores all 60 products:

Terminal
php src/scraper.php

The products.csv file will now contain more than the first ten items:

Updated Products.csv File Screenshot

Mission complete! You just scraped all products from the page. 🎉

Wait for Element

The current Selenium PHP script depends on a hard wait. That's a discouraged practice since it introduces unreliability into the scraping logic, making the scraper vulnerable to failures in case of network or browser slowdowns.

Employing a generic time-based wait isn't a trustworthy approach, so you should instead opt for smart waits, like waiting for a specific node to be present on the page. This best practice is crucial for building robust, consistent, reliable scrapers.

php-webdriver provides expected conditions such as visibilityOfElementLocated() to verify that a node is rendered on the page. Use it with the wait() logic to wait up to ten seconds for the 60th product to appear:

scraper.php
$driver->wait(10)->until(
  WebDriverExpectedCondition::visibilityOfElementLocated(WebDriverBy::cssSelector('.post:nth-child(60)'))
);

Replace the sleep() instruction with that logic, and the PHP script will now wait for the products to be rendered after the AJAX calls triggered by the scrolls.

The definitive scraper is:

scraper.php
<?php

namespace Facebook\WebDriver;

use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Chrome\ChromeOptions;

require_once('vendor/autoload.php');

// the URL to the local Selenium Server
$host = 'http://localhost:4444/';

// to control a Chrome instance
$capabilities = DesiredCapabilities::chrome();

// define the browser options
$chromeOptions = new ChromeOptions();
// to run Chrome in headless mode
$chromeOptions->addArguments(['--headless']); // <- comment out for testing

// register the Chrome options
$capabilities->setCapability(ChromeOptions::CAPABILITY_W3C, $chromeOptions);

// initialize a driver to control a Chrome instance
$driver = RemoteWebDriver::create($host, $capabilities);

// maximize the window to avoid responsive rendering
$driver->manage()->window()->maximize();

// open the target page in a new tab
$driver->get('https://scrapingclub.com/exercise/list_infinite_scroll/');

// simulate the infinite scrolling interaction
$scrolling_script = <<<EOD
// scroll down the page 10 times
const scrolls = 10
let scrollCount = 0

// scroll down and then wait for 0.5s
const scrollInterval = setInterval(() => {
  window.scrollTo(0, document.body.scrollHeight)
  scrollCount++

  if (scrollCount === scrolls) {
    clearInterval(scrollInterval)
  }
}, 500)
EOD;

$driver->executeScript($scrolling_script);

// wait up to 10 seconds for the 60th product to be
// on the page
$driver->wait(10)->until(
  WebDriverExpectedCondition::visibilityOfElementLocated(WebDriverBy::cssSelector('.post:nth-child(60)'))
);

// to keep track of the scraped products
$products = [];

// select the product elements
$product_elements = $driver->findElements(WebDriverBy::cssSelector('.post'));

// iterate over the product nodes
// and extract data from them
foreach ($product_elements as $product_element) {
  // select the name and price elements
  $name_element = $product_element->findElement(WebDriverBy::cssSelector('h4'));
  $price_element = $product_element->findElement(WebDriverBy::cssSelector('h5'));

  // retrieve the data of interest
  $name = $name_element->getText();
  $price = $price_element->getText();

  // create a new product array and add it to the list
  $product = ['name' => $name, 'price' => $price];
  $products[] = $product;
}

// create the output CSV file
$csvFilePath = 'products.csv';
$csvFile = fopen($csvFilePath, 'w');

// write the header row
$header = ['name', 'price'];
fputcsv($csvFile, $header);

// add each product to the CSV file
foreach ($products as $product) {
    fputcsv($csvFile, $product);
}

// close the CSV file
fclose($csvFile);

// close the driver and release its resources
$driver->close();

If you execute it, you'll get the same results as before. The main difference is better performance: the script now idles only for as long as strictly necessary.

Wait for the Page to Load

The function $driver->get() automatically waits for the browser to fire the load event on the page. In other words, the PHP Selenium library already waits for pages to load for you.

The problem is that most web pages are now extremely dynamic, so the load event alone may not tell you when a page has truly finished loading. To deal with more complex scenarios, use the WebDriver expected conditions below:

  • titleIs().
  • titleContains().
  • urlIs().
  • urlContains().
  • presenceOfElementLocated().
  • presenceOfAllElementsLocatedBy().
  • elementTextIs().
  • elementTextContains().
  • textToBePresentInElementValue().
  • elementToBeClickable().

For more information on how to wait in Selenium PHP, check out the documentation.
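As an example, the snippet below combines wait() with titleContains(). This is a sketch against the demo page used in this tutorial; the second argument to wait() is the polling interval in milliseconds:

```php
// wait up to 15 seconds, polling every 250 ms,
// for the page title to contain the expected substring
$driver->wait(15, 250)->until(
  WebDriverExpectedCondition::titleContains('Scraping Infinite Scrolling')
);

// safe to read the title now
echo $driver->getTitle() . "\n";
```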

Click Elements

The WebDriverElement objects from php-webdriver expose the click() method to simulate click interactions:

scraper.php
$element->click();

This method tells Selenium to click on the specified node. The browser dispatches a mouse click event on it and invokes any onclick handler as a result.

If the click() call triggers a page change (as in the example below), you'll have to adjust the parsing logic to the new DOM structure:

scraper.php
$product_element = $driver->findElement(WebDriverBy::cssSelector('.post'));
$product_element->click();
// you are now on the detail product page...
    
// new scraping logic...

// $driver->findElement(...)
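For interactions that need a real mouse movement before the click (e.g., elements revealed on hover), php-webdriver also exposes an action builder via $driver->action(). A sketch reusing the .post selector from this tutorial:

```php
// select the first product card
$product_element = $driver->findElement(WebDriverBy::cssSelector('.post'));

// move the mouse over the element, then click it
$driver->action()
  ->moveToElement($product_element)
  ->click($product_element)
  ->perform();
```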

Take a Screenshot

Scraping data from a web page isn't the only way to get useful information from a site. Images of target pages or specific elements are useful too, e.g. to get visual feedback on what competitors are doing.

PHP Selenium offers the takeScreenshot() method to take a screenshot of the current viewport:

scraper.php
// take a screenshot of the current viewport
$driver->takeScreenshot('screenshot.png');

That’ll produce a screenshot.png file in your project's root folder.
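Besides the full viewport, you can capture a single element. Recent php-webdriver versions expose a takeElementScreenshot() method on element objects; a sketch reusing the .post selector:

```php
// select the first product card
$product_element = $driver->findElement(WebDriverBy::cssSelector('.post'));

// save a screenshot cropped to that element only
$product_element->takeElementScreenshot('product.png');
```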

Amazing! You're now a master of Selenium PHP WebDriver interactions!

Avoid Getting Blocked When Scraping with Selenium in PHP

The biggest challenge in web scraping is getting blocked by anti-bot solutions. To make your requests look more natural, you'll need techniques such as setting a real-world User-Agent header and using proxies to change your exit IP.

To set a custom User Agent in Selenium with PHP, pass it to Chrome's --user-agent flag option:

Example
$custom_user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
$chromeOptions->addArguments([
  "--user-agent=$custom_user_agent"
  // other options...
]);

Learn more in our guide on User Agents for web scraping.

Setting up a PHP proxy requires the --proxy-server flag. Get the URL of a free proxy from a site like Free Proxy List and then add it in Chrome as follows:

Example
$proxy_url = '231.32.1.14:6573';
$chromeOptions->addArguments([
  "--proxy-server=$proxy_url"
  // other options...
]);

Don’t forget that these approaches are just baby steps to bypass anti-bot systems. Advanced solutions like Cloudflare will still be able to detect your Selenium PHP script as a bot.

So, what's the best move? ZenRows! As a scraping API, it seamlessly integrates with Selenium to rotate your User Agent, add IP rotation capabilities, and more.

Try the power of ZenRows with Selenium and redeem your first 1,000 credits by signing up for free. You'll get to the Request Builder page below:

ZenRows Request Builder Page

Suppose you want to extract data from the protected G2.com page seen earlier.

Paste your target URL (https://www.g2.com/products/airtable/reviews) into the "URL to Scrape" input. Check "Premium Proxy" to get rotating IPs and make sure the "JS Rendering" feature isn't enabled to avoid double rendering.

On the right, choose cURL to get the scraping API URL. Copy the generated URL and pass it to Selenium's get() method:

scraper.php
<?php

namespace Facebook\WebDriver;

use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Chrome\ChromeOptions;

require_once('vendor/autoload.php');

// the URL to the local Selenium Server
$host = 'http://localhost:4444/';

// to control a Chrome instance
$capabilities = DesiredCapabilities::chrome();

// define the browser options
$chromeOptions = new ChromeOptions();
// to run Chrome in headless mode
$chromeOptions->addArguments(['--headless']); // <- comment out for testing

// register the Chrome options
$capabilities->setCapability(ChromeOptions::CAPABILITY_W3C, $chromeOptions);

// initialize a driver to control a Chrome instance
$driver = RemoteWebDriver::create($host, $capabilities);

// maximize the window to avoid responsive rendering
$driver->manage()->window()->maximize();

// open the target page in a new tab
$driver->get('https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.g2.com%2Fproducts%2Fairtable%2Freviews&premium_proxy=true');

// extract the HTML page source and print it
$html = $driver->getPageSource();
echo $html;

// close the driver and release its resources
$driver->close();

Launch it, and it'll print the source HTML of the G2.com page:

Output
<!DOCTYPE html>
<head>
  <meta charset="utf-8" />
  <link href="https://www.g2.com/assets/favicon-fdacc4208a68e8ae57a80bf869d155829f2400fa7dd128b9c9e60f07795c4915.ico" rel="shortcut icon" type="image/x-icon" />
  <title>Airtable Reviews 2024: Details, Pricing, &amp; Features | G2</title>
  <!-- omitted for brevity ... -->

Wow! You just integrated ZenRows into the Selenium PHP library.

Now, what about anti-bot measures like CAPTCHAs that can stop your Selenium PHP script? The great news is that ZenRows not only extends Selenium but can replace it completely while equipping you with even better anti-bot bypass superpowers.

As a cloud solution, ZenRows also introduces significant savings compared to the cost of Selenium.

Conclusion

In this guide for using Selenium with PHP for web scraping, you explored the fundamentals of controlling headless Chrome. You learned the basics and then dug into more advanced techniques. You've become a PHP browser automation expert!

Now you know:

  • How to set up a PHP Selenium WebDriver project.
  • How to use it to extract data from a dynamic content page.
  • What user interactions you can simulate in Selenium.
  • The challenges of scraping online data and how to face them.

No matter how complex your browser automation is, anti-bot measures can still block it. Elude them all with ZenRows, a web scraping API with browser automation functionality, IP rotation, and the most powerful anti-scraping bypass available. Scraping dynamic content sites has never been easier. Try ZenRows for free!

Frequently Asked Questions

Does PHP Support Selenium?

Yes, PHP supports Selenium via php-webdriver, the PHP bindings originally developed by Facebook. The library is now maintained by the community, which allows Selenium to work with PHP.

What is the Difference between PHPUnit and Selenium?

PHPUnit is a framework for unit testing in PHP, while Selenium is a web testing framework for automating browser interactions. PHPUnit tests individual units of code; Selenium covers end-to-end testing of web applications. Plus, PHP Selenium is a great tool for web scraping via browser automation.

Which PHP Language Version Is Supported by Selenium?

php-webdriver, the PHP port of Selenium, requires PHP 7.3 or higher (in Composer notation, php: ^7.3 || ^8.0). Selenium itself doesn't depend on a specific PHP language version: compatibility with PHP depends on the binding library you choose.
