Goutte is a popular PHP library for retrieving data from the web. It simulates the behavior of a web browser and is now part of Symfony.
This guide will cover the basics of Goutte PHP and then explore more complex techniques. By the end of this tutorial, you'll know how to:
- Set up a PHP project for web scraping with Goutte.
- Extract data from a page and export it to CSV.
- Crawl paginated pages.
- Avoid anti-bot blocks and deal with dynamic content.
Let's dive in!
Why Use Goutte PHP for Web Scraping?
Goutte is a powerful PHP web scraping and crawling library with thousands of stars on GitHub. The package has become popular thanks to its intuitive browser-like API, which makes it easier to extract data from HTML/XML web pages.
As covered in the docs, Goutte is now deprecated. That doesn't mean the library no longer works. Quite the opposite: Goutte is now a thin proxy for the HttpBrowser class from the Symfony BrowserKit component.
Goutte is part of Symfony and remains one of the most useful libraries for PHP web scraping. In the next section, you’ll learn how to use it.
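In practice, migrating is trivial because the two APIs are interchangeable. Here is a minimal sketch of the equivalence (the target URL is just a placeholder):
<?php
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
require_once("vendor/autoload.php");
// legacy Goutte usage (deprecated):
// $client = new \Goutte\Client();
// $crawler = $client->request("GET", "https://example.com");
// modern Symfony equivalent, with the same browser-like API:
$browser = new HttpBrowser(HttpClient::create());
$crawler = $browser->request("GET", "https://example.com");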
Prerequisites
Follow the instructions below and set up a PHP environment for web scraping in Goutte.
Create the Project
Before getting started, make sure you meet the following prerequisites:
- PHP >= 8 installed locally.
- Composer installed on your computer.
- A PHP IDE such as WebStorm or Visual Studio Code with the PHP extension.
If you're missing any of these components, install it and follow its setup wizard before moving on.
You now have everything you need to initialize a PHP Composer project. Create a folder for your Goutte web scraping project and enter it in the terminal:
mkdir goutte-scraper
cd goutte-scraper
Then, launch the init command to create a new Composer project inside it. Follow the wizard and answer the questions as required:
composer init
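If you'd rather skip the interactive wizard, composer init also accepts flags. A minimal sketch, assuming the defaults are fine for you (the package name below is just an example):
composer init --name "yourname/goutte-scraper" --no-interaction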
Perfect! goutte-scraper now contains a new Composer project.
Add a scraper.php file in the /src folder and initialize it with the code below. The first line contains the autoload import required by Composer, followed by a simple log instruction:
<?php
require_once("vendor/autoload.php");
echo "Hello, World!";
You can run the PHP script with this command:
php src/scraper.php
That should produce the following output:
"Hello, World!"
Here we go! Your PHP setup is ready.
Install Goutte
Since Goutte is deprecated for standalone usage, you shouldn't install it directly. Instead, use the BrowserKit and HttpClient components from Symfony. Goutte is just a thin proxy around these two packages, so they provide the exact same experience.
Install them with this Composer command:
composer require symfony/http-client symfony/browser-kit
You can use the BrowserKit and HttpClient components as standalone libraries without having to set up an entire Symfony project.
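After the command completes, the require section of your composer.json should list both packages. It will look something like this (the exact version constraints depend on when you install):
{
    "require": {
        "symfony/http-client": "^7.0",
        "symfony/browser-kit": "^7.0"
    }
}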
Then, import them in your scraper.php by adding the following two lines at the top of it:
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
Your scraper.php file is ready to become a Goutte PHP web scraping script:
<?php
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
require_once("vendor/autoload.php");
// scraping logic...
Learn how Goutte allows you to retrieve data from the Web in the next section.
Tutorial: Your First Web Scraper with Goutte
In this section, you'll use Goutte PHP to extract all product data from an e-commerce website. The target site will be ScrapeMe, a platform with a paginated list of Pokémon products:

Let’s follow the steps below to perform web scraping with Goutte!
Step 1: Get the HTML of Your Target Page
To start scraping a webpage, you need to connect to it and retrieve its HTML by making an HTTP GET request to the target page.
Initialize the Goutte client:
// initialize a browser-like HTTP client
$browser = new HttpBrowser(HttpClient::create());
Then, connect to the desired page using the request() method. This returns a Crawler object that exposes the Goutte API required for web scraping:
$crawler = $browser->request("GET", "https://scrapeme.live/shop/");
Next, access the server response and its HTML content:
// get the response returned by the server
$response = $browser->getResponse();
// extract the HTML content and print it
$html = $response->getContent();
echo $html;
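Optionally, you can also check that the request succeeded before parsing the HTML. The BrowserKit response exposes the status code, so a quick guard (not included in the tutorial's final script) looks like this:
// stop early if the server didn't reply with 200 OK
if ($response->getStatusCode() !== 200) {
    die("Request failed with status " . $response->getStatusCode());
}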
This is what your current scraper.php file should look like:
<?php
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
require_once("vendor/autoload.php");
// initialize a browser-like HTTP client
$browser = new HttpBrowser(HttpClient::create());
// make a GET request to the target site
$crawler = $browser->request("GET", "https://scrapeme.live/shop/");
// get the response returned by the server
$response = $browser->getResponse();
// extract the HTML content and print it
$html = $response->getContent();
echo $html;
The script will produce the following output:
<!doctype html>
<html lang="en-GB">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=2.0" />
<link rel="profile" href="http://gmpg.org/xfn/11" />
<link rel="pingback" href="https://scrapeme.live/xmlrpc.php" />
<title>Products – ScrapeMe</title>
Fantastic! Your Goutte web scraping script connects to the target page. Now, get ready to extract some data.
Step 2: Extract Data from One Element
To collect data from HTML elements, you must first isolate them with an effective node selection strategy. To devise it, get familiar with the HTML of the target page.
Visit the target page in the browser and inspect a product HTML node with the DevTools:

Expand the HTML code and analyze it. Note that you can select each product with the CSS selector that follows:
li.product
If you're not familiar with this syntax, li is the tag of the HTML element and product is its class.
Given a product HTML element, you can extract:
- The URL from the <a> node.
- The image URL from the <img> node.
- The name from the <h2> node.
- The price from the <span> node.
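To make that concrete, here is a simplified sketch of what each product card looks like (class names and attributes trimmed for illustration):
<li class="product">
    <a href="https://scrapeme.live/shop/Bulbasaur/">
        <img src="https://scrapeme.live/wp-content/uploads/2018/08/001-350x350.png" alt="Bulbasaur" />
        <h2>Bulbasaur</h2>
        <span class="price">£63.00</span>
    </a>
</li>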
Before implementing the scraping logic, you have to install the CssSelector Symfony component:
composer require symfony/css-selector
Call the filter() method on $crawler to apply a CSS selector to the page. This returns a new Crawler containing all the nodes that match the specified selection strategy. Select the first product HTML element with the eq(0) method, and then use text() and attr() to extract data from it:
// select the first product HTML element on the page
$productHTMLElement = $crawler->filter("li.product")->eq(0);
// scraping logic
$url = $productHTMLElement->filter("a")->eq(0)->attr("href");
$image = $productHTMLElement->filter("img")->eq(0)->attr("src");
$name = $productHTMLElement->filter("h2")->eq(0)->text();
$price = $productHTMLElement->filter("span")->eq(0)->text();
You can finally print the scraped data in the terminal with:
echo $url, PHP_EOL;
echo $image, PHP_EOL;
echo $name, PHP_EOL;
echo $price, PHP_EOL;
PHP_EOL is the platform-independent newline character.
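One caveat: selecting the first span on the card works here, but it's fragile if the page layout changes. A slightly more robust alternative, assuming the price node carries the usual WooCommerce price class, would be:
// target the price node by class instead of tag position
$price = $productHTMLElement->filter(".price")->eq(0)->text();
The rest of the tutorial sticks with the simpler tag-based selector, which works fine on this page.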
Integrate the above logic in scraper.php, and you'll get:
<?php
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
require_once("vendor/autoload.php");
// initialize a browser-like HTTP client
$browser = new HttpBrowser(HttpClient::create());
// make a GET request to the target site
$crawler = $browser->request("GET", "https://scrapeme.live/shop/");
// select the first product HTML element on the page
$productHTMLElement = $crawler->filter("li.product")->eq(0);
// scraping logic
$url = $productHTMLElement->filter("a")->eq(0)->attr("href");
$image = $productHTMLElement->filter("img")->eq(0)->attr("src");
$name = $productHTMLElement->filter("h2")->eq(0)->text();
$price = $productHTMLElement->filter("span")->eq(0)->text();
// log the scraped data
echo $url, PHP_EOL;
echo $image, PHP_EOL;
echo $name, PHP_EOL;
echo $price, PHP_EOL;
Launch the Goutte web scraping script, and it'll produce:
https://scrapeme.live/shop/Bulbasaur/
https://scrapeme.live/wp-content/uploads/2018/08/001-350x350.png
Bulbasaur
£63.00
Awesome! You’ve just retrieved the data of interest from a single product HTML node on the page. Learn how to scrape all products in the next section.
Step 3: Extract Data from Multiple Elements
The target web page contains multiple products, not just one. Initialize a new array to store them all:
$products = [];
At the end of the script, this array will store all scraped data objects.
Now, remove eq(0) from the first filter() instruction and use each() to iterate over all products. Note the use (&$products) clause: it captures the array by reference so the closure can append to it. For each product node, apply the scraping logic, instantiate a new object, and add it to the list:
$crawler->filter("li.product")->each(function ($productHTMLElement) use (&$products) {
// scraping logic
$url = $productHTMLElement->filter("a")->eq(0)->attr("href");
$image = $productHTMLElement->filter("img")->eq(0)->attr("src");
$name = $productHTMLElement->filter("h2")->eq(0)->text();
$price = $productHTMLElement->filter("span")->eq(0)->text();
// instantiate a new product object
$product = [
"url" => $url,
"image" => $image,
"name" => $name,
"price" => $price
];
// add it to the list
$products[] = $product;
});
Verify that the Goutte scraping logic above works with this log instruction:
print_r($products);
The scraper.php script should currently contain:
<?php
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
require_once("vendor/autoload.php");
// initialize a browser-like HTTP client
$browser = new HttpBrowser(HttpClient::create());
// make a GET request to the target site
$crawler = $browser->request("GET", "https://scrapeme.live/shop/");
// where to store the scraped data
$products = [];
// select all product HTML elements on the page,
// iterate over them, and scrape them
$crawler->filter("li.product")->each(function ($productHTMLElement) use (&$products) {
// scraping logic
$url = $productHTMLElement->filter("a")->eq(0)->attr("href");
$image = $productHTMLElement->filter("img")->eq(0)->attr("src");
$name = $productHTMLElement->filter("h2")->eq(0)->text();
$price = $productHTMLElement->filter("span")->eq(0)->text();
// instantiate a new product object
$product = [
"url" => $url,
"image" => $image,
"name" => $name,
"price" => $price
];
// add it to the list
$products[] = $product;
});
// log the scraped data
print_r($products);
Run it, and it'll generate this output:
Array
(
[0] => Array
(
[url] => https://scrapeme.live/shop/Bulbasaur/
[image] => https://scrapeme.live/wp-content/uploads/2018/08/001-350x350.png
[name] => Bulbasaur
[price] => £63.00
)
// omitted for brevity...
[15] => Array
(
[url] => https://scrapeme.live/shop/Pidgey/
[image] => https://scrapeme.live/wp-content/uploads/2018/08/016-350x350.png
[name] => Pidgey
[price] => £159.00
)
)
Terrific! The $products array stores the scraped objects with the desired data. Now you need to export this data to a readable format.
Step 4: Convert Scraped Data Into a CSV File
The PHP standard library provides all you need to create a CSV file and fill it with the scraped data. Use fopen() to create a products.csv file and populate it with fputcsv(). This function converts each product array to a CSV record and appends it to the output file:
// create the output CSV file
$csvFile = fopen("products.csv", "w");
// write the header row
$header = ["url", "image", "name", "price"];
fputcsv($csvFile, $header);
// add each product to the CSV file
foreach ($products as $product) {
fputcsv($csvFile, $product);
}
// close the CSV file
fclose($csvFile);
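If you'd rather export JSON instead of CSV, the standard library covers that too. A minimal alternative sketch (not used in the final script below):
// serialize the products to a JSON file
file_put_contents(
    "products.json",
    json_encode($products, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE)
);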
Put it all together, and you'll get:
<?php
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
require_once("vendor/autoload.php");
// initialize a browser-like HTTP client
$browser = new HttpBrowser(HttpClient::create());
// make a GET request to the target site
$crawler = $browser->request("GET", "https://scrapeme.live/shop/");
// where to store the scraped data
$products = [];
// select all product HTML elements on the page,
// iterate over them, and scrape them
$crawler->filter("li.product")->each(function ($productHTMLElement) use (&$products) {
// scraping logic
$url = $productHTMLElement->filter("a")->eq(0)->attr("href");
$image = $productHTMLElement->filter("img")->eq(0)->attr("src");
$name = $productHTMLElement->filter("h2")->eq(0)->text();
$price = $productHTMLElement->filter("span")->eq(0)->text();
// instantiate a new product object
$product = [
"url" => $url,
"image" => $image,
"name" => $name,
"price" => $price
];
// add it to the list
$products[] = $product;
});
// create the output CSV file
$csvFile = fopen("products.csv", "w");
// write the header row
$header = ["url", "image", "name", "price"];
fputcsv($csvFile, $header);
// add each product to the CSV file
foreach ($products as $product) {
fputcsv($csvFile, $product);
}
// close the CSV file
fclose($csvFile);
Execute the script:
php src/scraper.php
After the script execution, a products.csv file will appear in the project's folder. Open it, and you'll see:

Et voilà! You just built a Goutte PHP web scraping script!
Now, let’s move on to more advanced techniques.
Advanced Web Scraping Techniques with Goutte
Now that you know the basics of web scraping with Goutte, you're ready to learn about handling pagination, scraping dynamic content, and bypassing anti-bot systems. Let’s go!
Handle Pagination
The current script retrieves data from one web page. However, in most use cases, you’ll need more data from your target website.
What if you wanted to retrieve all the products? That's where web crawling comes in!
Web crawling is the process of automatically discovering web pages. Learn more in our guide on web crawling vs web scraping.
To implement web crawling in Goutte, you have to:
- Connect to a web page on the destination site.
- Extract the URLs from the pagination link nodes on the page and add them to an array.
- Repeat the cycle on a new page read from the array.
That loop would stop only when there are no more pages to discover. Since this Goutte script is just a demo, let's limit the number of pages to crawl to 5:
// number of pages scraped
$pageCounter = 1;
// maximum number of pages to scrape
$pageLimit = 5;
You already know how to connect to a webpage in Goutte. The next step is to learn how to extract URLs from pagination link elements. Inspect their HTML nodes:

Here, you can see that you can select them all with this CSS selector:
a.page-numbers
Bear in mind that crawling a site isn't as easy as extracting links and following them blindly. You'd risk visiting the same pages multiple times. Avoid that by keeping track of the pages you've already accessed with two extra data structures:
- pagesDiscovered: An array used as a set to store all the URLs discovered during the crawling logic.
- pagesToScrape: An array used as a queue to store the URLs of the pages the scraper will visit next.
Initialize both with the URL of the first product pagination page:
// the first page to visit in the crawling logic
$firstPageToScrape = "https://scrapeme.live/shop/page/1/";
// the Set of pages discovered during the crawling logic
$pagesDiscovered = [$firstPageToScrape];
// the list of remaining pages to scrape
$pagesToScrape = [$firstPageToScrape];
Next, implement the crawling logic as explained earlier with the following while loop:
while (count($pagesToScrape) != 0 && $pageCounter <= $pageLimit) {
// retrieve the next URL to visit
$pageUrl = array_shift($pagesToScrape);
echo $pageUrl, PHP_EOL;
// connect to the current page
$crawler = $browser->request("GET", $pageUrl);
// crawling logic
$crawler->filter("a.page-numbers")->each(function ($paginationHTMLElement) use (&$pagesDiscovered, &$pagesToScrape) {
// extract the current pagination URL
$newPaginationUrl = $paginationHTMLElement->attr("href");
// if the page discovered is new
if (!in_array($newPaginationUrl, $pagesDiscovered)) {
// if the page discovered needs to be scraped
if (!in_array($newPaginationUrl, $pagesToScrape)) {
$pagesToScrape[] = $newPaginationUrl;
}
$pagesDiscovered[] = $newPaginationUrl;
}
});
// scraping logic...
// increment the iterator counter
$pageCounter++;
}
Extend scraper.php with the crawling logic above, and you'll have the final code:
<?php
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
require_once("vendor/autoload.php");
// initialize a browser-like HTTP client
$browser = new HttpBrowser(HttpClient::create());
// where to store the scraped data
$products = [];
// number of pages scraped
$pageCounter = 1;
// maximum number of pages to scrape
$pageLimit = 5;
// the first page to visit in the crawling logic
$firstPageToScrape = "https://scrapeme.live/shop/page/1/";
// the Set of pages discovered during the crawling logic
$pagesDiscovered = [$firstPageToScrape];
// the list of remaining pages to scrape
$pagesToScrape = [$firstPageToScrape];
// iterate until there are no pages to scrape
// or the limit is hit
while (count($pagesToScrape) != 0 && $pageCounter <= $pageLimit) {
// retrieve the next URL to visit
$pageUrl = array_shift($pagesToScrape);
echo $pageUrl, PHP_EOL;
// connect to the current page
$crawler = $browser->request("GET", $pageUrl);
// crawling logic
$crawler->filter("a.page-numbers")->each(function ($paginationHTMLElement) use (&$pagesDiscovered, &$pagesToScrape) {
// extract the current pagination URL
$newPaginationUrl = $paginationHTMLElement->attr("href");
// if the page discovered is new
if (!in_array($newPaginationUrl, $pagesDiscovered)) {
// if the page discovered needs to be scraped
if (!in_array($newPaginationUrl, $pagesToScrape)) {
$pagesToScrape[] = $newPaginationUrl;
}
$pagesDiscovered[] = $newPaginationUrl;
}
});
// scraping logic
$crawler->filter("li.product")->each(function ($productHTMLElement) use (&$products) {
$url = $productHTMLElement->filter("a")->eq(0)->attr("href");
$image = $productHTMLElement->filter("img")->eq(0)->attr("src");
$name = $productHTMLElement->filter("h2")->eq(0)->text();
$price = $productHTMLElement->filter("span")->eq(0)->text();
// instantiate a new product object
$product = [
"url" => $url,
"image" => $image,
"name" => $name,
"price" => $price
];
// add it to the list
$products[] = $product;
});
// increment the iterator counter
$pageCounter++;
}
// create the output CSV file
$csvFile = fopen("products.csv", "w");
// write the header row
$header = ["url", "image", "name", "price"];
fputcsv($csvFile, $header);
// add each product to the CSV file
foreach ($products as $product) {
fputcsv($csvFile, $product);
}
// close the CSV file
fclose($csvFile);
Launch the Goutte web scraping script:
php src/scraper.php
The scraper will take a bit longer than before because it now has to go through 5 pages. This time, the products.csv file generated by the script will contain more records:

Congrats! You’ve just learned how to perform web crawling and web scraping in Goutte!
Avoid Getting Blocked When Scraping With Goutte
With data being among companies' most valuable assets, more and more sites are adopting anti-bot measures. These technologies can detect and block automated scripts, such as your Goutte scraper.
There are a few tips and tricks to perform web scraping without getting blocked. However, bypassing all anti-bot systems isn't easy, and comprehensive solutions such as Cloudflare will still be able to stop your script.
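For instance, a common first step is replacing the default User-Agent with a real browser one. HttpClient::create() accepts an array of default options, including custom headers, so you can configure that when building the browser (the User-Agent string below is just an example value):
// create the browser-like client with a custom User-Agent header
$browser = new HttpBrowser(HttpClient::create([
    "headers" => [
        "User-Agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    ],
]));
Tweaks like this can get past naive defenses, but they're rarely enough against advanced anti-bot systems.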
For example, assume you want to scrape the Cloudflare-protected G2 Reviews page:
<?php
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
require_once("vendor/autoload.php");
// initialize a browser-like HTTP client
$browser = new HttpBrowser(HttpClient::create());
// make a GET request to the target site
$crawler = $browser->request("GET", "https://www.g2.com/products/zapier/reviews");
// get the response returned by the server
$response = $browser->getResponse();
// extract the HTML content and print it
$html = $response->getContent();
echo $html;
The Goutte web scraping script above will print the following 403 Forbidden error page:
<!doctype html>
<html class="no-js" lang="en-US">
<head>
<title>Attention Required! | Cloudflare</title>
<meta charset="UTF-8">
<!-- omitted for brevity... -->
The most efficient solution to this problem is a web scraping API such as ZenRows. It provides a top-notch anti-bot toolkit to bypass any block, along with other useful features such as IP and User-Agent rotation and CAPTCHA bypass.
To use ZenRows in Goutte, sign up for free to get your first 1,000 credits, and then open the Request Builder page:

To scrape the Cloudflare-protected G2.com page used before, follow these steps:
- Paste the target URL (https://www.g2.com/products/zapier/reviews) into the "URL to Scrape" input.
- Click on "Premium Proxy" to enable IP rotation.
- Enable the "JS Rendering" feature (by default, User-Agent rotation and the AI-powered anti-bot toolkit are included).
- Select the "cURL" option on the right and then the "API" mode to get the full URL of the ZenRows API.
Pass the generated URL as an argument to the request() method:
<?php
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
require_once("vendor/autoload.php");
// initialize a browser-like HTTP client
$browser = new HttpBrowser(HttpClient::create());
// make a GET request to the target site
// through ZenRows
$crawler = $browser->request("GET", "https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.g2.com%2Fproducts%2Fzapier%2Freviews&js_render=true&premium_proxy=true");
// get the response returned by the server
$response = $browser->getResponse();
// extract the HTML content and print it
$html = $response->getContent();
echo $html;
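As a side note, avoid hardcoding your API key in the script. A safer pattern is reading it from an environment variable and building the request URL programmatically. A sketch, assuming a ZENROWS_API_KEY variable is set:
// read the API key from the environment and build the ZenRows URL
$apiKey = getenv("ZENROWS_API_KEY");
$targetUrl = urlencode("https://www.g2.com/products/zapier/reviews");
$zenrowsUrl = "https://api.zenrows.com/v1/?apikey=$apiKey&url=$targetUrl&js_render=true&premium_proxy=true";
$crawler = $browser->request("GET", $zenrowsUrl);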
Execute your scraping script again. This time, it'll print the source HTML code of the G2 page:
<!DOCTYPE html>
<head>
<meta charset="utf-8" />
<link href="https://www.g2.com/assets/favicon-fdacc4208a68e8ae57a80bf869d155829f2400fa7dd128b9c9e60f07795c4915.ico" rel="shortcut icon" type="image/x-icon" />
<title>Zapier Reviews 2024: Details, Pricing, & Features | G2</title>
<!-- omitted for brevity ... -->
You’ve just learned how to use ZenRows for web scraping in Goutte. Bye-bye, 403 errors.
Scrape Dynamic Content With Goutte
As mentioned before, Goutte provides a browser-like API. For example, selecting a link element gives you access to the click() method:
// select the link that contains the "Bulbasaur" text
// and click on it
$link = $crawler->selectLink("Bulbasaur")->link();
$crawler = $browser->click($link);
Similarly, you can fill out and submit a form with this syntax:
// select the "Sign-in" form
$form = $crawler->selectButton("Sign in")->form();
// fill out the form and submit it
$crawler = $browser->submit($form, ["login" => "<YOUR_USERNAME>", "password" => "<YOUR_PASSWORD>"]);
As you can see, writing Goutte web scraping logic is intuitive. Keep in mind, however, that Goutte doesn't actually execute those instructions in a browser: the browser-like API is just a convention that simplifies coding.
The consequence is that Goutte can only scrape static HTML pages. To retrieve data from dynamic pages that require JavaScript execution, you have to use a browser automation tool, the most popular of which is Selenium.
Learn how to use it in our guide on web scraping with Selenium PHP.
Conclusion
This tutorial guided you through the process of web scraping using the Goutte PHP library. After learning both the fundamentals and the more advanced tricks, you've become a Goutte web scraping expert!
Goutte is a useful library for web scraping in PHP, especially for static sites. Its browser-like API makes it easy to retrieve data from a page. However, anti-scraping solutions that can block your script can still pose a huge challenge. The solution is ZenRows, a scraping API with the most effective anti-bot bypass capabilities. Extracting online data from any web page has never been easier!