Playwright in Golang for Web Scraping [Tutorial 2024]

July 2, 2024 · 9 min read

Playwright is one of the most comprehensive browser automation libraries available. That's why the community decided to port it to Go. The Playwright Golang library is now a favorite tool for testing and web scraping.

In this guide, you'll see the basics of Playwright with Go and then explore more complex interactions.

Let's dive in!

Why Use Playwright in Golang

Developed by Microsoft, Playwright is one of the most powerful browser automation libraries. Its comprehensive API makes it ideal for both testing and web scraping tasks. Developers use it to simulate user interactions in different languages and browsers.

The community-driven playwright-go package is a Go port of Playwright. In other words, it brings Playwright to the Go ecosystem. While Microsoft doesn't officially support the library, it's actively maintained and receives frequent updates.

Before diving into this tutorial, consider checking out our guides on headless browser scraping and web scraping with Go.

How to Use Playwright in Golang

Get started with the Golang Playwright library by scraping this infinite scrolling demo:

infinite scrolling demo page

A page that loads new products as the user scrolls down is a perfect example of a dynamic content page. You can't interact with it without a tool that can run JavaScript, which makes it a great target for testing Playwright's headless browser capabilities.

Time to extract some data from it!

Step 1: Install Playwright-Go

Before getting started, make sure you have Go installed on your machine. Download the Golang installer, execute it, and follow the instructions.

You now have everything you need to initialize a Playwright Golang project. Use the commands below to create a playwright-project folder and enter it in the terminal:

Terminal
mkdir playwright-project
cd playwright-project

Set up a Go module called playwright-scraper in the project folder with the init command:

Terminal
go mod init playwright-scraper

Your playwright-project folder will contain a go.mod file.

Install Playwright in Go by adding the playwright-go package to your project's dependencies:

Terminal
go get -u github.com/playwright-community/playwright-go

Playwright requires the browser executable and some extra dependencies to work properly. Retrieve them all with the following command:

Terminal
go run github.com/playwright-community/playwright-go/cmd/playwright@latest install --with-deps

Awesome! You're now ready to set up a Playwright script in Go.

Open the project folder in your favorite Golang IDE. Visual Studio Code with the Go extension will do. Add a scraper.go script into your project folder and initialize it with the code below. The first line contains the name of the project package, and then there's the Playwright import:

scraper.go
package main

import (
    "fmt"
    "log"

    "github.com/playwright-community/playwright-go"
)

func main() {
    // scraping logic...
}

main() is the entry function of your Go program and will soon contain the scraping logic.

You can run the Playwright Go script with this command:

Terminal
go run scraper.go

Great! Turn the Golang Playwright script into an automated scraper in the next steps.

Step 2: Scrape With Playwright-Go

Use the lines below in main() to initialize Playwright, launch a Chromium window, and open a new page:

scraper.go
// initialize a Playwright instance to
// perform browser automation
pw, err := playwright.Run()
if err != nil {
    log.Fatalf("Could not start Playwright: %v", err)
}

// initialize a Chromium instance
browser, err := pw.Chromium.Launch(
    playwright.BrowserTypeLaunchOptions{
        Headless: playwright.Bool(true), // set to false in development
    },
)
if err != nil {
    log.Fatalf("Could not launch the browser: %v", err)
}

// open a new page within the current browser context
page, err := browser.NewPage()
if err != nil {
    log.Fatalf("Could not open a new page: %v", err)
}

Use the Goto() method to open the target page in the controlled Chromium instance:

scraper.go
if _, err = page.Goto("https://scrapingclub.com/exercise/list_infinite_scroll/"); err != nil {
    log.Fatalf("Could not visit the desired page: %v", err)
}

Next, call the Content() method to get the source HTML code of the page. Log it in the terminal with fmt.Println():

scraper.go
html, err := page.Content()
if err != nil {
    log.Fatalf("Could not retrieve the HTML of the page: %v", err)
}
fmt.Println(html)

Your Playwright Golang script scraper.go should now contain:

scraper.go
package main

import (
    "fmt"
    "log"

    "github.com/playwright-community/playwright-go"
)

func main() {
    // initialize a Playwright instance to
    // perform browser automation
    pw, err := playwright.Run()
    if err != nil {
        log.Fatalf("Could not start Playwright: %v", err)
    }

    // initialize a Chromium instance
    browser, err := pw.Chromium.Launch(
        playwright.BrowserTypeLaunchOptions{
            Headless: playwright.Bool(true), // set to false in development
        },
    )
    if err != nil {
        log.Fatalf("Could not launch the browser: %v", err)
    }

    // open a new page within the current browser context
    page, err := browser.NewPage()
    if err != nil {
        log.Fatalf("Could not open a new page: %v", err)
    }

    // visit the target page
    if _, err = page.Goto("https://scrapingclub.com/exercise/list_infinite_scroll/"); err != nil {
        log.Fatalf("Could not visit the desired page: %v", err)
    }

    // retrieve the source HTML code of the page
    // and log it
    html, err := page.Content()
    if err != nil {
        log.Fatalf("Could not retrieve the HTML of the page: %v", err)
    }
    fmt.Println(html)
}

Enable the headed mode by setting Headless to false and then run the script. Playwright will open a Chromium window and load the Infinite Scrolling page as below:

Infinite Scroll Demo

Before terminating, the script will print the following HTML content in the terminal:

Output
<html class="h-full"><head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <meta name="description" content="Learn to scrape infinite scrolling pages"><title>Scraping Infinite Scrolling Pages (Ajax) | ScrapingClub</title>
  <link rel="icon" href="/static/img/icon.611132651e39.png" type="image/png">
  <!-- Omitted for brevity... -->

Amazing, that's the HTML code of the target page!

See how to use Playwright in Golang to scrape data from that page in the next step.

Step 3: Extract the Data You Want

The Golang Playwright library comes with the ability to parse HTML code. This enables you to select DOM elements and extract data from them, which is what web scraping is all about!

Suppose your scraping goal is to retrieve the name and price of each product on the page. Achieve that with the following 3-step procedure.

  1. Get the product HTML nodes on the page using a proper DOM selection strategy.
  2. Extract the information of interest from each of them.
  3. Store the scraped data in a Golang data structure.

In most cases, a DOM selection strategy is nothing more than a CSS Selector or an XPath expression. When it comes to web scraping, those are the two most popular approaches to selecting HTML nodes on a page.

CSS selectors are concise and simple, while XPath expressions are longer but also more powerful. Dig into the comparison in our piece on CSS Selector vs XPath.

Let's opt for CSS selectors to keep things simple!

You have to study the HTML code of a page to devise an effective CSS selector strategy for its nodes. So, open the target page in your browser and inspect a product HTML element with the DevTools:

Inspect Element

Expand the HTML snippet and see that you can select all product cards on the page with this CSS selector:

Example
.post

Each product HTML node consists of:

  • An h4 element containing the product name.
  • An h5 element containing the product price.

You're ready to implement the Golang Playwright scraping logic. Before jumping into that, define a struct for storing the data to extract from the product nodes:

scraper.go
type Product struct {
    name, price string
}

In the main() function, initialize an empty slice of type Product. At the end of the script execution, this slice will contain all the scraped data:

scraper.go
var products []Product

Use the Locator() method to apply a CSS selector and get all product HTML nodes on the page:

scraper.go
productHTMLElements, err := page.Locator(".post").All()
if err != nil {
    log.Fatalf("Could not get the product node: %v", err)
}

Iterate over the product elements, apply the data extraction logic, and populate products:

scraper.go
for _, productHTMLElement := range productHTMLElements {
    // select the name and price nodes
    // and extract the data of interest from them
    name, err := productHTMLElement.Locator("h4").First().TextContent()
    if err != nil {
        log.Fatal("Could not extract the product name:", err)
    }
    price, err := productHTMLElement.Locator("h5").First().TextContent()
    if err != nil {
        log.Fatal("Could not extract the product price:", err)
    }

    // add the scraped data to the list
    product := Product{}
    product.name = strings.TrimSpace(name)
    product.price = strings.TrimSpace(price)
    products = append(products, product)
}

The TextContent() method returns the raw text data inside the current element. That may include whitespace characters and newlines. Remove them with the strings.TrimSpace() function.
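
As a quick illustration of that cleanup step, here's a minimal sketch in plain Go. The sample string is made up to resemble what TextContent() might return, and cleanText is a hypothetical helper (not part of playwright-go) that also collapses inner runs of whitespace, which strings.TrimSpace() alone leaves untouched:

```go
package main

import (
	"fmt"
	"strings"
)

// cleanText is a hypothetical helper: it trims leading and trailing
// whitespace and collapses inner runs of spaces, tabs, and newlines
// into a single space.
func cleanText(raw string) string {
	return strings.Join(strings.Fields(raw), " ")
}

func main() {
	// made-up sample resembling raw text extracted from an HTML node
	raw := "\n        Short   Dress\n    "

	// TrimSpace only removes whitespace at both ends
	fmt.Printf("%q\n", strings.TrimSpace(raw))

	// cleanText also normalizes the whitespace between words
	fmt.Printf("%q\n", cleanText(raw))
}
```

For the demo page, strings.TrimSpace() is enough. A helper like this may be worth considering only if the text you scrape contains line breaks or repeated spaces between words.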

Import strings with the following line at the top of your scraper.go file:

scraper.go
import (
    // other imports...
    "strings"
)

Verify that the products slice stores the expected data by printing it in the terminal:

scraper.go
fmt.Println(products)
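
If the plain output is hard to read, one option is the %+v formatting verb, which prints field names next to the values. The sketch below is self-contained and uses a single hard-coded product, so it doesn't need a browser. The describe helper is an illustrative name, not something from the tutorial script:

```go
package main

import "fmt"

// same struct used by the scraper
type Product struct {
	name, price string
}

// describe is an illustrative helper that formats the products
// with their field names included
func describe(products []Product) string {
	return fmt.Sprintf("%+v", products)
}

func main() {
	// hard-coded sample instead of scraped data
	products := []Product{{name: "Short Dress", price: "$24.99"}}
	fmt.Println(describe(products))
}
```

Each entry is then printed as {name:... price:...} rather than as a bare pair of values.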

Integrate the above snippets in your Playwright Golang script, and you'll get:

scraper.go
package main

import (
    "fmt"
    "log"
    "strings"

    "github.com/playwright-community/playwright-go"
)

// a custom struct matching the data to scrape from
// each product node
type Product struct {
    name, price string
}

func main() {
    // initialize a Playwright instance to
    // perform browser automation
    pw, err := playwright.Run()
    if err != nil {
        log.Fatalf("Could not start Playwright: %v", err)
    }

    // initialize a Chromium instance
    browser, err := pw.Chromium.Launch(
        playwright.BrowserTypeLaunchOptions{
            Headless: playwright.Bool(true), // set to false in development
        },
    )
    if err != nil {
        log.Fatalf("Could not launch the browser: %v", err)
    }

    // open a new page within the current browser context
    page, err := browser.NewPage()
    if err != nil {
        log.Fatalf("Could not open a new page: %v", err)
    }

    // visit the target page
    if _, err = page.Goto("https://scrapingclub.com/exercise/list_infinite_scroll/"); err != nil {
        log.Fatalf("Could not visit the desired page: %v", err)
    }

    // where to store the scraped data
    var products []Product

    // select the product elements
    productHTMLElements, err := page.Locator(".post").All()
    if err != nil {
        log.Fatalf("Could not get the product node: %v", err)
    }

    // iterate over the product nodes
    // and apply the scraping logic
    for _, productHTMLElement := range productHTMLElements {
        // select the name and price nodes
        // and extract the data of interest from them
        name, err := productHTMLElement.Locator("h4").First().TextContent()
        if err != nil {
            log.Fatal("Could not extract the product name:", err)
        }
        price, err := productHTMLElement.Locator("h5").First().TextContent()
        if err != nil {
            log.Fatal("Could not extract the product price:", err)
        }

        // add the scraped data to the list
        product := Product{}
        product.name = strings.TrimSpace(name)
        product.price = strings.TrimSpace(price)
        products = append(products, product)
    }

    // log the scraped data
    fmt.Println(products)
}

Execute it, and it'll produce the following output:

Output
[{Short Dress $24.99} {Patterned Slacks $29.99} {Short Chiffon Dress $49.99} {Off-the-shoulder Dress $59.99} {V-neck Top $24.99} {Short Chiffon Dress $49.99} {V-neck Top $24.99} {V-neck Top $24.99} {Short Lace Dress $59.99} {Fitted Dress $34.99}]

Wow! The parsing logic works as intended. All that remains is to export the scraped data in a human-readable format.

Step 4: Export Data as CSV

Export the collected data to a CSV file with the following code. Create a products.csv file and initialize it with the header row. Then, iterate over products, convert each struct instance to a string slice, and add it as a new record to the CSV file:

scraper.go
// open the CSV file 
file, err := os.Create("products.csv")
if err != nil {
    log.Fatal("Could not open the CSV output file:", err)
}
defer file.Close()

// initialize a CSV file writer
writer := csv.NewWriter(file)

// define the CSV header row
// and write it to the file
headers := []string{
    "name",
    "price",
}
writer.Write(headers)

// add each product to the CSV output file
for _, product := range products {
    // convert a Product to an array of strings
    record := []string{
        product.name,
        product.price,
    }

    // write a new CSV record
    writer.Write(record)
}
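// flush the buffered records to the file
// and report any write error
writer.Flush()
if err := writer.Error(); err != nil {
    log.Fatal("Could not write the CSV output file:", err)
}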

Add the following imports to make the scraper.go script work:

scraper.go
import ( 
    "encoding/csv" 
    "log" 
    "os" 
    // other imports... 
)
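
To try the export logic without launching a browser, you could move it into a function that accepts any io.Writer. The sketch below is one way to do that under a few assumptions: writeProducts and productsToCSV are hypothetical helper names, and the two products are hard-coded samples. It writes to an in-memory buffer, but passing the *os.File returned by os.Create() would work the same way:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"strings"
)

type Product struct {
	name, price string
}

// writeProducts (hypothetical helper) writes a header row plus one
// record per product to w and returns the first error it encounters.
func writeProducts(w io.Writer, products []Product) error {
	writer := csv.NewWriter(w)

	if err := writer.Write([]string{"name", "price"}); err != nil {
		return err
	}
	for _, product := range products {
		if err := writer.Write([]string{product.name, product.price}); err != nil {
			return err
		}
	}

	// Flush pushes buffered data to w; Error reports any failure
	// that occurred during a previous Write or Flush
	writer.Flush()
	return writer.Error()
}

// productsToCSV (hypothetical helper) returns the CSV as a string
func productsToCSV(products []Product) (string, error) {
	var sb strings.Builder
	err := writeProducts(&sb, products)
	return sb.String(), err
}

func main() {
	// hard-coded samples instead of scraped data
	products := []Product{
		{name: "Short Dress", price: "$24.99"},
		{name: "Patterned Slacks", price: "$29.99"},
	}

	out, err := productsToCSV(products)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(out)
}
```

Returning the error from writer.Error() makes write failures visible to the caller instead of silently producing an incomplete file.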

Put it all together, and you'll get this scraping script:

scraper.go
package main

import (
    "encoding/csv"
    "log"
    "os"
    "strings"
  
    "github.com/playwright-community/playwright-go"
)

// a custom struct matching the data to scrape from
// each product node
type Product struct {
    name, price string
}

func main() {
    // initialize a Playwright instance to
    // perform browser automation
    pw, err := playwright.Run()
    if err != nil {
        log.Fatalf("Could not start Playwright: %v", err)
    }

    // initialize a Chromium instance
    browser, err := pw.Chromium.Launch(
        playwright.BrowserTypeLaunchOptions{
            Headless: playwright.Bool(true), // set to false in development
        },
    )
    if err != nil {
        log.Fatalf("Could not launch the browser: %v", err)
    }

    // open a new page within the current browser context
    page, err := browser.NewPage()
    if err != nil {
        log.Fatalf("Could not open a new page: %v", err)
    }

    // visit the target page
    if _, err = page.Goto("https://scrapingclub.com/exercise/list_infinite_scroll/"); err != nil {
        log.Fatalf("Could not visit the desired page: %v", err)
    }

    // where to store the scraped data
    var products []Product

    // select the product elements
    productHTMLElements, err := page.Locator(".post").All()
    if err != nil {
        log.Fatalf("Could not get the product node: %v", err)
    }

    // iterate over the product nodes
    // and apply the scraping logic
    for _, productHTMLElement := range productHTMLElements {
        // select the name and price nodes
        // and extract the data of interest from them
        name, err := productHTMLElement.Locator("h4").First().TextContent()
        if err != nil {
            log.Fatal("Could not extract the product name:", err)
        }
        price, err := productHTMLElement.Locator("h5").First().TextContent()
        if err != nil {
            log.Fatal("Could not extract the product price:", err)
        }
  
        // add the scraped data to the list
        product := Product{}
        product.name = strings.TrimSpace(name)
        product.price = strings.TrimSpace(price)
        products = append(products, product)
    }

    // open the CSV file
    file, err := os.Create("products.csv")
    if err != nil {
        log.Fatal("Could not open the CSV output file:", err)
    }
    defer file.Close()

    // initialize a CSV file writer
    writer := csv.NewWriter(file)

    // define the CSV header row
    // and write it to the file
    headers := []string{
        "name",
        "price",
    }
    writer.Write(headers)

    // add each product to the CSV output file
    for _, product := range products {
        // convert a Product to an array of strings
        record := []string{
            product.name,
            product.price,
        }

        // write a new CSV record
        writer.Write(record)
    }
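    // flush the buffered records to the file
    // and report any write error
    writer.Flush()
    if err := writer.Error(); err != nil {
        log.Fatal("Could not write the CSV output file:", err)
    }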
}

Launch the Golang Playwright scraper:

Terminal
go run scraper.go

Wait for the execution to end and a products.csv file will appear in your project's folder. Open it and you'll see the data below:

Extracted Data in CSV File

Fantastic! You now know the basics of Playwright in Go.

At the same time, note that the current output only contains ten records. That's due to the infinite scrolling approach used by the target page to load new products. Learn how to simulate that interaction and scrape all products in the next chapter!

Interact With a Browser With Playwright-Go

The Playwright Golang package can mimic many user interactions, including clicks, waits, and more. These actions help your automated script interact with web pages as a human user would. That may even fool the anti-bot systems into believing that your script is a regular visitor.

The interactions supported by Playwright in Go include:

  • Click on elements.
  • Hover web nodes and perform other mouse movements, including drag-and-drop operations.
  • Wait for elements on the page to be visible, clickable, etc.
  • Fill out input fields and submit forms.
  • Scroll up and down the page.
  • Take screenshots.

Most of those operations are available via built-in methods from the Playwright API. Otherwise, you can use the Evaluate() method to run a JavaScript script directly on the page. Together, these two approaches allow you to simulate any user interaction.

Scrape all products from the infinite scroll demo and then explore other interactions!

Scrolling

The page initially has only ten product cards and relies on infinite scrolling to load more. If you want to extract data from all products available on that webpage, you have to:

  1. Simulate several scrolls to trigger the dynamic loading of new products.
  2. Wait for the product elements to be loaded on the page and added to the DOM.
  3. Apply the scraping logic to each of them.

Playwright doesn't offer a dedicated method for scrolling to the bottom of a page, so a simple option is to define custom JavaScript logic as in the snippet below. This instructs the browser to scroll down the page 10 times at an interval of 500 ms:

Example
// scroll down the page 10 times
const scrolls = 10
let scrollCount = 0

// scroll down and then wait for 0.5s
const scrollInterval = setInterval(() => {
  window.scrollTo(0, document.body.scrollHeight)
  scrollCount++

  if (scrollCount === scrolls) {
    clearInterval(scrollInterval)
  }
}, 500)

Simulate infinite scrolling by passing the above script to the Evaluate() method:

scraper.go
scrollingScript := `
// scroll down the page 10 times
const scrolls = 10
let scrollCount = 0

// scroll down and then wait for 0.5s
const scrollInterval = setInterval(() => {
    window.scrollTo(0, document.body.scrollHeight)
    scrollCount++

    if (scrollCount === scrolls) {
        clearInterval(scrollInterval)
    }
}, 500)
`
// execute the custom JavaScript script on the page
_, err = page.Evaluate(scrollingScript, []interface{}{})
if err != nil {
    log.Fatal("Could not perform the JS scrolling logic:", err)
}

You want the browser to load new products before selecting them in your script. Thus, place the above instructions before the locator logic.
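
If you'd rather not hard-code the number of scrolls and the interval, one option is to generate the JavaScript from Go. The sketch below assumes two hypothetical helpers that aren't part of playwright-go: buildScrollScript produces the same script as above with configurable values, and minScrollMillis estimates how long the scrolling takes, which helps when choosing a wait timeout:

```go
package main

import "fmt"

// buildScrollScript (hypothetical helper) returns JavaScript that scrolls
// to the bottom of the page `scrolls` times, one scroll every
// `intervalMs` milliseconds.
func buildScrollScript(scrolls, intervalMs int) string {
	return fmt.Sprintf(`
const scrolls = %d
let scrollCount = 0

const scrollInterval = setInterval(() => {
    window.scrollTo(0, document.body.scrollHeight)
    scrollCount++

    if (scrollCount === scrolls) {
        clearInterval(scrollInterval)
    }
}, %d)
`, scrolls, intervalMs)
}

// minScrollMillis (hypothetical helper) returns the time in milliseconds
// that passes before the last scroll fires. setInterval triggers the first
// scroll after one full interval, so the total is scrolls * intervalMs.
func minScrollMillis(scrolls, intervalMs int) int {
	return scrolls * intervalMs
}

func main() {
	fmt.Print(buildScrollScript(10, 500))
	fmt.Println("scrolling takes at least", minScrollMillis(10, 500), "ms")
}
```

You could then pass buildScrollScript(10, 500) to Evaluate() in place of the hard-coded string. With 10 scrolls at 500 ms, the last scroll fires after about 5 seconds, so any wait that follows has to be longer than that.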

You now have to wait for the scrolling logic to be executed and the new products to be on the page. Use WaitForTimeout() to stop the script for 10 seconds after the JS script execution:

scraper.go
page.WaitForTimeout(10000)

Your scraper.go file should look like this:

scraper.go
package main

import (
    "encoding/csv"
    "log"
    "os"
    "strings"

    "github.com/playwright-community/playwright-go"
)

// a custom struct matching the data to scrape from
// each product node
type Product struct {
    name, price string
}

func main() {
    // initialize a Playwright instance to
    // perform browser automation
    pw, err := playwright.Run()
    if err != nil {
        log.Fatalf("Could not start Playwright: %v", err)
    }

    // initialize a Chromium instance
    browser, err := pw.Chromium.Launch(
        playwright.BrowserTypeLaunchOptions{
            Headless: playwright.Bool(true), // set to false in development
        },
    )
    if err != nil {
        log.Fatalf("Could not launch the browser: %v", err)
    }

    // open a new page within the current browser context
    page, err := browser.NewPage()
    if err != nil {
        log.Fatalf("Could not open a new page: %v", err)
    }

    // visit the target page
    if _, err = page.Goto("https://scrapingclub.com/exercise/list_infinite_scroll/"); err != nil {
        log.Fatalf("Could not visit the desired page: %v", err)
    }

    // where to store the scraped data
    var products []Product

    // JavaScript scrolling script
    scrollingScript := `
    // scroll down the page 10 times
    const scrolls = 10
    let scrollCount = 0

    // scroll down and then wait for 0.5s
    const scrollInterval = setInterval(() => {
        window.scrollTo(0, document.body.scrollHeight)
        scrollCount++

        if (scrollCount === scrolls) {
            clearInterval(scrollInterval)
        }
    }, 500)
`
    // execute the custom JavaScript script on the page
    _, err = page.Evaluate(scrollingScript, []interface{}{})
    if err != nil {
        log.Fatal("Could not perform the JS scrolling logic:", err)
    }

    // wait for the products to be on the page
    page.WaitForTimeout(10000)

    // select the product elements
    productHTMLElements, err := page.Locator(".post").All()
    if err != nil {
        log.Fatalf("Could not get the product node: %v", err)
    }

    // iterate over the product nodes
    // and apply the scraping logic
    for _, productHTMLElement := range productHTMLElements {
        // select the name and price nodes
        // and extract the data of interest from them
        name, err := productHTMLElement.Locator("h4").First().TextContent()
        if err != nil {
            log.Fatal("Could not extract the product name:", err)
        }
        price, err := productHTMLElement.Locator("h5").First().TextContent()
        if err != nil {
            log.Fatal("Could not extract the product price:", err)
        }

        // add the scraped data to the list
        product := Product{}
        product.name = strings.TrimSpace(name)
        product.price = strings.TrimSpace(price)
        products = append(products, product)
    }

    // open the CSV file
    file, err := os.Create("products.csv")
    if err != nil {
        log.Fatal("Could not open the CSV output file:", err)
    }
    defer file.Close()

    // initialize a CSV file writer
    writer := csv.NewWriter(file)

    // define the CSV header row
    // and write it to the file
    headers := []string{
        "name",
        "price",
    }
    writer.Write(headers)

    // add each product to the CSV output file
    for _, product := range products {
        // convert a Product to an array of strings
        record := []string{
            product.name,
            product.price,
        }

        // write a new CSV record
        writer.Write(record)
    }
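    // flush the buffered records to the file
    // and report any write error
    writer.Flush()
    if err := writer.Error(); err != nil {
        log.Fatal("Could not write the CSV output file:", err)
    }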
}

Run your Playwright Golang script again:

Terminal
go run scraper.go

The 10-second interruption will make the scraper much slower, so be patient.

Verify that products contains more than 10 records by looking at the products.csv output file:

Updated CSV File

Incredible! You just scraped all products on the target page.

Wait for Element

The current script achieves the scraping goal, but it relies on WaitForTimeout(). That method is discouraged because hard waits lead to flaky behavior in your browser automation logic.

The reason? Consider what would happen in the event of a browser or network slowdown. In that scenario, 10 seconds might not be enough time to load all the products, leading to an output that doesn't contain all the expected records.

Plus, forcing your script to idle for a fixed number of seconds makes it unnecessarily slow. These are some compelling reasons to never use hard waits in automation scripts. That's why Playwright offers many methods for waiting for specific conditions to occur.
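
To make the difference concrete, here's a conceptual sketch in plain Go of waiting for a condition instead of sleeping for a fixed time. It isn't how Playwright implements its waiting logic. The waitFor and demo functions are hypothetical, and the condition is a stand-in that becomes true on the third check:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitFor (hypothetical helper) polls cond every interval and returns
// as soon as it is true, or an error once the timeout has elapsed.
func waitFor(cond func() bool, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("condition not met before the timeout")
		}
		time.Sleep(interval)
	}
}

// demo runs waitFor against a stand-in condition that turns true on the
// third check and returns how many checks were needed.
func demo() (int, error) {
	checks := 0
	ready := func() bool {
		checks++
		return checks >= 3
	}

	// wait up to 2 seconds, checking every 10 milliseconds
	err := waitFor(ready, 2*time.Second, 10*time.Millisecond)
	return checks, err
}

func main() {
	checks, err := demo()
	if err != nil {
		fmt.Println("timed out:", err)
		return
	}
	fmt.Printf("condition met after %d checks\n", checks)
}
```

The timeout here is only an upper bound: the function returns as soon as the condition holds, so a fast page costs almost no waiting time while a slow one still gets its full allowance.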

Use the ToBeVisible() assertion with a timeout of 10 seconds to wait for the 60th .post element to be visible on the page:

scraper.go
product := page.Locator(".post:nth-child(60)")
err = playwright.NewPlaywrightAssertions(10000).Locator(product).ToBeVisible()
if err != nil {
    log.Fatalf("The 60th product did not become visible: %v", err)
}

Replace the WaitForTimeout() instruction with the code above. Your browser automation script will now wait up to 10 seconds for the page to render all 60 products retrieved via AJAX after the scrolls.

The definitive code of your Playwright Golang script will be:

scraper.go
package main

import (
    "encoding/csv"
    "log"
    "os"
    "strings"

    "github.com/playwright-community/playwright-go"
)

// a custom struct matching the data to scrape from
// each product node
type Product struct {
    name, price string
}

func main() {
    // initialize a Playwright instance to
    // perform browser automation
    pw, err := playwright.Run()
    if err != nil {
        log.Fatalf("Could not start Playwright: %v", err)
    }

    // initialize a Chromium instance
    browser, err := pw.Chromium.Launch(
        playwright.BrowserTypeLaunchOptions{
            Headless: playwright.Bool(true), // set to false in development
        },
    )
    if err != nil {
        log.Fatalf("Could not launch the browser: %v", err)
    }

    // open a new page within the current browser context
    page, err := browser.NewPage()
    if err != nil {
        log.Fatalf("Could not open a new page: %v", err)
    }

    // visit the target page
    if _, err = page.Goto("https://scrapingclub.com/exercise/list_infinite_scroll/"); err != nil {
        log.Fatalf("Could not visit the desired page: %v", err)
    }

    // where to store the scraped data
    var products []Product

    // JavaScript scrolling script
    scrollingScript := `
    // scroll down the page 10 times
    const scrolls = 10
    let scrollCount = 0

    // scroll down and then wait for 0.5s
    const scrollInterval = setInterval(() => {
        window.scrollTo(0, document.body.scrollHeight)
        scrollCount++

        if (scrollCount === scrolls) {
            clearInterval(scrollInterval)
        }
    }, 500)
`
    // execute the custom JavaScript script on the page
    _, err = page.Evaluate(scrollingScript, []interface{}{})
    if err != nil {
        log.Fatal("Could not perform the JS scrolling logic:", err)
    }

    // wait for the 60th product to be on the page
    product := page.Locator(".post:nth-child(60)")
    err = playwright.NewPlaywrightAssertions(10000).Locator(product).ToBeVisible()
    if err != nil {
        log.Fatalf("The 60th product did not become visible: %v", err)
    }

    // select the product elements
    productHTMLElements, err := page.Locator(".post").All()
    if err != nil {
        log.Fatalf("Could not get the product node: %v", err)
    }

    // iterate over the product nodes
    // and apply the scraping logic
    for _, productHTMLElement := range productHTMLElements {
        // select the name and price nodes
        // and extract the data of interest from them
        name, err := productHTMLElement.Locator("h4").First().TextContent()
        if err != nil {
            log.Fatal("Could not extract the product name:", err)
        }
        price, err := productHTMLElement.Locator("h5").First().TextContent()
        if err != nil {
            log.Fatal("Could not extract the product price:", err)
        }

        // add the scraped data to the list
        product := Product{}
        product.name = strings.TrimSpace(name)
        product.price = strings.TrimSpace(price)
        products = append(products, product)
    }

    // open the CSV file
    file, err := os.Create("products.csv")
    if err != nil {
        log.Fatal("Could not open the CSV output file:", err)
    }
    defer file.Close()

    // initialize a CSV file writer
    writer := csv.NewWriter(file)

    // define the CSV header row
    // and write it to the file
    headers := []string{
        "name",
        "price",
    }
    writer.Write(headers)

    // add each product to the CSV output file
    for _, product := range products {
        // convert a Product to an array of strings
        record := []string{
            product.name,
            product.price,
        }

        // write a new CSV record
        writer.Write(record)
    }
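    // flush the buffered records to the file
    // and report any write error
    writer.Flush()
    if err := writer.Error(); err != nil {
        log.Fatal("Could not write the CSV output file:", err)
    }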
}

Run scraper.go again. You'll get the same results as before, but faster: the script no longer idles for a fixed 10 seconds and moves on as soon as the 60th product is visible.

Et voilà! You're now an expert in web scraping using Playwright in Go. Learn more by exploring other possible interactions.

Wait for Page to Load

Goto() automatically waits for the browser to fire the load event for the destination web page. Nevertheless, modern sites are more dynamic than ever before. Understanding when a page has fully loaded may not be easy, especially when it makes a lot of AJAX requests.

For more complex scenarios, you can use the following auto-waiting assertions:

  • ToBeAttached(): Ensures that the element pointed to by the locator is attached to the DOM.
  • ToBeChecked(): Ensures that the checkbox element pointed to by the locator is checked.
  • ToBeDisabled(): Ensures that the element pointed to by the locator is disabled.
  • ToBeEditable(): Ensures that the element pointed to by the locator is editable.
  • ToBeEmpty(): Ensures that the container pointed to by the locator is empty.
  • ToBeEnabled(): Ensures that the element pointed to by the locator is enabled.
  • ToBeFocused(): Ensures that the element pointed to by the locator is focused.
  • ToBeHidden(): Ensures that the element pointed to by the locator is not visible.
  • ToBeInViewport(): Ensures that the element pointed to by the locator intersects the viewport.
  • ToBeVisible(): Ensures that the element pointed to by the locator is visible.
  • ToContainText(): Ensures that the element pointed to by the locator contains the given text.
  • ToHaveAttribute(): Ensures that the element pointed to by the locator has the specified attribute.
  • ToHaveClass(): Ensures that the element pointed to by the locator has the specified CSS classes.
  • ToHaveCount(): Ensures that the element pointed to by the locator resolves to an exact number of DOM nodes.
  • ToHaveCSS(): Ensures that the element pointed to by the locator has the specified computed CSS style.
  • ToHaveId(): Ensures that the element pointed to by the locator has the specified ID attribute.
  • ToHaveJSProperty(): Ensures that the element pointed to by the locator has the specified JavaScript property.
  • ToHaveText(): Ensures that the element pointed to by the locator has the specified text.
  • ToHaveValue(): Ensures that the input element pointed to by the locator has the specified value.
  • ToHaveValues(): Ensures that the multi-select/combobox pointed to by the locator has the specified options selected.

For more information on how to wait in Playwright Golang, refer to the documentation.

Click Elements

Golang Playwright locators expose the Click() method to help you simulate click interactions. This function tells the browser to click on the specified element as a human user would:

scraper.go
locator.Click()

If the click triggers a page change (as in the snippet below), you'll get redirected. In this case, you'll have to adapt the DOM node selection logic to the new page:

scraper.go
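// select the first matching product card, since Click() expects
// the locator to resolve to a single element
productHTMLElement := page.Locator(".post").First()
if err := productHTMLElement.Click(); err != nil {
    log.Fatalf("Could not click the element: %v", err)
}
// you are now on the detail product page...

// new scraping logic...
// page.Locator(...)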

Avoid Getting Blocked When Scraping With Playwright-Go

Getting stopped by anti-bot systems is the biggest challenge in web scraping. These anti-scraping measures identify and block scrapers such as your Playwright Go script. In particular, they protect the valuable data exposed by a site from automated scripts.

While it's not easy, performing web scraping without getting blocked is surely possible. A popular approach to avoid the simplest anti-bots is to randomize your requests. The idea is to use proxies to change the exit IP and set real User Agent header values.

Pass a custom User Agent to NewContext() to set it globally. Then, use the customized browser context to create a new page:

scraper.go
// create a new browser context with
// a custom user agent 
customUserAgent := "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
context, err := browser.NewContext(
    playwright.BrowserNewContextOptions{
        UserAgent: &customUserAgent,
    },
)
if err != nil {
    log.Fatalf("Could not create a new browser context: %v", err)
}

// initialize a new page
page, err := context.NewPage()
if err != nil {
    log.Fatalf("Could not open a new page: %v", err)
}

Find out why this is useful in our article on User Agents for web scraping.
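The snippet above always sends the same User Agent. To randomize it, one option is to pick a value from a pool each time you create a browser context. Below is a minimal sketch in plain Go, assuming a hand-maintained pool: the userAgents list and the randomUserAgent() helper are hypothetical names, not part of playwright-go, and the strings are only examples of real-looking values.

```go
package main

import (
	"fmt"
	"math/rand"
)

// userAgents is an illustrative pool of User Agent strings.
// The entries are assumptions: replace them with values you trust.
var userAgents = []string{
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
	"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
	"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
}

// randomUserAgent returns one entry of pool chosen uniformly at random.
// It returns an empty string when the pool is empty.
func randomUserAgent(pool []string) string {
	if len(pool) == 0 {
		return ""
	}
	return pool[rand.Intn(len(pool))]
}

func main() {
	// pick a User Agent for this run
	customUserAgent := randomUserAgent(userAgents)

	// pass &customUserAgent as the UserAgent field of
	// playwright.BrowserNewContextOptions, as in the snippet above
	fmt.Println(customUserAgent)
}
```

On Go 1.20 or later, the math/rand generator is seeded automatically, so each run can select a different value.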

To configure an HTTP proxy in Playwright, you have to pass a Proxy object to the Launch() method. First, retrieve the connection info to a free proxy server from sites like Free Proxy List. Next, use the connection URL to instantiate a Proxy object and set it in the browser:

scraper.go
// create a Proxy object with the
// proxy connection URL
proxy := playwright.Proxy{
    Server: "http://211.32.24.28:9083",
}
// initialize a Chromium instance
// with the specified proxy
browser, err := pw.Chromium.Launch(
    playwright.BrowserTypeLaunchOptions{
        Proxy: &proxy,
        // other configs...
    },
)
if err != nil {
    log.Fatalf("Could not launch the browser: %v", err)
}

All requests made by the browser to load the page and its data will now be routed through the specified proxy. However, that may not be enough.

The real issue is that free proxies are unreliable, data-greedy, and short-lived. For example, by the time you follow this tutorial, the proxy server above will no longer work. In production, never rely on free proxies.
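If you have access to several reliable proxy servers, a simple way to change the exit IP is to rotate through them and launch each browser instance with the next one. The following is a minimal sketch of round-robin rotation in plain Go. The proxyPool type is a hypothetical helper, not part of playwright-go, and the example.com addresses are placeholders.

```go
package main

import "fmt"

// proxyPool cycles through a fixed list of proxy server URLs.
// It is not safe for concurrent use without extra locking.
type proxyPool struct {
	servers []string
	next    int
}

// Next returns the next proxy server URL in round-robin order.
func (p *proxyPool) Next() (string, error) {
	if len(p.servers) == 0 {
		return "", fmt.Errorf("proxy pool is empty")
	}
	server := p.servers[p.next%len(p.servers)]
	p.next++
	return server, nil
}

func main() {
	// placeholder addresses: replace them with your own proxies
	pool := &proxyPool{servers: []string{
		"http://proxy-1.example.com:8080",
		"http://proxy-2.example.com:8080",
	}}

	for i := 0; i < 3; i++ {
		server, err := pool.Next()
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		// use server as the Server field of playwright.Proxy
		fmt.Println(server)
	}
}
```

The loop prints the first address, then the second, then the first again. Each returned value can be set as the Server field of the Proxy object shown earlier.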

The two tips above may work against simple anti-bots, but they're only baby steps toward bypassing complex technologies. Sophisticated tools like Cloudflare will still be able to block your script. Verify that by pointing your Playwright Golang script at a Cloudflare-protected site like G2:

Blocked G2 Page

Should you give up your scraping dreams? Absolutely not! You just have to use the right tool, and its name is ZenRows. This next-generation scraping API comes with the most powerful bot bypass toolkit, along with User Agent and IP rotation capabilities.

Experience the power of ZenRows. Sign up for free, get your first 1,000 credits, and reach the Request Builder page:

ZenRows Request Builder

Suppose you need to scrape data from the Cloudflare-protected G2.com page seen earlier. Get the code to achieve that with the following procedure:

  1. Paste the target URL (i.e., https://www.g2.com/products/airtable/reviews) into the "URL to Scrape" input.
  2. Enable the "JS Rendering" mode.
  3. Click on "Premium Proxy" to enable IP rotation (User Agent rotation and the AI-powered anti-bot toolkit are always included by default).
  4. Select the "Go" option on the right and then the "API" mode to get the snippet required to use ZenRows in Golang.

This is the code ZenRows will generate for you:

Example
package main

import (
    "io"
    "log"
    "net/http"
)

func main() {
    client := &http.Client{}
    req, err := http.NewRequest("GET", "https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.g2.com%2Fproducts%2Fairtable%2Freviews&js_render=true&premium_proxy=true", nil)
    if err != nil {
        log.Fatalln(err)
    }

    // send the request through the ZenRows API
    resp, err := client.Do(req)
    if err != nil {
        log.Fatalln(err)
    }
    defer resp.Body.Close()

    // read the HTML returned by ZenRows
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatalln(err)
    }

    log.Println(string(body))
}

Run the script and it'll print the HTML of the target page protected by Cloudflare:

Output
<!DOCTYPE html>
<head>
  <meta charset="utf-8" />
  <link href="https://www.g2.com/assets/favicon-fdacc4208a68e8ae57a80bf869d155829f2400fa7dd128b9c9e60f07795c4915.ico" rel="shortcut icon" type="image/x-icon" />
  <title>Airtable Reviews 2024: Details, Pricing, &amp; Features | G2</title>
  <!-- omitted for brevity ... -->

Wow! Bye-bye CAPTCHAs and error pages. You just integrated ZenRows into your Go scraper.
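The API URL in the generated snippet is encoded by hand, which gets error-prone when you change the target page. As a sketch, you could assemble the same query string with Go's net/url package instead. The buildZenRowsURL() helper below is a hypothetical name, and it only uses the parameters that appear in the snippet above (apikey, url, js_render, premium_proxy).

```go
package main

import (
	"fmt"
	"net/url"
)

// buildZenRowsURL assembles the API URL for the given key and target page,
// enabling the same options as the generated snippet.
func buildZenRowsURL(apiKey, target string) string {
	params := url.Values{}
	params.Set("apikey", apiKey)
	params.Set("url", target) // Encode() percent-encodes the target URL
	params.Set("js_render", "true")
	params.Set("premium_proxy", "true")
	return "https://api.zenrows.com/v1/?" + params.Encode()
}

func main() {
	apiURL := buildZenRowsURL(
		"YOUR_ZENROWS_API_KEY",
		"https://www.g2.com/products/airtable/reviews",
	)
	fmt.Println(apiURL)
	// prints:
	// https://api.zenrows.com/v1/?apikey=YOUR_ZENROWS_API_KEY&js_render=true&premium_proxy=true&url=https%3A%2F%2Fwww.g2.com%2Fproducts%2Fairtable%2Freviews
}
```

Encode() sorts the parameters by key, so their order differs from the generated snippet. The order of query parameters doesn't change the request. Pass the resulting string to http.NewRequest() in place of the hard-coded URL.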

Conclusion

In this tutorial, you learned the basics of browser automation in Playwright with Go. You started by controlling headless Chromium and then explored more advanced techniques. You've become a Playwright Golang expert!

You now know:

  • How to set up a Go project for Playwright.
  • How to use it to extract data from a dynamic content page.
  • What user interactions Playwright supports in Go.
  • The challenges of web scraping and how to deal with them.

No matter how complex your browser automation is, anti-bot systems can still block you. Bypass them all with ZenRows, the next-generation web scraping API with browser automation capabilities, IP rotation, and the most powerful anti-scraping toolkit. Try ZenRows for free!
