GoSpider is a command-line web crawling framework known for its speed. It offers numerous features, all embedded in an intuitive interface that makes it easy to collect data from basic targets that do not require complex crawling logic.
This tutorial will walk you through crawling websites, following links, and scraping valuable data using GoSpider. By the end, you'll be able to discover page URLs and extract data from these links.
Build Your First GoSpider Web Crawler
Real-world examples are the best learning tools. In this tutorial, we'll crawl the ScrapingCourse E-commerce Test site.

We'll find and follow product links on the website and also scrape valuable information (product name, price, and image URL) as the crawler navigates each product page.
Step 1: Set up GoSpider
To follow along with this tutorial, you'll need Go installed on your machine.
The steps below will help you set up your Go project and install GoSpider.
Run the command below to verify your Go installation.
go version
If you have Go installed, this command will return the version, as seen below.
go version go1.23.4 windows/386
Next, navigate to a directory where you'd like to store your code and run the GoSpider installation command.
GO111MODULE=on go install github.com/jaeles-project/gospider@latest
This command fetches the latest version of the tool from its GitHub repository and places the executable in your $GOPATH/bin directory.
That's it. You're all set up and ready to start crawling with GoSpider.
But before you dive in, it's important to familiarize yourself with the tool's functionality and options. You can use GoSpider's help menu to get started. To access this menu, run the following command.
gospider -h
This provides an overview of GoSpider's options and their usage.
Flags:
-s, --site string Site to crawl
-S, --sites string Site list to crawl
-p, --proxy string Proxy (Ex: http://127.0.0.1:8080)
-o, --output string Output folder
-u, --user-agent string User Agent to use
web: random web user-agent
mobi: random mobile user-agent
or you can set your special user-agent (default "web")
--cookie string Cookie to use (testA=a; testB=b)
-H, --header stringArray Header to use (Use multiple flag to set multiple header)
# ... truncated for brevity ... #
Step 2: Access the Target Website
Let's start with a basic GoSpider command to access the target website and crawl all available links on the page.
With just a few parameters, you can instruct GoSpider to find links and save those results in a text file.
Here's what the command looks like (your "hello world" moment with GoSpider):
gospider -q -s "https://www.scrapingcourse.com/ecommerce/" -o output
This command fires GoSpider to begin crawling, listing all the links it finds on the target page. Here's what each flag means:
- -q (quiet): suppresses verbose output and shows only the crawled URLs.
- -s (site): takes a string argument specifying the URL to crawl.
- -o (output): tells the crawler to store the result in a folder named output. The text file is named after the target website's domain, www_scrapingcourse_com.
Once you run this command, GoSpider starts processing, and you'll see results similar to the one below.
https://www.scrapingcourse.com/ecommerce/
https://www.scrapingcourse.com
https://www.scrapingcourse.com/ecommerce/feed/
https://www.scrapingcourse.com/ecommerce/comments/feed/
https://www.scrapingcourse.com/ecommerce/shop/feed/
https://www.scrapingcourse.com/ecommerce/cart/
# ... truncated for brevity ... #
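If you'd rather launch GoSpider from a Go program than type the command into a shell, you can call the binary through Go's standard os/exec package. The snippet below is a minimal sketch, assuming gospider is already on your PATH; it runs the same command shown above and prints the crawler's output.
package main

// run GoSpider as an external command and capture its output
import (
    "fmt"
    "log"
    "os/exec"
)

func main() {
    // equivalent to: gospider -q -s "https://www.scrapingcourse.com/ecommerce/" -o output
    cmd := exec.Command("gospider",
        "-q",
        "-s", "https://www.scrapingcourse.com/ecommerce/",
        "-o", "output",
    )

    // Output() runs the command and returns whatever it writes to stdout
    out, err := cmd.Output()
    if err != nil {
        log.Fatalf("gospider failed: %v", err)
    }
    fmt.Println(string(out))
}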
Step 3: Follow Links With GoSpider
Now, let's scale our crawler to find and follow specific links. For this tutorial, we'll keep things simple and focus only on pagination links.
GoSpider provides various command flags that allow you to configure your crawler according to your needs. Using the initial "basic" command, GoSpider will only find all the links on the target page but won't follow them.
However, if you open the target page in a browser, you'll notice that not all pagination elements are immediately visible in the displayed HTML.

Pages 5, 6, 7, 8, and 9 are missing, so our previous command couldn't find these links. You'll need to crawl across multiple levels to locate them, as they appear in subsequent pagination pages.
To achieve this, GoSpider offers the -d flag, which allows you to set a recursion depth for visited URLs. The default value is 1 (only collects links on the start page), and a depth of 0 results in infinite crawling, which can break your crawler.
Therefore, set a maximum depth of 3 to crawl until you find all pagination links. This directs GoSpider to find and follow links three levels away from the start URL, ensuring it covers the pagination chain.
gospider -q -s "https://www.scrapingcourse.com/ecommerce/" -d 3 -o output
Your output file will look like this:
[href] - https://www.scrapingcourse.com/ecommerce/page/2/
[href] - https://www.scrapingcourse.com/ecommerce/page/3/
[href] - https://www.scrapingcourse.com/ecommerce/page/4/
# ... truncated for brevity ... #
Ideally, you should be able to use GoSpider's --whitelist and --blacklist options to crawl only pagination links. However, they do not work at the time of writing.
As a workaround, you can use command-line tools, such as awk and grep, to filter the result and isolate the pagination links.
To do this, you must first inspect the page to identify the pagination structure. Navigate to the target page in a browser, right-click on a pagination element, and select Inspect. This will open the Developer Tools window, as seen in the image below:

Here, you'll notice that the pagination links all end with the format ecommerce/page/{number}/. Using this information, navigate to your output folder and write a command that isolates the pagination links.
The command needs to strip the unwanted text columns, remove duplicates, and keep only the links that match that pattern.
cat www_scrapingcourse_com | awk '{print $3}' | awk '!seen[$0]++' | grep -E "/ecommerce/page/[0-9]+/$" | tee pagination_links
This command filters the result and saves the pagination links in a new text file named pagination_links.
Here's an overview of what each command does:
- awk '{print $3}': prints only the third column of each line. In this case, it removes prefixes such as [href], leaving only the URLs.
- awk '!seen[$0]++': removes duplicate lines.
- grep -E "/ecommerce/page/[0-9]+/$": keeps only the links that end with /ecommerce/page/{number}/.
- tee pagination_links: writes the filtered output to a new file named pagination_links.
Your output should look like this:
# ... omitted for brevity ... #
https://www.scrapingcourse.com/ecommerce/page/10/
https://www.scrapingcourse.com/ecommerce/page/11/
https://www.scrapingcourse.com/ecommerce/page/12/
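The awk/grep pipeline is convenient on Linux and macOS. If those tools aren't available, or you'd simply rather stay in Go, the same filtering can be done with a short program. The sketch below assumes GoSpider saved its results to output/www_scrapingcourse_com; adjust the path if your file lives somewhere else.
package main

import (
    "bufio"
    "log"
    "os"
    "regexp"
)

func main() {
    // GoSpider's result file; adjust the path to match your output folder
    in, err := os.Open("output/www_scrapingcourse_com")
    if err != nil {
        log.Fatalf("Error opening input file: %v", err)
    }
    defer in.Close()

    // file that will hold the deduplicated pagination links
    out, err := os.Create("pagination_links")
    if err != nil {
        log.Fatalf("Error creating output file: %v", err)
    }
    defer out.Close()

    // match pagination URLs such as https://.../ecommerce/page/3/
    re := regexp.MustCompile(`https?://\S+/ecommerce/page/[0-9]+/`)
    seen := make(map[string]bool)

    scanner := bufio.NewScanner(in)
    for scanner.Scan() {
        // keep the first occurrence of each matching URL
        if url := re.FindString(scanner.Text()); url != "" && !seen[url] {
            seen[url] = true
            if _, err := out.WriteString(url + "\n"); err != nil {
                log.Fatalf("Error writing link: %v", err)
            }
        }
    }
    if err := scanner.Err(); err != nil {
        log.Fatalf("Error reading input file: %v", err)
    }
}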
Step 4: Scrape Data From Collected Links
Now that you've successfully isolated pagination links, the next step is to extract product information.
While GoSpider allows you to quickly crawl all links, starting from the seed URL, it doesn't offer built-in support for downloading and processing HTML pages. However, you can scrape data from collected links using other Go frameworks, such as the powerful Colly.
Colly is a lightning-fast Go library that allows you to extract structured data from websites. Therefore, you can integrate GoSpider's output with Colly to extract product information.
We'll access each pagination link and extract the product name, price, and image URL. Below is a step-by-step guide.
Start by initializing a Go project using the following command.
go mod init crawler
Then, install Colly.
go get github.com/gocolly/colly
Next, create a Go file (crawler.go) and prepare to write some code. Also, ensure that the file containing your pagination links and this Go file are in the same root directory.
In your crawler.go file, import the required libraries and open the pagination links file using Go's os.Open() function.
package main

// import the required libraries
import (
    "bufio"
    "fmt"
    "log"
    "os"
)

func main() {
    // open the file with pagination links
    file, err := os.Open("pagination_links")
    if err != nil {
        log.Fatalf("Error opening file: %v", err)
    }
    defer file.Close()
}
Then, initialize a new scanner instance to read the file line by line. As a starting point, print each line to verify that your code is working correctly.
// ...

func main() {
    // ...

    // initialize a new scanner instance to read the file line by line
    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        url := scanner.Text()
        fmt.Println("Crawling:", url)
    }
}
If everything works correctly, this code will log each line of your pagination_links file.
Now, using Colly, access each pagination link and extract product data. To achieve this, import Colly and initialize a new Collector.
package main

// import the required libraries
import (
    // ...
    "github.com/gocolly/colly/v2"
)

func main() {
    // initialize a new collector
    c := colly.NewCollector()

    // ...
}
After that, inspect a product card to identify the right selectors for the desired data points (product name, price, and image URL).

You'll notice that each product is a list item with the class product. The following HTML elements within each list item represent the data points:
- Product name: an <h2> element with the class product-name.
- Product price: a span element with the class product-price.
- Product image: an <img> tag with the class product-image.
Using this information, create an OnHTML() callback to find all product elements on the page and extract their product name, price, and image URL.
Colly's OnHTML is a core feature that allows you to identify and retrieve data from HTML elements using CSS selectors. The callback fires whenever your crawler encounters an HTML element that matches the defined selectors.
Additionally, we recommend organizing and managing the scraped data using structs in Go. To achieve this, define a global struct to store the product details.
// ...

// define a struct to store product details
type Product struct {
    Name     string
    Price    string
    ImageURL string
}

// declare a slice to store the products
var products []Product

func main() {
    // ...

    // select product (list item with class product)
    c.OnHTML("li.product", func(e *colly.HTMLElement) {
        // retrieve product name, price, and images
        productName := e.ChildText(".product-name")
        productPrice := e.ChildText(".product-price")
        imageURL := e.ChildAttr(".product-image", "src")

        // create a new product instance and add it to the slice
        product := Product{
            Name:     productName,
            Price:    productPrice,
            ImageURL: imageURL,
        }
        products = append(products, product)
    })

    // ...
}
Lastly, modify the scanner loop to visit each link, line by line.
// ...

func main() {
    // ...

    // initialize scanner instance to read file line by line
    scanner := bufio.NewScanner(file)

    // for each line visit the pagination link
    for scanner.Scan() {
        url := scanner.Text()
        fmt.Println("Crawling:", url)
        err := c.Visit(url)
        if err != nil {
            log.Printf("Error visiting %s: %v", url, err)
        }
    }
}
That's it.
Now, combine all the steps to get the following complete code.
package main

// import the required libraries
import (
    "bufio"
    "fmt"
    "log"
    "os"

    "github.com/gocolly/colly/v2"
)

// define a struct to store product details
type Product struct {
    Name     string
    Price    string
    ImageURL string
}

// declare a slice to store the products
var products []Product

func main() {
    // initialize a new collector
    c := colly.NewCollector()

    // open the file with pagination links
    file, err := os.Open("pagination_links")
    if err != nil {
        log.Fatalf("Error opening file: %v", err)
    }
    defer file.Close()

    // select product (list item with class product)
    c.OnHTML("li.product", func(e *colly.HTMLElement) {
        // retrieve product name, price, and images
        productName := e.ChildText(".product-name")
        productPrice := e.ChildText(".product-price")
        imageURL := e.ChildAttr(".product-image", "src")

        // create a new product instance and add it to the slice
        product := Product{
            Name:     productName,
            Price:    productPrice,
            ImageURL: imageURL,
        }
        products = append(products, product)

        fmt.Printf("Product Name: %s\nProduct Price: %s\nImage URL: %s\n", productName, productPrice, imageURL)
    })

    // initialize scanner instance to read file line by line
    scanner := bufio.NewScanner(file)

    // for each line visit the pagination link
    for scanner.Scan() {
        url := scanner.Text()
        fmt.Println("Crawling:", url)
        err := c.Visit(url)
        if err != nil {
            log.Printf("Error visiting %s: %v", url, err)
        }
    }
}
This extracts each product's name, price, and image URL from every page. Here's what your terminal output will look like.
Crawling: https://www.scrapingcourse.com/ecommerce/page/9/
Product Name: Pierce Gym Short
Product Price: $27.00
Image URL: https://www.scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/msh12-red_main.jpg
Product Name: Portia Capri
Product Price: $49.00
Image URL: https://www.scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/wp13-orange_main.jpg
// ... truncated for brevity ... //
Step 5: Export the Scraped Data to CSV
One way to turn data into actionable insights is by exporting it to CSV for further analysis. You can do this in Go using the encoding/csv package, which provides a writer for producing CSV output.
Since we're using Colly, you can use its OnScraped() callback to define actions once crawling ends.
In that case, we'll create a function to export the scraped data to CSV, then call that function within the OnScraped() callback.
Here's a step-by-step guide:
Import the required libraries and define a function that will export the scraped data to CSV.
package main

// import the required modules
import (
    // ...
    "encoding/csv"
)

// function to export scraped data to CSV
func exportToCSV(filename string) {
    // define logic to export to csv
}
This function creates a CSV file, initializes a writer, writes the header row, and populates the remaining rows with the scraped data.
// function to export scraped data to CSV
func exportToCSV(filename string) {
    // create a CSV file
    file, err := os.Create(filename)
    if err != nil {
        fmt.Println("Error creating CSV file:", err)
        return
    }
    defer file.Close()

    // initialize a CSV writer
    writer := csv.NewWriter(file)
    defer writer.Flush()

    // write the header row
    writer.Write([]string{"Name", "Price", "Image URL"})

    // write the product details
    for _, product := range products {
        writer.Write([]string{product.Name, product.Price, product.ImageURL})
    }

    fmt.Println("Product details exported to", filename)
}
That's it.
To verify that everything works, combine the steps above, add the OnScraped() callback, and call the exportToCSV() function within this callback.
You'll get the following complete code:
package main

// import the required libraries
import (
    "bufio"
    "encoding/csv"
    "fmt"
    "log"
    "os"

    "github.com/gocolly/colly/v2"
)

// define a struct to store product details
type Product struct {
    Name     string
    Price    string
    ImageURL string
}

// declare a slice to store the products
var products []Product

func main() {
    // initialize a new collector
    c := colly.NewCollector()

    // open the file with pagination links
    file, err := os.Open("pagination_links")
    if err != nil {
        log.Fatalf("Error opening file: %v", err)
    }
    defer file.Close()

    // select product (list item with class product)
    c.OnHTML("li.product", func(e *colly.HTMLElement) {
        // retrieve product name, price, and images
        productName := e.ChildText(".product-name")
        productPrice := e.ChildText(".product-price")
        imageURL := e.ChildAttr(".product-image", "src")

        // create a new product instance and add it to the slice
        product := Product{
            Name:     productName,
            Price:    productPrice,
            ImageURL: imageURL,
        }
        products = append(products, product)

        fmt.Printf("Product Name: %s\nProduct Price: %s\nImage URL: %s\n", productName, productPrice, imageURL)
    })

    // add the OnScraped callback to define actions after extracting data
    c.OnScraped(func(r *colly.Response) {
        fmt.Println("Data extraction complete", r.Request.URL)
        // export the collected products to a CSV file after scraping
        exportToCSV("product_data.csv")
    })

    // initialize scanner instance to read file line by line
    scanner := bufio.NewScanner(file)

    // for each line visit the pagination link
    for scanner.Scan() {
        url := scanner.Text()
        fmt.Println("Crawling:", url)
        err := c.Visit(url)
        if err != nil {
            log.Printf("Error visiting %s: %v", url, err)
        }
    }
}

// function to export scraped data to CSV
func exportToCSV(filename string) {
    // create a CSV file
    file, err := os.Create(filename)
    if err != nil {
        fmt.Println("Error creating CSV file:", err)
        return
    }
    defer file.Close()

    // initialize a CSV writer
    writer := csv.NewWriter(file)
    defer writer.Flush()

    // write the header row
    writer.Write([]string{"Name", "Price", "Image URL"})

    // write the product details
    for _, product := range products {
        writer.Write([]string{product.Name, product.Price, product.ImageURL})
    }

    fmt.Println("Product details exported to", filename)
}
This exports the scraped data to a CSV file named product_data.csv in your project's root directory.
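One detail worth noting: Colly's OnScraped() callback fires once for every page that finishes scraping, so exportToCSV() rewrites product_data.csv after each pagination page with everything collected so far. That's harmless here, but if you'd rather write the file only once, you can drop the callback and call the function after the scanner loop instead. Because Visit() is synchronous by default, all pages will have been processed by then. A rough sketch of that variant:
// ...

func main() {
    // ...

    // visit every pagination link first
    for scanner.Scan() {
        url := scanner.Text()
        fmt.Println("Crawling:", url)
        err := c.Visit(url)
        if err != nil {
            log.Printf("Error visiting %s: %v", url, err)
        }
    }

    // then export all collected products in a single write
    exportToCSV("product_data.csv")
}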
Here's a sample screenshot for reference.

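Opened in a plain text editor, the first few rows of product_data.csv should look something like this (the values are taken from the terminal output in Step 4; your row order will follow the pages you crawled):
Name,Price,Image URL
Pierce Gym Short,$27.00,https://www.scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/msh12-red_main.jpg
Portia Capri,$49.00,https://www.scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/wp13-orange_main.jpg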
Congratulations! You now know how to use GoSpider for web crawling and also export scraped data to CSV.
Avoid Getting Blocked While Crawling With GoSpider
Getting blocked is a common challenge when web crawling. This is because web crawlers exhibit patterns that make it easy for anti-bot solutions to identify and block your requests.
Here's a GoSpider command attempting to crawl the Antibot Challenge page, a protected website.
gospider -s "https://www.scrapingcourse.com/antibot-challenge" -o output
You'll get the following 403 error, indicating that the target server understood your requests but refused to fulfill them.
[url] - [code-403] - https://www.scrapingcourse.com/antibot-challenge
This happens because GoSpider is unable to pass the anti-bot challenge and ultimately gets blocked.
Common recommendations for overcoming this challenge include rotating proxies and setting custom user agents. However, these measures do not work against advanced anti-bot solutions.
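For reference, here's roughly what those basic measures look like on the Colly side of the pipeline. This is only a sketch: the proxy address is a placeholder you'd swap for your own, and the user-agent string is just an example of a realistic browser value. These tweaks may get you past simple checks, but as noted above, they won't defeat advanced anti-bot systems.
package main

import (
    "fmt"
    "log"

    "github.com/gocolly/colly/v2"
)

func main() {
    // set a custom, realistic user agent instead of Colly's default
    c := colly.NewCollector(
        colly.UserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"),
    )

    // route requests through a proxy (placeholder address; use your own proxy here)
    if err := c.SetProxy("http://127.0.0.1:8080"); err != nil {
        log.Fatal(err)
    }

    // log the status code to see whether the request got through
    c.OnResponse(func(r *colly.Response) {
        fmt.Println("Status:", r.StatusCode)
    })

    if err := c.Visit("https://www.scrapingcourse.com/antibot-challenge"); err != nil {
        log.Println("visit failed:", err)
    }
}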
To guarantee you can crawl any website without getting blocked, consider ZenRows' Universal Scraper API, the most reliable solution for scalable web crawling.
ZenRows is a complete web scraping toolkit that handles every anti-bot solution for you, allowing you to focus on extracting your desired data. Some of its features include advanced anti-bot bypass out of the box, geo-located requests, fingerprinting evasion, actual user spoofing, request header management, and more.
Here's ZenRows in action against the same anti-bot challenge where GoSpider failed.
To follow along in this example, sign up to get your free API key.
Completing your sign-up will take you to the Request Builder page, where you'll find your API key at the top right.

Input your target URL and activate Premium Proxies and JS Rendering boost mode.
Next, select the Go language and choose the API option. ZenRows works with any language and provides ready-to-use snippets for the most popular ones.
Copy the generated code on the right to your editor for testing.
Your code should look like this:
package main

import (
    "io"
    "log"
    "net/http"
)

func main() {
    client := &http.Client{}

    // build the request to the ZenRows API endpoint
    req, err := http.NewRequest("GET", "https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.scrapingcourse.com%2Fantibot-challenge&js_render=true&premium_proxy=true", nil)
    if err != nil {
        log.Fatalln(err)
    }

    resp, err := client.Do(req)
    if err != nil {
        log.Fatalln(err)
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatalln(err)
    }

    log.Println(string(body))
}
This code bypasses the anti-bot challenge and retrieves the HTML.
<html lang="en">
<head>
<!-- ... -->
<title>Antibot Challenge - ScrapingCourse.com</title>
<!-- ... -->
</head>
<body>
<!-- ... -->
<h2>
You bypassed the Antibot challenge! :D
</h2>
<!-- other content omitted for brevity -->
</body>
</html>
Congratulations! You're well-equipped to crawl any website without getting blocked.
Conclusion
You've learned how to crawl websites using GoSpider. From setting up your project to integrating with other Go tools, here's a quick recap of your progress.
You now know how to:
- Crawl specific links.
- Extract data from collected links.
- Export scraped data to CSV.
Bear in mind that to take advantage of your crawling skills, you must first overcome anti-bot challenges. GoSpider is a useful crawling tool. However, advanced anti-bot solutions will always block your GoSpider crawler.
To crawl any website without getting blocked, consider ZenRows, an easy-to-implement and scalable solution.