Most websites use some form of pagination to organize content and improve load times. When it comes to scraping, however, pagination isn't just a design feature but a challenge that demands strategy.
In this guide, you'll discover practical techniques to scrape sites with various pagination types, from straightforward static pages to complex dynamic implementations. Let's begin.
What Is Pagination in Web Scraping?
Pagination is a web design concept that divides content into discrete pages. It reduces the number of resources loaded at once, improving user experience and reducing load times.
Pagination handling in web scraping refers to the techniques for extracting content from a paginated website. Most websites you'll scrape in real life feature pagination, and the style varies by site, so handling it properly is essential during web scraping or crawling.
Before moving on, you need to understand the pagination types in more detail.
Types of Pagination

A paginated website can be static or dynamic. Static pagination typically features a clickable navigation bar for switching pages. The dynamic type uses infinite scrolling or a load more button to render content as the user scrolls down the page.
While these pagination styles differ in implementation, each requires a specific approach during web scraping. This article will cover the different pagination types and use Python code examples in each case.
URL-based Pagination
URL-based pagination exposes the page number directly in the URL, which changes as the user navigates between pages. While this pagination style is more common on websites with navigation bars, some dynamic implementations (infinite scroll and load more) may also use it.
The easiest way to scrape a website with URL-based pagination is to change the page number in the URL programmatically. However, this method has one drawback: you need to know the number of pages on the site.
The Pagination Challenge page below is an example of a website that combines the URL-based style with a navigation bar.
Here's an example showing the URL pattern for page 2:

The website formats the page URL like so:
https://www.scrapingcourse.com/pagination/<PAGE_NUMBER>
You'll scrape product names, prices, and image URLs from all the pages on this website. The URL reflects the current page number and increases as the user navigates from the first to the last page.
Since the site has 13 pages, you can increment the page number in the URL until it reaches page 13. Let's see how it works in the example below.
The following code uses Python's Requests to fetch each page by incrementing the page number in the URL. It then uses BeautifulSoup to parse the response and extract the target data inside a scraper function:
# pip3 install beautifulsoup4 requests
import requests
from bs4 import BeautifulSoup

url = "https://www.scrapingcourse.com/pagination"

def scraper(url):
    # request the target website
    response = requests.get(url)
    # specify a list to collect products from each page
    products = []
    if response.status_code != 200:
        print(f"request failed with status {response.status_code}")
        return products
    # parse the HTML content
    soup = BeautifulSoup(response.text, "html.parser")
    # obtain the product cards
    product_cards = soup.find_all("div", class_="product-item")
    # iterate through the product cards to retrieve names, prices, and image URLs
    for product in product_cards:
        product_info = {
            "Name": product.find("span", class_="product-name").text,
            "Price": product.find("span", class_="product-price").text,
            "Image URL": product.find("img").get("src"),
        }
        products.append(product_info)
    return products

# define a product data list to collect all extracted data
product_data = []

# scrape every page from the first to the last (1 to 13)
for page_count in range(1, 14):
    # build the full URL for the current page
    page_url = f"{url}/{page_count}"
    print(f"Scraping from: {page_url}")
    # extend the list with products from the current page
    product_data.extend(scraper(page_url))

# print the extracted products
print(product_data)
The code extracts the product data from all the pages, as shown:
[
{
'Name': 'Chaz Kangeroo Hoodie',
'Price': '$52',
'Image URL': 'https://scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/mh01-gray_main.jpg',
},
# ... other products omitted for brevity,
{
'Name': 'Breathe-Easy Tank',
'Price': '$34',
'Image URL': 'https://scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/wt09-white_main.jpg',
}
]
Want to learn more? Read our guide on pagination scraping with Python's Requests.
Pagination With a Next Button
Pagination styles with a next button typically require clicking a button to switch pages. A common feature of such websites is a dedicated navigation bar, which may also display the page numbers.
To scrape this type of pagination site, you'll need to follow the next page link programmatically. The previous Pagination Challenge page is a solid example. It features a navigation bar with the previous and next buttons:

First, inspect the next button element in the developer console to get the URL of the next page. Open the website using a browser like Chrome, right-click the next button, and then click Inspect to view its HTML.
See the next element (.next-page) below:

Once you get the next page element, you can extract it with a CSS or XPath selector and follow its link, as in the short sketch below. We'll then see how to scrape this pagination type end to end.
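For instance, here's a minimal sketch of grabbing the next page link with BeautifulSoup's CSS selector support, assuming the .next-page class shown above:

# pip3 install beautifulsoup4 requests
import requests
from bs4 import BeautifulSoup

response = requests.get("https://www.scrapingcourse.com/pagination")
soup = BeautifulSoup(response.text, "html.parser")

# select the next page anchor with a CSS selector
next_link = soup.select_one("a.next-page")
if next_link:
    print(next_link.get("href"))  # URL of the next page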
Create a scraper function, request the initial target page using Python's Requests, and implement the extraction process. Then, get the next page link and call the scraper function recursively. The scraping task runs for each discovered page until no next page link remains in the DOM.
Here's the complete code demo:
# pip3 install beautifulsoup4 requests
import requests
from bs4 import BeautifulSoup

url = "https://www.scrapingcourse.com/pagination"

def scraper(url):
    # request the target url
    response = requests.get(url)
    products = []
    # validate the request
    if response.status_code != 200:
        print(f"request failed with status {response.status_code}")
        return products
    # parse the HTML content
    soup = BeautifulSoup(response.text, "html.parser")
    # obtain the product cards
    product_cards = soup.find_all("div", class_="product-item")
    # iterate through the product cards to retrieve names, prices, and image URLs
    for product in product_cards:
        product_info = {
            "Name": product.find("span", class_="product-name").text,
            "Price": product.find("span", class_="product-price").text,
            "Image URL": product.find("img").get("src"),
        }
        products.append(product_info)
    # get the next page link
    link = soup.find("a", class_="next-page")
    # if the next page exists, call the scraper function recursively
    if link:
        next_link = link.get("href")
        print(f"Scraping from: {next_link}")
        # combine results from the next page
        products.extend(scraper(next_link))
    # return the collected product data
    return products

# execute the scraper function
print(scraper(url))
Run the above code, and you'll get the following output:
[
{
'Name': 'Chaz Kangeroo Hoodie',
'Price': '$52',
'Image URL': 'https://scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/mh01-gray_main.jpg',
},
# ... other products omitted for brevity,
{
'Name': 'Breathe-Easy Tank',
'Price': '$34',
'Image URL': 'https://scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/wt09-white_main.jpg',
}
]
Nice! So far, you've learned how to scrape static pagination. Let's now move to the dynamic types.
Pagination With Infinite Scroll
Infinite scrolling is a form of dynamic pagination where websites load content automatically as the user scrolls down the page.
Scraping this type of pagination involves dynamic content extraction, often requiring a browser automation tool like Selenium to simulate user interactions, such as scrolling.
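For reference, here's a minimal sketch of that browser-based approach: it keeps scrolling until the page height stops growing, then reads the loaded product cards. The target URL and the .product-item class are assumptions based on the challenge pages used throughout this guide:

# pip3 install selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time

# start a headless Chrome session
chrome_options = Options()
chrome_options.add_argument("--headless=new")
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://www.scrapingcourse.com/infinite-scrolling")

# scroll until the page height stops increasing
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(2)  # allow time for new content to load
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

# count the product cards rendered after scrolling
products = driver.find_elements(By.CLASS_NAME, "product-item")
print(f"loaded {len(products)} products")
driver.quit()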
Alternatively, if the content is loaded via API calls or AJAX requests, you can use a standard HTTP client like the Requests library. This approach involves intercepting network traffic to identify and replicate the underlying requests that fetch additional data. However, it may not work if your scraping task involves complex actions like clicking or hovering.
In this example, you'll use the traffic interception method with the Requests library. Let's see how it works by scraping product data from the infinite scrolling challenge page.
Here's a demo showing how the web page renders content:

To inspect the XHR request pattern, open the website using a browser like Chrome and go to the DevTools (right-click anywhere on the website and click Inspect).
Go to the Network tab and scroll down the page. Several requests, including images, JavaScript, and CSS resources, will load within the tab.
You'll notice that the API call loading the content after an initial scroll is the products?offset=0 request:

The request URL for the product offset has the following format:
https://www.scrapingcourse.com/ajax/products?offset=<OFFSET_NUMBER>
The offset increases by 10 with each scroll, with the last value being 150. This pattern shows that the website loads 10 products per scroll.
To scrape the page, create a scraper function that implements the data extraction logic. Then, replicate the network request by increasing the offset in the URL by 10 across 15 iterations. Here's the sample code to achieve that:
# pip3 install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

# specify the base request URL
url = "https://www.scrapingcourse.com/ajax/products"

def scraper(url):
    # request the target website
    response = requests.get(url)
    # empty list to collect data
    scraped_data = []
    # verify the response status
    if response.status_code != 200:
        print(f"request failed with status {response.status_code}")
        return scraped_data
    # parse the HTML content
    soup = BeautifulSoup(response.text, "html.parser")
    # get the product containers
    products = soup.find_all("div", class_="product-item")
    # iterate through the product containers and extract the product content
    for product in products:
        data = {
            "Name": product.find(class_="product-name").text,
            "Price": product.find(class_="product-price").text,
            "Image URL": product.find("img").get("src"),
        }
        # append the data to the list
        scraped_data.append(data)
    # return the scraped data
    return scraped_data

# set an initial offset
offset_count = 0
# list to collect scraped data
product_data = []

# scrape infinite scroll by replicating the API requests
# seen in the Network tab (15 requests of 10 products each)
for page in range(0, 15):
    # build the full request URL
    requested_page_url = f"{url}?offset={offset_count}"
    # execute the scraper function
    collected_data = scraper(requested_page_url)
    # extend the list with the scraped data
    product_data.extend(collected_data)
    # increment the offset
    offset_count += 10

print(product_data)
Here's the output from the above code:
[
{
'Name': 'Chaz Kangeroo Hoodie',
'Price': '$52',
'Image URL': 'https://scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/mh01-gray_main.jpg',
},
# ... other products omitted for brevity,
{
'Name': 'Breathe-Easy Tank',
'Price': '$34',
'Image URL': 'https://scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/wt09-white_main.jpg',
}
]
You just scraped an infinite scroll pagination by intercepting the network traffic with Python's Requests and BeautifulSoup.
To learn more about this concept, read our detailed article on scraping infinite scrolling pages with Requests pagination.
Pagination With a Load More Button
Load more button pagination is similar to infinite scrolling, but it requires the user to click a button to view more content as they scroll down the page.
Like the infinite scrolling method, you can scrape a load more pagination using an automation tool like Selenium or a standard HTTP client like Requests.
This time, you'll simulate the "Load more" click action with Selenium. In this case, let's use the load more challenge page as the target site. See how the page works below:

As done in previous examples, you'll extract product data (names, prices, and image URLs) from this page.
The Selenium script below opens the website and repeatedly clicks the "Load more" button until no new content loads. It then extracts the product data from the fully loaded page:
# pip3 install selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import time

# set up the selenium webdriver in headless mode
chrome_options = Options()
chrome_options.add_argument("--headless=new")
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://www.scrapingcourse.com/button-click")

# explicit wait for locating the load more button
wait = WebDriverWait(driver, 10)

# list to store the extracted product data
products = []

# function to extract product data from the current DOM
def scraper():
    product_cards = driver.find_elements(By.CLASS_NAME, "product-item")
    for product in product_cards:
        product_info = {
            "Name": product.find_element(By.CLASS_NAME, "product-name").text,
            "Price": product.find_element(By.CLASS_NAME, "product-price").text,
            "Image URL": product.find_element(By.TAG_NAME, "img").get_attribute("src"),
        }
        products.append(product_info)
    return products

# click the "load more" button until no new content is loaded
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    try:
        # wait until the button is visible and clickable
        load_more_button = wait.until(
            EC.element_to_be_clickable((By.ID, "load-more-btn"))
        )
        load_more_button.click()
        time.sleep(2)  # allow time for new content to load

        # stop if the page height has not increased
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
    except Exception as e:
        print("no more content to load or an error occurred:", e)
        break

# extract the product data once all content has loaded
scraper()
print(products)

# close the browser
driver.quit()
The above scraper loads the entire product list and outputs the desired product data, as shown:
[
{
'Name': 'Chaz Kangeroo Hoodie',
'Price': '$52',
'Image URL': 'https://scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/mh01-gray_main.jpg',
},
# ... other products omitted for brevity,
{
'Name': 'Breathe-Easy Tank',
'Price': '$34',
'Image URL': 'https://scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/wt09-white_main.jpg',
}
]
Great! You've learned to handle the different pagination styles during data extraction. However, applying some more advanced pagination scraping concepts will help you build an efficient web scraper.
Advanced Pagination Techniques and Best Practices
Since a pagination scraper requests several pages, it's prone to potential issues, such as slow execution, IP bans, connection errors, etc. Fortunately, you can avoid these with the following techniques.
Asynchronous Pagination
The previous web scraping example scripts employ standard sequential scraping. This method can slow down the scraping process since the scraper must complete the current request before starting the next one.
Running your scraper asynchronously can significantly enhance its performance. This technique allows you to request several pages simultaneously in a non-blocking manner.
You can wrap the Requests library with asyncio for asynchronous support, for example by offloading each call to a worker thread, as sketched below. However, this doesn't produce a genuinely asynchronous setup since Requests itself remains synchronous. A more reliable approach is to use a non-blocking HTTP client like AIOHTTP that is built for asynchronous scraping.
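Here's a minimal sketch of that wrapper approach, using asyncio.to_thread (Python 3.9+) to push each blocking Requests call onto a worker thread; the page URLs reuse the pagination challenge pages from earlier:

# pip3 install requests
import asyncio
import requests

async def fetch(url):
    # run the blocking requests call in a worker thread
    response = await asyncio.to_thread(requests.get, url)
    return response.status_code

async def main():
    # first three pages of the pagination challenge
    urls = [f"https://www.scrapingcourse.com/pagination/{page}" for page in range(1, 4)]
    # schedule the wrapped requests concurrently
    results = await asyncio.gather(*(fetch(url) for url in urls))
    print(results)

asyncio.run(main())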
First, ensure you install AIOHTTP using pip:
pip3 install aiohttp
The scraper below requests the target website and runs the scraping logic asynchronously using an aiohttp session. It then drives the entire process with Python's built-in asyncio:
# pip3 install beautifulsoup4 aiohttp
import aiohttp
import asyncio
from bs4 import BeautifulSoup

url = "https://www.scrapingcourse.com/pagination"

async def fetch(session, url):
    # fetch the url asynchronously
    async with session.get(url) as response:
        if response.status != 200:
            print(f"request failed with status {response.status}")
            return None
        return await response.text()

async def scraper(session, url):
    # scraper function to parse products and handle pagination
    products = []
    # fetch the page content
    html_content = await fetch(session, url)
    if html_content:
        # parse the html content
        soup = BeautifulSoup(html_content, "html.parser")
        # obtain the product cards
        product_cards = soup.find_all("div", class_="product-item")
        # iterate through the product cards to retrieve names, prices, and image URLs
        for product in product_cards:
            product_info = {
                "name": product.find("span", class_="product-name").text.strip(),
                "price": product.find("span", class_="product-price").text.strip(),
                "image url": product.find("img").get("src"),
            }
            products.append(product_info)
        # get the next page link
        link = soup.find("a", attrs={"rel": "next"})
        # if the next page exists, scrape it recursively
        if link:
            next_link = link.get("href")
            print(f"scraping from: {next_link}")
            # combine results from the next page
            products.extend(await scraper(session, next_link))
    return products

async def main(url):
    # main entry point for the scraper
    async with aiohttp.ClientSession() as session:
        products = await scraper(session, url)
        return products

# run the asynchronous scraper
if __name__ == "__main__":
    scraped_products = asyncio.run(main(url))
    print(scraped_products)
Splendid! You just supercharged your pagination scraper with asynchronous execution.
Rate Limiting and Backoffs
One of the disadvantages of asynchronous scraping is that requests can be too frequent, leading to suspicion and potential IP bans.
You can control your request frequency using exponential backoff, a strategy that exponentially increases the wait time after every failed request. This technique also adds a more human touch to your scraper and can reduce the chances of detection.
For instance, the delay patterns (in seconds) for backoff factors 2 and 3 are:
# backoff factor 2
1, 2, 4, 8, 16, 32, 64, 128
# backoff factor 3
1.5, 3, 6, 12, 24, 48, 96, 192
The exponential backoff algorithm looks like this:
backoff_factor * (2 ** (current_number_of_retries - 1))
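You can quickly verify these patterns with a short loop (retries are counted from zero here, matching the code below):

# print the first eight delays for backoff factors 2 and 3
for backoff_factor in (2, 3):
    delays = [backoff_factor * (2 ** (retries - 1)) for retries in range(8)]
    print(backoff_factor, delays)
# 2 [1.0, 2, 4, 8, 16, 32, 64, 128]
# 3 [1.5, 3, 6, 12, 24, 48, 96, 192]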
The following code modifies the AIOHTTP fetch function in the previous scraper with an exponential backoff:
# pip3 install beautifulsoup4 aiohttp
import aiohttp
import asyncio
from bs4 import BeautifulSoup

url = "https://www.scrapingcourse.com/pagination"

# base delay in seconds
BACKOFF_FACTOR = 1
# maximum number of retries
MAX_RETRIES = 5

async def fetch(session, url, retries=0):
    # fetch the url asynchronously
    async with session.get(url) as response:
        if response.status == 200:
            return await response.text()
        print(f"Request failed with status {response.status}.")

    # implement exponential backoff for failed requests
    if retries < MAX_RETRIES:
        backoff_time = BACKOFF_FACTOR * (2 ** (retries - 1))
        print(f"Retrying in {backoff_time} seconds...")
        await asyncio.sleep(backoff_time)
        return await fetch(session, url, retries + 1)
    else:
        print("Max retries reached. Returning failure.")
        return None

async def scraper(session, url):
    products = []
    # fetch the page content with retries
    html_content = await fetch(session, url)
    if html_content:
        # parse the HTML content
        soup = BeautifulSoup(html_content, "html.parser")
        # obtain the product cards
        product_cards = soup.find_all("div", class_="product-item")
        # iterate through the product cards to retrieve names, prices, and image URLs
        for product in product_cards:
            product_info = {
                "name": product.find("span", class_="product-name").text.strip(),
                "price": product.find("span", class_="product-price").text.strip(),
                "image url": product.find("img").get("src"),
            }
            products.append(product_info)
        # get the next page link
        link = soup.find("a", class_="next-page")
        # if the next page exists, scrape it recursively
        if link:
            next_link = link.get("href")
            print(f"Scraping from: {next_link}")
            # combine results from the next page
            products.extend(await scraper(session, next_link))
    return products

async def main(url):
    # main entry point for the scraper
    async with aiohttp.ClientSession() as session:
        products = await scraper(session, url)
        return products

# run the asynchronous scraper
if __name__ == "__main__":
    scraped_products = asyncio.run(main(url))
    print(scraped_products)
Nice! You've learned to control the request frequency with an exponential backoff strategy.
Error Handling and Recovery
Effective error handling is essential during web scraping. Common pagination scraping errors include missing or broken page links, missing elements, timeouts, server-side restrictions, and more. These problems can break your scraper, resulting in missing data or failed runs.
You can avoid runtime errors during scraping by adding proper error and exception handling mechanisms, such as try/except blocks. Adequate logging can also help you debug faster, allowing you to spot the point and cause of failures during scraping, as sketched below.
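For instance, here's a minimal sketch that wraps each page request in a try/except block and logs failures with Python's built-in logging module (the URL pattern reuses the pagination challenge pages from earlier):

# pip3 install requests
import logging
import requests

logging.basicConfig(level=logging.INFO)

def scrape_page(url):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        logging.info("scraped %s", url)
        return response.text
    except requests.RequestException as error:
        # log the failing page and the cause, then move on to the next page
        logging.error("failed to scrape %s: %s", url, error)
        return None

for page in range(1, 14):
    scrape_page(f"https://www.scrapingcourse.com/pagination/{page}")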
Recovery mechanisms, such as fallbacks, can prevent your scraper from failing when encountering unexpected issues, like CAPTCHA challenges, network errors, or changes in website structure.
Another effective way to handle errors is to use a retry mechanism, which automatically retries a failed request a specified number of times. In practice, most retry setups fall back to a default or stop once the maximum number of attempts is reached. Retries can also be combined with delay strategies; the exponential backoff discussed earlier is a good example. Here's a simple sketch:
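The sketch below retries a failed request up to a maximum number of attempts with a fixed delay, then falls back to returning None; the URL, attempt count, and delay are illustrative values:

# pip3 install requests
import time
import requests

MAX_ATTEMPTS = 3

def fetch_with_retries(url):
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            return response.text
        except requests.RequestException as error:
            print(f"attempt {attempt} failed for {url}: {error}")
            time.sleep(2)  # fixed delay; swap in the exponential backoff above if needed
    # fallback once all attempts are exhausted
    return None

html = fetch_with_retries("https://www.scrapingcourse.com/pagination/1")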
Avoid Getting Blocked
One of the biggest challenges of scraping paginated websites is getting blocked by anti-bot measures.
Since your scraper hits several pages on the same website, you'll likely appear as a bot after some time, resulting in potential blocking. So, while building your pagination scraper, it's essential to implement measures to avoid anti-bot detection.
Unfortunately, open-source tools like Selenium, Requests, and AIOHTTP can't bypass anti-bot measures on their own. They often expose bot-like properties, such as missing browser fingerprints, inconsistent header parameters, and more.
For instance, Selenium will get blocked on a protected website like the Antibot Challenge page. Try it with the following code:
# pip3 install selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time

# set up Chrome options for headless mode
chrome_options = Options()
chrome_options.add_argument("--headless=new")

# initialize the WebDriver with the options
driver = webdriver.Chrome(options=chrome_options)

# open the target page
url = "https://www.scrapingcourse.com/antibot-challenge"
driver.get(url)

# wait for the page to load
time.sleep(3)

# take a screenshot
driver.save_screenshot("screenshot.png")

# close the WebDriver
driver.quit()
Selenium got blocked, as shown below:

One way to reduce the chances of getting blocked is to use web scraping proxies or custom request headers, as shown below. However, these techniques alone are often insufficient to evade blocks completely.
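For illustration, here's a minimal Requests sketch that routes traffic through a proxy and sets a custom User-Agent header; the proxy address is a placeholder you'd replace with a real one:

# pip3 install requests
import requests

# placeholder proxy address and a browser-like User-Agent
proxies = {
    "http": "http://<PROXY_HOST>:<PROXY_PORT>",
    "https": "http://<PROXY_HOST>:<PROXY_PORT>",
}
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
    )
}

response = requests.get(
    "https://www.scrapingcourse.com/pagination",
    headers=headers,
    proxies=proxies,
    timeout=10,
)
print(response.status_code)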
A more reliable way to scrape any paginated website without getting blocked is to use a web scraping API, such as ZenRows' Universal Scraper API. ZenRows provides the essential toolkit required to scrape any website successfully.
With a single API call, you get rotating premium proxies, optimized request headers, JavaScript rendering, advanced fingerprint evasion, anti-bot auto-bypass, and more.
Let's see how ZenRows works by scraping the full-page HTML of the anti-bot challenge page that previously blocked you.
Sign up on ZenRows and go to Request Builder. Paste your target URL in the link box and activate Premium Proxies and JS Rendering.

Then, select Python as your programming language and choose the API connection mode. Copy the generated Python code and paste it into your scraper.
The generated Python code should look like this:
# pip install requests
import requests

url = "https://www.scrapingcourse.com/antibot-challenge"
apikey = "<YOUR_ZENROWS_API_KEY>"
params = {
    "url": url,
    "apikey": apikey,
    "js_render": "true",
    "premium_proxy": "true",
}
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)
The above code outputs the target site's full-page HTML, proving you bypassed the anti-bot detection:
<html lang="en">
<head>
<!-- ... -->
<title>Antibot Challenge - ScrapingCourse.com</title>
<!-- ... -->
</head>
<body>
<!-- ... -->
<h2>
You bypassed the Antibot challenge! :D
</h2>
<!-- other content omitted for brevity -->
</body>
</html>
Congratulations! 🎉 You just bypassed anti-bot protection using ZenRows.
Conclusion
You've learned about the different pagination styles, including the core concepts of handling each when scraping. You've also seen how to optimize your pagination scraper with advanced techniques to make it more efficient.
However, remember that most paginated websites will implement anti-bot measures to block you, regardless of how sophisticated your scraper is. The easiest way to bypass these blocks and scrape without limitations is to use a web scraping solution like ZenRows.