Using Concurrency

Depending on your plan, you can perform ten or more concurrent requests. In short, several API calls can run at the same time, so you can scrape ten URLs at once. As soon as one finishes, another launches, keeping the concurrency at ten until there are no more URLs.

Sounds reasonable, but most clients/languages don't have a built-in structure for that, so you'll probably need to write your own. Or use ours 😉
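To make the idea concrete, here is a minimal sketch of such a pool using asyncio.Semaphore. The fetch function is a stand-in for a real HTTP call; the semaphore guarantees at most concurrency coroutines run at the same time, and as each one finishes, a waiting one takes its slot.

```python
import asyncio

async def fetch(url):
	# stand-in for a real request; just simulate some latency
	await asyncio.sleep(0.01)
	return f"done: {url}"

async def bounded_gather(urls, concurrency=10):
	# the semaphore caps how many coroutines run at once;
	# each finished call frees a slot for a waiting one
	semaphore = asyncio.Semaphore(concurrency)

	async def run(url):
		async with semaphore:
			return await fetch(url)

	return await asyncio.gather(*[run(url) for url in urls])

urls = [f"https://example.com/page/{i}" for i in range(25)]
results = asyncio.run(bounded_gather(urls, concurrency=10))
print(len(results))  # 25
```

This is essentially what the SDKs below implement for you, plus retries and error handling.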

ZenRows SDK for Python

For the code to work, you will need Python 3 installed. Some systems have it pre-installed. After that, install the ZenRows SDK by running pip install.

pip install zenrows

The ZenRows Python SDK comes with concurrency and retries already implemented for you. Pass the numbers into the constructor as shown below, and remember to adjust the concurrency to your plan. Take into account that each client instance has its own limit: two separate scripts will not share it, and 429 (Too Many Requests) errors might arise if their combined load exceeds your plan's.

asyncio.gather waits for all the calls to finish and stores the responses in a list. You can loop over it afterward and extract the data you need. As usual, each response exposes the status, request, response content, and other values. Remember to run the script with asyncio.run, or it will fail with a coroutine 'main' was never awaited error.

from zenrows import ZenRowsClient 
import asyncio 
from urllib.parse import urlparse, parse_qs 
 
client = ZenRowsClient("YOUR_KEY", concurrency=5, retries=1) 
 
urls = [ 
	# ... 
] 
 
async def main(): 
	responses = await asyncio.gather(*[client.get_async(url) for url in urls]) 
 
	for response in responses: 
	original_url = parse_qs(urlparse(response.request.url).query)["url"][0]  # parse_qs returns a list per key
		print({ 
			"response": response, 
			"status_code": response.status_code, 
			"request_url": original_url, 
		}) 
 
asyncio.run(main())

Python with requests

As above, you'll need Python 3 installed. Then install the requests library.

pip install requests

Python's multiprocessing package ships a thread-based worker pool (multiprocessing.pool.ThreadPool) that caps how many requests run at once.

import requests 
from multiprocessing.pool import ThreadPool 
 
apikey = "YOUR_KEY" 
concurrency = 10 
urls = [ 
	# ... your URLs here 
] 
 
def scrape_with_zenrows(url): 
	response = requests.get( 
		url="https://api.zenrows.com/v1/", 
		params={ 
			"url": url, 
			"apikey": apikey, 
		}, 
	) 
 
	return { 
		"content": response.text, 
		"status_code": response.status_code, 
		"request_url": url, 
	} 
 
pool = ThreadPool(concurrency) 
results = pool.map(scrape_with_zenrows, urls) 
pool.close() 
pool.join() 
 
for result in results: 
	print(result)

ZenRows SDK for JavaScript

Sadly, JavaScript does not offer such a pool out of the box, but we implemented one, so you don't have to. The ZenRows JavaScript SDK comes with concurrency and retries options.

npm i zenrows

As with the Python SDK, pass the numbers into the constructor as shown below, and adjust the concurrency to your plan. Each client instance has its own limit: two separate scripts will not share it, and 429 (Too Many Requests) errors might arise.

We use Promise.allSettled() in the example below, available since Node 12.9. It waits for all the promises to finish and returns objects with a status marking each as fulfilled or rejected. The main difference from Promise.all() is that it won't reject if any request fails. That makes your scraping more robust, since the whole list of URLs will run even if some of them fail.

const { ZenRows } = require('zenrows'); 
 
const apiKey = 'YOUR_KEY'; 
 
(async () => { 
	const client = new ZenRows(apiKey, { concurrency: 5, retries: 1 }); 
 
	const urls = [ 
		// ... 
	]; 
	const promises = urls.map(url => client.get(url)); 
 
	const results = await Promise.allSettled(promises); 
	console.log(results); 
	/* 
	[ 
		{ 
			status: 'fulfilled', 
			value: { 
				status: 200, 
				statusText: 'OK', 
				data: `   ... 
			 
		... 
	*/ 
 
	// separate results list into rejected and fulfilled for later processing 
	const rejected = results.filter(({ status }) => status === 'rejected'); 
	const fulfilled = results.filter(({ status }) => status === 'fulfilled'); 
})();