Node-Fetch is a widely adopted HTTP library that offers a simple way to make asynchronous fetch requests. Unlike the default Fetch API available in modern web browsers, Node-Fetch can be used in the backend of Node.js applications and scripts, which makes it a popular choice for web scraping.
However, as with other HTTP libraries, a scraper built on it can quickly get blocked, so you need proxies to retrieve the necessary data. Node-Fetch doesn't support proxies natively, but there's a workaround.
In this article, you'll learn how to use a Node-Fetch proxy to increase your anonymity and reduce the chance of getting blocked.
Before we dive in, here's what you'll need to follow along in this tutorial.
Prerequisites
To get started, you'll need at least Node.js version 12.20.0 installed. That's the minimum requirement for the current Node-Fetch stable release.
Then, install Node-Fetch using the following command in your project directory:
npm i node-fetch
Sometimes, you might encounter this error message when using the Node-Fetch library in your project: Error: Cannot find module 'node-fetch'. That typically occurs when module resolution can't find your installed Node-Fetch module. To resolve this issue, install all the necessary dependencies by running the following command in the directory containing the package-lock.json file:
npm install
If you still get the error, note that the current Node-Fetch release (v3.x) is an ESM-only module. So, in your package.json file, add "type": "module" at the top level so that Node.js loads your scripts as ES modules.
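For reference, a minimal package.json with ES modules enabled might look like the sketch below. The name and version values are placeholders chosen for illustration; the only entry that matters here is "type": "module".

```json
{
  "name": "node-fetch-proxy-demo",
  "version": "1.0.0",
  "type": "module"
}
```

Running npm i node-fetch in the same directory adds a dependencies entry to this file automatically.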
If you can't switch to ESM, downgrade to Node-Fetch v2, which is compatible with CommonJS, using this command:
npm install node-fetch@2
Versions 2 and 3 are largely the same, apart from the module syntax (require in v2, import in v3), so you don't need to worry about reduced functionality.
You can learn more in our guide on web scraping with NodeJS.
How to Use a Proxy with Node-Fetch
As previously mentioned, Node-Fetch doesn't support proxies natively. So to use one, you must route your requests through a proxy server using the https-proxy-agent package.
Start by installing https-proxy-agent using the command below:
npm install https-proxy-agent
Then, create a new project file, import Node-Fetch and https-proxy-agent, and open an async function.
import fetch from 'node-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';
(async () => {
})();
Define the proxy configuration by specifying your proxy details. You can use a free proxy from Free-Proxy-List.
const proxyHost = '200.105.215.22';
const proxyPort = 33630;
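Note that the proxy URL scheme (http:// or https://) describes how you connect to the proxy itself, not to the target website. To reach an HTTPS website, HttpsProxyAgent asks the proxy to open a tunnel with the HTTP CONNECT method, so the proxy you pick must support HTTPS traffic. Many free HTTP-only proxies don't.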
Next, construct the proxy URL and define your target website. In this example, we're scraping ident.me, an endpoint that provides the IP address of the device making the request. So, our code looks like this:
const proxyUrl = `http://${proxyHost}:${proxyPort}`;
const targetUrl = 'https://ident.me/ip';
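Free proxy lists often contain malformed entries, so it can help to validate the host and port before building the URL. The following is a minimal sketch, not part of Node-Fetch or https-proxy-agent: buildProxyUrl is a hypothetical helper name, and the port range check is an assumption about what counts as valid input.

```javascript
// Hypothetical helper (not part of Node-Fetch or https-proxy-agent):
// builds a proxy URL string and rejects obviously invalid input.
function buildProxyUrl(host, port, protocol = 'http') {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid proxy port: ${port}`);
  }
  const proxyUrl = `${protocol}://${host}:${port}`;
  // The URL constructor throws a TypeError if the string isn't a valid URL.
  new URL(proxyUrl);
  return proxyUrl;
}

console.log(buildProxyUrl('200.105.215.22', 33630));
// prints "http://200.105.215.22:33630"
```

The returned string can be passed to HttpsProxyAgent in the same way as the template literal used in this tutorial.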
Create a proxy agent using a new HttpsProxyAgent instance that takes the proxy URL as the parameter.
const proxyAgent = new HttpsProxyAgent(proxyUrl);
Lastly, make a fetch request to the target URL, passing the proxy agent as the agent option.
const response = await fetch(targetUrl, { agent: proxyAgent });
Putting it all together, your code should look like this:
import fetch from 'node-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';
(async () => {
  // Proxy configuration
  const proxyHost = '200.105.215.22';
  const proxyPort = 33630;
  // Target website URL
  const targetUrl = 'https://ident.me/ip';
  // Proxy URL
  const proxyUrl = `http://${proxyHost}:${proxyPort}`;
  // Create a new Proxy Agent
  const proxyAgent = new HttpsProxyAgent(proxyUrl);
  // Fetch the target website using the proxy agent
  const response = await fetch(targetUrl, { agent: proxyAgent });
})();
Let's verify it works.
//...
const html = await response.text();
console.log(html);
//.. 200.105.215.22 ..//
Yay! The result above is our proxy's IP address rather than our original IP. Great, now you know how to set up a Node-Fetch proxy.
Use a Rotating Proxy with Node-Fetch
Many websites implement IP-based blocking or rate limiting to combat bot activity, such as too many requests coming from a single IP address. Depending on your use case, you may need more than a single Node-Fetch proxy to avoid getting blocked.
By rotating proxies, each request will appear to originate from a different IP address, making it more difficult for websites to detect you. Let's see how to rotate a Node-Fetch proxy.
Rotate IPs with a Free Solution
Rotating proxies with a free solution involves building a pool of proxies and using a different one for each request. Free providers often offer a list you can download as a CSV file. However, it's better to hand-pick working proxy servers and either save them to your own CSV file or hard-code them as a static array.
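If you go the CSV route, you need to turn the file contents into the array of host and port objects used in the rest of this section. Here's a minimal sketch of that step. It assumes a simple layout with a host,port header row and no quoted fields, and parseProxyCsv is a hypothetical helper name; real proxy list exports may use different columns.

```javascript
// Hypothetical helper: converts CSV text with "host,port" rows into
// the { host, port } objects used in this tutorial.
// Assumes a header row on the first line and no quoted fields.
function parseProxyCsv(csvText) {
  return csvText
    .split(/\r?\n/)
    .slice(1) // skip the header row
    .map((line) => line.trim())
    .filter((line) => line !== '') // ignore blank lines
    .map((line) => {
      const [host, port] = line.split(',').map((field) => field.trim());
      return { host, port: Number(port) };
    })
    // keep only rows that have a host and a valid port number
    .filter(({ host, port }) => host && Number.isInteger(port) && port >= 1 && port <= 65535);
}

// Sample data for illustration; the last row is deliberately invalid.
const sampleCsv = `host,port
103.69.108.78,8191
61.29.96.146,80
not-a-valid-row
`;
console.log(parseProxyCsv(sampleCsv));
```

In a real script, you'd read the text from disk first, for example with Node's fs.readFileSync, and pass it to the function.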
Let's rotate some proxies.
First, import the necessary dependencies and define your Node-Fetch proxy list array.
import fetch from 'node-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';
const proxyList = [
{ host: '103.69.108.78', port: 8191 },
{ host: '61.29.96.146', port: 80 },
{ host: '154.204.58.155', port: 8090 },
];
Like earlier, the list above is obtained from Free-Proxy-List.
Define a function that takes your proxy list array and target URL as arguments. Inside it, a for...of loop constructs a proxy URL, creates a proxy agent, and makes a request through each proxy in the array. It also prints the response body returned by the target URL.
async function RotateProxy(proxyList, targetUrl) {
  for (const proxy of proxyList) {
    try {
      const proxyUrl = `http://${proxy.host}:${proxy.port}`;
      const proxyAgent = new HttpsProxyAgent(proxyUrl);
      const response = await fetch(targetUrl, { agent: proxyAgent });
      const html = await response.text();
      console.log(html);
    } catch (error) {
      console.error(error);
    }
  }
}
Lastly, define your target URL and call the function.
const targetUrl = 'https://ident.me/ip';
await RotateProxy(proxyList, targetUrl);
Your complete code should look like this:
import fetch from 'node-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';
const proxyList = [
{ host: '103.69.108.78', port: 8191 },
{ host: '61.29.96.146', port: 80 },
{ host: '154.204.58.155', port: 8090 },
];
async function RotateProxy(proxyList, targetUrl) {
  for (const proxy of proxyList) {
    try {
      // construct proxy URL
      const proxyUrl = `http://${proxy.host}:${proxy.port}`;
      // create proxy agent
      const proxyAgent = new HttpsProxyAgent(proxyUrl);
      // make request using the current proxy from the array
      const response = await fetch(targetUrl, { agent: proxyAgent });
      const html = await response.text();
      console.log(html);
    } catch (error) {
      console.error(error);
    }
  }
}
const targetUrl = 'https://ident.me/ip';
await RotateProxy(proxyList, targetUrl);
And you should have a similar result to the following one, which shows that each request is made using different proxies from the array.
103.69.108.78
61.29.96.146
154.204.58.155
Congrats, you've created your first proxy rotator using Node-Fetch!
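The RotateProxy function walks through the list in order. If you'd rather pick a proxy at random for each request, a small helper can do the selection. This is a minimal sketch: pickRandomProxy and the injectable random parameter are hypothetical names introduced here for illustration.

```javascript
// Hypothetical helper: returns a random entry from the proxy list.
// The random source is injectable so the function is easy to test.
function pickRandomProxy(proxyList, random = Math.random) {
  if (!Array.isArray(proxyList) || proxyList.length === 0) {
    throw new Error('proxyList must be a non-empty array');
  }
  const index = Math.floor(random() * proxyList.length);
  return proxyList[index];
}

const sampleProxies = [
  { host: '103.69.108.78', port: 8191 },
  { host: '61.29.96.146', port: 80 },
  { host: '154.204.58.155', port: 8090 },
];

// Build the proxy URL for a randomly chosen proxy
const chosenProxy = pickRandomProxy(sampleProxies);
console.log(`http://${chosenProxy.host}:${chosenProxy.port}`);
```

You'd call the helper once per request and pass the resulting URL to HttpsProxyAgent, as in the earlier examples.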
However, free proxies are unreliable, and finding a list that works can be challenging. Plus, even if you find one, websites easily detect and block them. For example, we'll replace our sample URL (ident.me) with an Amazon product page and log the status code of each request:
import fetch from 'node-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';
const proxyList = [
{ host: '103.69.108.78', port: 8191 },
{ host: '61.29.96.146', port: 80 },
{ host: '154.204.58.155', port: 8090 },
];
async function RotateProxy(proxyList, targetUrl) {
  for (const proxy of proxyList) {
    try {
      // construct proxy URL
      const proxyUrl = `http://${proxy.host}:${proxy.port}`;
      // create proxy agent
      const proxyAgent = new HttpsProxyAgent(proxyUrl);
      // make request using the current proxy from the array
      const response = await fetch(targetUrl, { agent: proxyAgent });
      const html = await response.text();
      // request status code
      const statusCode = response.status;
      console.log('Status Code:', statusCode);
      console.log(html);
    } catch (error) {
      console.error(error);
    }
  }
}
const targetUrl = 'https://www.amazon.com/Bose-QuietComfort-45-Bluetooth-Canceling-Headphones/dp/B098FKXT8L?th=1';
await RotateProxy(proxyList, targetUrl);
This is the result we got:
Status Code: 403
//..
<HTML><HEAD><TITLE>Endian Firewall - Access denied</TITLE>
Status Code: 403
//...
<HTML><HEAD><TITLE>Endian Firewall - Access denied</TITLE>
Status Code: 403
//...
<HTML><HEAD><TITLE>Endian Firewall - Access denied</TITLE>
A 403 status code means we were denied access, which shows that rotating free proxies often doesn't work for practical use cases. The better solution is to use premium proxies. Check out our list of the best proxy services to find some popular options.
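If you still want to experiment with a free list, one option is to treat any non-2xx status as a failure and move on to the next proxy until one succeeds. The sketch below illustrates that idea under a few assumptions: fetchWithFallback is a hypothetical helper, and requestFn is an assumed callback that performs the actual request, for example by calling fetch with an HttpsProxyAgent built from the given proxy.

```javascript
// Hypothetical helper: tries each proxy in turn and returns the first
// successful response. `requestFn(targetUrl, proxy)` is an assumed
// callback that performs the request and resolves to a response
// object exposing `ok` and `status`, like the one Node-Fetch returns.
async function fetchWithFallback(targetUrl, proxyList, requestFn) {
  const failures = [];
  for (const proxy of proxyList) {
    try {
      const response = await requestFn(targetUrl, proxy);
      // response.ok is true only for 2xx status codes
      if (response.ok) {
        return { proxy, response };
      }
      failures.push({ proxy, reason: `status ${response.status}` });
    } catch (error) {
      // network errors (dead proxy, refused connection) land here
      failures.push({ proxy, reason: error.message });
    }
  }
  const summary = failures
    .map((f) => `${f.proxy.host}:${f.proxy.port} (${f.reason})`)
    .join(', ');
  throw new Error(`All proxies failed: ${summary}`);
}
```

Keeping the request logic in a callback means the helper itself needs no imports, and you can plug in the same fetch call used in RotateProxy.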
Meanwhile, let's see how to avoid getting blocked with a premium proxy.
Premium Proxy to Avoid Getting Blocked
Premium proxies used to be expensive, especially for large-scale projects. However, the emergence of solutions like ZenRows has changed the landscape because it provides affordable premium proxy plans that ensure you only pay for successful requests.
ZenRows' key benefit is its out-of-the-box tools to bypass anti-bot measures. With parameters such as premium_proxy=true and js_render=true, you can render JavaScript content and mimic human behavior, all with a few lines of code.
To use ZenRows with Node-Fetch, first, sign up to get your free API key. You'll find it at the top of the Request Builder page.
Now, all that's left is to make a request to your target URL through the API. ZenRows will take care of rotating premium proxies behind the scenes to deliver your needed data.
As an example, we'll use the same Amazon product page that denied our free proxy access as the target URL.
Start by setting your API key and target URL and use them to construct your API URL.
import fetch from 'node-fetch';
const Api_key = '<YOUR_ZENROWS_API_KEY>';
const targetUrl = 'https://www.amazon.com/Bose-QuietComfort-45-Bluetooth-Canceling-Headphones/dp/B098FKXT8L?th=1';
const apiUrl = `https://api.zenrows.com/v1/?apikey=${Api_key}&url=${encodeURIComponent(targetUrl)}`;
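The target URL carries its own query string (?th=1), which is why it has to be percent-encoded before being embedded in the API URL. As a minimal sketch of the same idea, the snippet below builds the URL with Node's built-in URLSearchParams, which encodes every value for you. buildApiUrl and its extraParams argument are hypothetical names used for illustration; the endpoint and parameter names are the ones shown in this tutorial.

```javascript
// Hypothetical helper: builds the API URL with URLSearchParams, which
// percent-encodes every value (similar to encodeURIComponent).
function buildApiUrl(apiKey, targetUrl, extraParams = {}) {
  const query = new URLSearchParams({ apikey: apiKey, url: targetUrl, ...extraParams });
  return `https://api.zenrows.com/v1/?${query.toString()}`;
}

const demoTarget = 'https://www.amazon.com/Bose-QuietComfort-45-Bluetooth-Canceling-Headphones/dp/B098FKXT8L?th=1';
const demoUrl = buildApiUrl('<YOUR_ZENROWS_API_KEY>', demoTarget);

// Parsing the result back shows the target URL survived intact,
// including its own "?th=1" query string.
const parsedDemoUrl = new URL(demoUrl);
console.log(parsedDemoUrl.searchParams.get('url') === demoTarget); // prints true
```

Without encoding, the API server could not tell which query parameters belong to the API call and which belong to the target page.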
Define the necessary parameters to activate the Premium Proxies and JS rendering features:
- "premium_proxy":"true"
- "js_render":"true"
const params = {"js_render":"true","premium_proxy":"true"};
Then make a fetch request to the API. Node-Fetch has no params option and ignores options it doesn't recognize, so append the parameters to the API URL as a query string instead. Lastly, log the status code and the HTML content.
(async () => {
  // Append the parameters to the API URL as a query string
  const requestUrl = `${apiUrl}&${new URLSearchParams(params).toString()}`;
  const response = await fetch(requestUrl);
  const html = await response.text();
  const statusCode = response.status;
  console.log('Status Code:', statusCode);
  console.log(html);
})();
Your complete code should look like this:
import fetch from 'node-fetch';
const Api_key = '<YOUR_ZENROWS_API_KEY>';
const targetUrl = 'https://www.amazon.com/Bose-QuietComfort-45-Bluetooth-Canceling-Headphones/dp/B098FKXT8L?th=1';
const apiUrl = `https://api.zenrows.com/v1/?apikey=${Api_key}&url=${encodeURIComponent(targetUrl)}`;
// The trailing semicolon matters: without it, the object literal and the
// async function below would be parsed as a single call expression.
const params = {"js_render":"true","premium_proxy":"true"};
(async () => {
  // Node-Fetch has no params option, so append the parameters to the URL
  const requestUrl = `${apiUrl}&${new URLSearchParams(params).toString()}`;
  const response = await fetch(requestUrl);
  const html = await response.text();
  const statusCode = response.status;
  console.log('Status Code:', statusCode);
  console.log(html);
})();
And you'll get the following result:
Status Code: 200
//...
{
"answers": "280 answered questions",
"availability": "In Stock In Stock",
"avg_rating": "4.6 out of 5 stars",
"category": "Electronics › Headphones, Earbuds & Accessories › Headphones & Earbuds › Over-Ear Headphones",
"description": "",
"discount": "-15%-20%",
"out_of_stock": false,
"price": "$279.00",
"price_without_discount": "$329.00",
"review_count": "",
"ships_from": "",
"sold_by": "",
"title": "Bose QuietComfort 45 Bluetooth Wireless Noise Cancelling Headphones - Triple Black",
"features": [
{
"Product Dimensions": "3 x 7.24 x 6 inches"
},
Bingo! How does it feel to be able to bypass any anti-bot detection measures? Awesome, right?
Conclusion
Using proxies with Node.js libraries like Node-Fetch can prove invaluable for data extraction in 2024. We've explored different options and seen how free proxies fail even when rotated. But with premium solutions like ZenRows, you can get past most anti-bot measures and retrieve the data you need. To try ZenRows, sign up to get your 1,000 free API credits.