Did you get blocked by a website while using cURL? One of the most effective ways around that is to route your requests through a proxy server, making it harder for the site to flag your traffic as non-human.
In this tutorial, you'll learn the step-by-step process of using a cURL proxy and the best practices and protocols to consider when web scraping. We'll cover the following key topics:
- What is a proxy in cURL?
- Proxy authentication with cURL.
- cURL best practices.
- Using a rotating proxy with cURL.
- Proxies and protocols that work best for cURL web scraping.
Let's begin!
What Is a Proxy in cURL?
A cURL proxy is a server that acts as an intermediary between the client and the destination server. It allows you to access resources with increased anonymity and without network restrictions.
This functionality is particularly valuable for web scraping in cURL, as it helps bypass IP-based restrictions and reduces the likelihood of your scraper being detected or blocked.
Here's how it works:
- First, the client sends a request to the proxy server.
- Next, the proxy server forwards it to the destination server.
- The response from the destination server is returned to the proxy server.
- Finally, the proxy forwards the response to the client.
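You can see this flow in action by running cURL in verbose mode with a proxy. For an HTTPS target, the verbose output shows cURL connecting to the proxy first and tunneling through it to the destination:
curl -v --proxy "http://<PROXY_IP_ADDRESS>:<PROXY_PORT>" "https://httpbin.io/ip"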
How to Use a Proxy With cURL?
In this section, we'll walk you through installing cURL on different operating systems, explain the basic cURL proxy syntax, show how to set up a proxy with a simple command, and demonstrate how to extract data from the response.
Install cURL
Before you can use a proxy with cURL, you need to have cURL installed on your system. Here's how to install it on different operating systems.
Windows:
Download the cURL executable from the official cURL website and add the cURL directory to your system's PATH.
Alternatively, you can use Windows Package Manager (winget):
winget install curl
macOS:
macOS usually comes with cURL pre-installed. If it's not, you can install it using Homebrew:
brew install curl
Linux:
Most Linux distributions come with cURL pre-installed. If not, install it with your distribution's package manager. On Debian/Ubuntu:
sudo apt-get update
sudo apt-get install curl
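On Fedora, RHEL, and other dnf-based distributions, use dnf instead:
sudo dnf install curl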
After installation, verify that cURL is installed correctly by running:
curl --version
This command displays the cURL version information, confirming that it's installed and ready to use.
Understand cURL Syntax
Before we begin, let's look at the general syntax for using a proxy with cURL:
curl --proxy <PROXY_PROTOCOL>://<PROXY_IP_ADDRESS>:<PROXY_PORT> <URL>
Here's what each placeholder stands for:
- PROXY_PROTOCOL: The protocol used to communicate with the proxy server, such as HTTP or HTTPS.
- PROXY_IP_ADDRESS: The proxy server's hostname, IP address, or URL.
- PROXY_PORT: The port number the proxy server listens on.
- URL: The URL of the target website the proxy server will communicate with.
Set Up a Proxy With cURL
Let's start by creating a basic script to make an HTTP request to HTTPBin. This service returns the IP address that made the request, which is useful for verifying our proxy setup.
Here's a basic cURL command without a proxy:
curl "https://httpbin.io/ip"
Running this command returns your machine's actual IP address, with output similar to this:
{
"origin": "198.51.100.42:49"
}
Now, let's set up this script to use a proxy. We'll use a free proxy from the Free Proxy List website. Note the IP address and port from the website.
Replace the <PROXY_PROTOCOL>, <PROXY_IP_ADDRESS>, and <PROXY_PORT> placeholders with the values you got from the Free Proxy List. Also, replace <URL> with the URL of the HTTPBin target website.
curl --proxy "http://47.90.205.231:33333" "https://httpbin.io/ip"
Run the command. You'll get the following output:
{
"origin": "47.90.205.231:33333"
}
Congrats! The IP address shown in the output matches the proxy IP, confirming that your request went through the proxy.
Free proxies are often unreliable and may not work at the time you're reading this. They're suitable for learning purposes, but not for production use. If the above proxy doesn't work, try grabbing a new one from the Free Proxy List and updating your command accordingly.
Extract Data With cURL Proxy
Consider the above cURL proxy example that delivered a JSON object with an origin field. To extract the value of that field, pipe the previous command into jq.
curl -x "http://47.90.205.231:33333" "https://httpbin.org/ip" | jq ".origin"
Ensure jq is installed on your machine before running it.
The output is the actual value of the origin field, which is the IP address returned in the response.
"47.90.205.231"
Proxy Authentication With cURL: Username & Password
Some proxy servers have security measures in place to prevent unauthorized access and require a username and password to access the proxy.
cURL supports proxy authentication, allowing web scrapers to access these proxy servers while still respecting their security measures.
Here's how to connect to a URL using cURL with an authenticated proxy.
To begin, use the --proxy-user option to provide the username and password for the proxy server.
For example, let's say you want to connect to a proxy server at http://<PROXY_IP_ADDRESS>:<PROXY_PORT> that requires authentication with the username <YOUR_USERNAME> and the password <YOUR_PASSWORD>. The CLI command that performs the operation is as follows:
curl --proxy "http://<PROXY_IP_ADDRESS>:<PROXY_PORT>" --proxy-user <YOUR_USERNAME>:<YOUR_PASSWORD> "http://target-url.com/api"
This command will use the provided username and password for authentication to send the HTTP request to the target URL via the specified proxy.
You can also set the Proxy-Authorization header yourself. The --proxy-header option in cURL lets you send custom headers to the proxy, as shown below:
curl --proxy "http://<PROXY_IP_ADDRESS>:<PROXY_PORT>" --proxy-user <YOUR_USERNAME>:<YOUR_PASSWORD> --proxy-header "Proxy-Authorization: Basic dXNlcjEyMzpwYXNzMTIz" "http://target-url.com/api"
cURL Best Practices
Optimizing your cURL proxy usage involves several key steps. We'll explore environment variables, overriding proxies, creating aliases, and leveraging the .curlrc file. These methods will help streamline your workflow and improve proxy management.
Environment Variables for a cURL Proxy
Environment variables are important for a cURL proxy because they let you store proxy server URLs, usernames, and passwords as variables that cURL commands can access, instead of entering the values manually each time. That saves time and effort and makes managing multiple proxies for different tasks easier.
To use cURL proxy environment variables, follow these steps:
In your terminal, set the proxy server URL, username, and password as environment variables using the export command. Replace <YOUR_USERNAME> and <YOUR_PASSWORD> with the appropriate values for your proxy server. You can omit the username and password from the URL if the proxy doesn't require authentication.
export http_proxy=http://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>
export https_proxy=https://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>
If you're using Windows, run these alternative commands in Command Prompt instead:
set http_proxy=http://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>
set https_proxy=https://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>
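If you work in PowerShell rather than Command Prompt, set the variables with the $env: prefix instead:
$env:http_proxy = "http://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>"
$env:https_proxy = "https://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>"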
Next, use the environment variables in your cURL commands by referencing them with the $ symbol.
curl -x $http_proxy https://httpbin.io/ip
Ignore or Override Proxy for a Request
When working with cURL and proxies, you may sometimes need to bypass or override proxy settings. This can be useful for testing, accessing local resources, or when you need to use a different proxy for specific requests.
To override the default proxy settings for a single request, you can use the -x or --proxy option:
curl -x "http://new-proxy.example.com:8080" "https://www.example.com"
This command will use the specified proxy for this request, regardless of any system-wide or environment variable proxy settings.
If you need to bypass proxy settings entirely for certain hosts or domains, you can use the --noproxy option. This is particularly useful when you develop scrapers locally and want to ensure they work correctly without a proxy.
For example, to ignore proxies for requests to httpbin.io:
curl --noproxy "httpbin.io" "https://httpbin.io/ip"
The --noproxy option tells cURL to access the specified domains directly without going through any proxy server.
Note that if --noproxy isn't used and no proxy is specified on the command line, cURL falls back to any proxy defined in environment variables (such as http_proxy) or in your .curlrc file. By using these options, you can fine-tune your cURL requests to use proxies selectively, giving you more control over your web scraping or API interactions.
Create an Alias
Aliases are important in cURL because they help simplify and streamline repeated or complex cURL requests. By setting up an alias, you create a shortcut for a specific cURL command with certain options and parameters, so you can run it again later without remembering or retyping all the details. That saves time and reduces the risk of errors.
Additionally, aliases can make cURL commands more readable and easier to understand, especially for users less familiar with the available syntax or options. To create an alias, use the alias command in your terminal. For example, you can create an alias for ls -l named ll by running alias ll="ls -l".
Here's how to automatically use the proxy server and credentials specified in your environment variable, saving you the trouble of typing out the full command each time.
Start by opening your shell's configuration file, such as .bashrc or .zshrc, in a text editor. This file typically lives in your home directory: /home/<YOUR_USERNAME>/ on Linux, /Users/<YOUR_USERNAME>/ on macOS, or C:\Users\<YOUR_USERNAME> on Windows. You can also create the file in this folder if it doesn't exist.
The next step is to add the following snippet to the file to create an alias.
alias curlproxy='curl --proxy $http_proxy'
In this case, curlproxy is the alias's name, and $http_proxy is the environment variable we created in the previous section. You can customize the alias name to your preference.
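After saving the file, reload your shell configuration so the alias takes effect in the current session (assuming Bash; use your own shell's config file):
source ~/.bashrc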
Now, you can use the curlproxy alias followed by the URL you want to connect to via the proxy. For example, to connect to "https://httpbin.io/ip" via the proxy, run the following command:
curlproxy "https://httpbin.io/ip"
Use a .curlrc File for a Better Proxy Setup
The .curlrc file is a text file containing one or more command-line options that cURL applies whenever you run a command. You can store your cURL settings there, including proxy configuration, which makes your commands easier to manage.
To use a .curlrc file for cURL with a proxy, here's what you must do:
- Create a new file called .curlrc in your home directory.
- Add the following line to the file to set your proxy server IP address, port, username, and password, then save it:
proxy = http://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>
Alternatively, you can use separate fields:
proxy-user = "<YOUR_USERNAME>:<YOUR_PASSWORD>"
proxy = "<PROXY_IP_ADDRESS>:<PROXY_PORT>"
- Run the default cURL command to connect to https://httpbin.io/ip via the proxy you set up in the .curlrc file:
curl "https://httpbin.io/ip"
This setup allows you to use your configured proxy settings without specifying them in each cURL command.
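Beyond the proxy entry, a .curlrc can hold any long-form cURL option as a default. Here's a hypothetical example pairing a proxy with a couple of common scraping defaults:
# Route all requests through this proxy
proxy = "http://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>"
# Follow HTTP redirects (equivalent to -L)
location
# Send a browser-like User-Agent header
user-agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"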
Use a Rotating Proxy With cURL
Single proxies, especially free ones, are often unsuitable for web scraping due to their unreliability and high likelihood of being blocked.
Proxy rotation offers a partial solution to these challenges. By cycling through multiple IP addresses for your requests, you significantly reduce the risk of being detected and blocked.
However, it's not just about rotation; the type of proxy matters too. Residential proxies, which use IP addresses that Internet Service Providers assign to homeowners, are particularly effective. To websites, they look like genuine user traffic, making them less likely to be flagged as bot activity.
In this section, we'll explore two approaches to implementing proxy rotation with cURL: building a custom rotator using a list of free proxies and utilizing a premium proxy service for more reliable and efficient rotation.
While the first method is cost-free and educational, the second offers a more robust and sustainable solution for serious web scraping projects. Let's dive into both approaches to help you choose the best fit for your needs.
Rotate IPs With a Free Solution
In this example, we'll use a free provider to set up a rotating proxy with cURL.
To begin, go to Free Proxy List to get a list of free proxy IP addresses. Note the IP address, port, and authentication credentials (if any) for the rotating proxy you want to use.
Next, replace <YOUR_USERNAME>, <YOUR_PASSWORD>, <PROXY_IP_ADDRESS>, and <PROXY_PORT> with the values for your rotating proxy list and save them in the .curlrc file you created earlier:
proxy = http://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>
proxy = http://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>
proxy = http://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>
Finally, test whether the rotating proxy works by opening a terminal and running the following command:
curl -v https://httpbin.io/ip
The output should display one of the IP addresses you saved in the .curlrc file.
{"origin": "162.240.76.92"}
While this method can work for basic scraping tasks, it's important to note that free proxies are often unreliable and have limited effectiveness, especially when dealing with heavily protected websites. They may have slow speeds, frequent downtime, and a higher chance of being detected and blocked.
Premium proxy solutions are recommended for more reliable web scraping, especially on a larger scale or for accessing sites with strong anti-bot measures. In the next section, we'll explore how to implement a premium proxy solution for more dependable and efficient IP rotation.
Premium Proxy to Avoid Getting Blocked
While a free rotating proxy solution can be an effective way to scrape websites without being detected, it may not always be reliable. If you require more stability and faster connection speeds, a premium proxy service may be a better option to avoid anti-bots like Cloudflare with cURL.
One of the most effective proxy service providers is ZenRows. It offers auto-rotating premium proxies specifically tailored for web scraping and crawling. ZenRows Proxy Rotator comes with advanced features such as flexible geo-targeting, anti-bot and CAPTCHA auto-bypass, and more, all under the same plan.
Let's see how to use ZenRows' premium proxies with cURL.
Sign up and go to the ZenRows Proxy Generator. Your premium residential proxy will be generated automatically. You can further customize it according to your requirements.
Once you're done, copy the generated cURL command.
curl -x http://<ZENROWS_PROXY_USERNAME>:<ZENROWS_PROXY_PASSWORD>@superproxy.zenrows.com:1337 -L https://httpbin.io/ip
Run this command in your terminal. Here's an example of what the output would look like:
{
"origin": "185.220.101.34"
}
This output shows that your request was successfully routed through one of ZenRows' premium proxies. The IP address you see will be different from your actual IP, confirming that the proxy is working correctly.
By using ZenRows' premium proxy service, you can significantly improve the reliability and effectiveness of your web scraping operations, especially when dealing with websites with sophisticated anti-bot measures.
Proxies and Protocols That Work Best for cURL Web Scraping
The choice of a cURL proxy protocol and proxy type can significantly impact your network communication's performance and reliability.
Let's look at the most efficient options.
Best cURL Proxy Types
Here are some popular proxies for cURL web scraping:
- Residential: These proxies use IP addresses associated with real residential locations. That makes them less likely to be detected and blocked by anti-bot systems.
- Datacenter: These are proxy servers hosted in data centers. They offer high speeds and can handle large volumes of requests quickly.
- 4G proxy: Mobile proxies that route internet traffic through 4G LTE connections. They're typically more expensive than datacenter proxies but offer higher anonymity and better reliability.
Learn more about the different types of web scraping proxies from our detailed tutorial.
Protocols
Now, let's see the most popular protocols that cURL supports:
- HTTP: Hypertext Transfer Protocol, the foundation of data communication on the web.
- HTTPS: HTTP with an added layer of security through encryption (SSL/TLS).
- FTP: File Transfer Protocol, used for transferring files between servers and clients over the internet.
- FTPS: FTP with an added layer of security through encryption (SSL/TLS).
- SOCKS: A versatile protocol that tunnels network traffic at a lower level than HTTP. It can be configured to route all kinds of internet traffic, making it flexible for different networking needs, including web scraping.
- LDAP: Lightweight Directory Access Protocol, an open, vendor-neutral, industry standard application protocol for accessing and maintaining distributed directory information services over an Internet Protocol (IP) network.
- LDAPS: LDAP with an added layer of security through encryption (SSL/TLS).
HTTP, HTTPS, and SOCKS are the most relevant protocols used in web scraping to enable communication between clients and servers.
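Switching protocols in cURL is usually just a matter of changing the scheme in the proxy URL. For instance, to route a request through a SOCKS5 proxy, you could run (use socks5h:// instead if you also want DNS resolved through the proxy):
curl --proxy "socks5://<PROXY_IP_ADDRESS>:<PROXY_PORT>" "https://httpbin.io/ip"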
We recommend using HTTPS proxies for web scraping. HTTPS provides an encrypted connection, adding an extra security layer to your scraping activities. This encryption helps protect the data you're sending and receiving, making it more difficult for third parties to intercept or tamper with your requests.
Additionally, many modern websites default to HTTPS, so using HTTPS proxies ensures compatibility and reduces the risk of connection errors during your scraping operations.
Conclusion
Using a cURL proxy can greatly enhance your web scraping capabilities. It allows you to avoid IP blocks and access geographically restricted content.
However, free proxies aren't reliable, so consider a premium proxy provider. ZenRows offers auto-rotating premium proxies specifically designed for web scraping and crawling. It includes advanced features such as flexible geo-targeting, anti-bot and CAPTCHA auto-bypass, and easy integration with various web scraping tools.
Get started with ZenRows today and access the data you need!
Frequently Asked Questions
What Is cURL?
cURL (Client URL) is a command-line tool for transferring data using various protocols. It's commonly used to make HTTP requests, download files, and interact with APIs. Here's a simple example of a cURL command:
curl https://example.com/
This command sends a GET request to the specified URL and displays the response in the terminal.
How to Set a Proxy in the cURL Command?
To set a proxy in a cURL command, use the -x or --proxy option followed by the proxy server URL. For example, curl -x http://proxy-url.com:8080 https://target-url.com will use the HTTP proxy server at http://proxy-url.com:8080 to access https://target-url.com.
What Is the Default Proxy Port for cURL?
The default proxy port for cURL is 1080: if you specify a proxy without a port number, cURL connects to it on port 1080. The actual port varies by proxy server, so always check with your proxy provider.
How Do I Know If cURL Is Using a Proxy?
You can check whether cURL is using a proxy by adding the -v option to your cURL command. It displays verbose output with the request's details; if a proxy is in use, you'll see the proxy server and port number listed there.
How to Bypass a Proxy in cURL Command?
To bypass a proxy in a cURL command, use the --noproxy option followed by a comma-separated list of hosts or domains you want to exclude from the proxy. For example, curl --noproxy proxy1.com,proxy2.net https://www.target-url.com will bypass the proxy for requests to proxy1.com and proxy2.net, but not for others.
How Do I Make cURL Ignore the Proxy?
To make cURL ignore a proxy, you have several options. You can use the --noproxy '*' option to bypass the proxy for all hosts, unset proxy-related environment variables (like http_proxy or https_proxy), or override the proxy setting in your cURL command with -x "". Any of these methods will cause cURL to connect directly to the target URL, ignoring any configured proxy servers.