How to Use ScrapydWeb for Managing Scrapy Projects

Idowu Omisola
June 19, 2025 · 7 min read

Managing multiple Scrapy projects and spiders from the command line can be challenging, especially if they're deployed across several Scrapyd servers. ScrapydWeb solves this challenge by providing a web interface for managing Scrapy projects through Scrapyd's API endpoints.

In this tutorial, you'll learn how to manage Scrapy projects over Scrapyd clusters using the ScrapydWeb user interface.

What Is ScrapydWeb and Why Use It?

ScrapydWeb provides a web-based interface for managing Scrapyd clusters, which are sets of servers that run deployed Scrapy projects. Since it reads scraping task information directly from Scrapyd, ScrapydWeb requires a running Scrapyd server to monitor and control jobs.

ScrapydWeb supports real-time monitoring and multi-server management, and lets you view scraping job statistics. Overall, its UI provides access to Scrapyd's spider management API endpoints. You can execute, schedule, and cancel scraping jobs and even delete them via the ScrapydWeb interface.
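
Under the hood, every action in the ScrapydWeb UI maps to one of Scrapyd's JSON API endpoints. For reference, here's what a few of those calls look like when made directly with curl; the project and spider names below are placeholders:

Terminal
# list the projects deployed to a Scrapyd server
curl http://localhost:6800/listprojects.json

# schedule a spider run
curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider

# cancel a running job by its job ID
curl http://localhost:6800/cancel.json -d project=myproject -d job=<job_id>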

Key Features of ScrapydWeb

The major features of ScrapydWeb include:

  • Scheduled scraping jobs: It lets you schedule your spiders to run at a specific time and frequency.
  • Multi-node Scrapyd cluster management: You can manage multiple Scrapyd servers within a single ScrapydWeb user interface.
  • Mobile mode: ScrapydWeb lets you manage Scrapyd servers directly from your mobile device.
  • Task monitoring: You can monitor the Scrapyd server status and real-time job performance from a single interface.
  • Detailed job statistics and logs: ScrapydWeb lets you access detailed statistics, logs, and progress visualizations at the server, project, and task levels.
  • Alerts: It provides detailed logs, job status, and statistics alerts via Email, Slack, or Telegram.

Setting Up ScrapydWeb

Let's now see how to set up ScrapydWeb, including the installation, configuration, and deployment steps.

Step 1: Installing Scrapyd and ScrapydWeb

ScrapydWeb requires Python 3+ to work smoothly. We recommend installing Python's latest version from the official download page before you begin.

You'll also need to install Scrapyd, Scrapyd-Client, and ScrapydWeb. Keep in mind that Scrapyd-Client is a command-line interface (CLI) tool that lets you communicate with the Scrapyd API.

Install these packages using pip:

Terminal
pip3 install scrapyd scrapyd-client scrapydweb
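
To confirm that the packages installed correctly, you can ask pip to show their details:

Terminal
pip3 show scrapyd scrapyd-client scrapydweb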

All done? You'll go through the deployment process in the next section.

Step 2: Deploy Scrapy Project to Scrapyd

Next, start the Scrapyd server with the following command:

Terminal
scrapyd

The above command starts a Scrapyd server on port 6800 by default:

Example
Site starting on 6800

Next, connect your Scrapy project with the Scrapyd server.

Go to the scrapy.cfg file in your Scrapy project's root folder and replace its content with the configuration below. It defines a deployment target named local that points to the running Scrapyd server's URL, ensuring the project deploys to the correct server.

scrapy.cfg
[settings]
default = scraper.settings

[deploy:local]
url = http://localhost:6800/
project = scraper

Next, deploy the project using Scrapyd-Client by running the following command, where <target_name> is the deployment target defined in scrapy.cfg (local, in this case) and <your_project_name> is your Scrapy project name (e.g., scraper):

Example
scrapyd-deploy <target_name> -p <your_project_name>

For instance, the following command deploys a Scrapy project (scraper) locally on the running Scrapyd node:

Terminal
scrapyd-deploy local -p scraper

After deployment, visit the running Scrapyd node on http://localhost:6800/ via your browser, and you'll see the deployed scraper project listed under "Available projects":

Scrapyd Local Home
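
You can also confirm the deployment from the terminal by querying Scrapyd's listprojects.json endpoint; the deployed project should appear in the response:

Terminal
curl http://localhost:6800/listprojects.json
# expected response shape: {"status": "ok", "projects": ["scraper"], ...}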

You're now ready to monitor the deployed Scrapy project via ScrapydWeb. You'll do that in the next section.

Step 3: Manage Scrapyd Servers via ScrapydWeb

Start the ScrapydWeb server by running the following command:

Terminal
scrapydweb

The above command creates a scrapydweb_settings_v11.py file in your project root.

Open the generated scrapydweb_settings_v11.py file and scroll to the SCRAPYD_SERVERS list. This list holds the servers managed by ScrapydWeb, including a default entry with authentication. Feel free to comment out or remove that authenticated default entry, leaving only the required local server, 127.0.0.1:6800:

scrapydweb_settings_v11.py
# ...
SCRAPYD_SERVERS = [
    "127.0.0.1:6800",
    # ... comment or remove the default port
]
# ...

The advantage of this file is that you can add more Scrapyd servers to the SCRAPYD_SERVERS list as you scale to other nodes. More on that later.
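
For reference, ScrapydWeb accepts server entries in a few formats. The sketch below is based on the commented examples in the generated settings file; the username, password, host, port, and group values are placeholders:

scrapydweb_settings_v11.py
# ...
SCRAPYD_SERVERS = [
    "127.0.0.1:6800",
    # a string entry with basic auth and a group label (placeholder values)
    # "username:password@localhost:6801#group",
    # the same server expressed as a tuple
    # ("username", "password", "localhost", "6801", "group"),
]
# ...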

Rerun the scrapydweb command to start the ScrapydWeb server. This command starts a ScrapydWeb daemon that defaults to http://localhost:5000/.

Visit that URL via your browser, and you'll get the following interface, listing your running Scrapyd server by default:


Bravo! You've now connected your Scrapyd server with the ScrapydWeb management interface.

If you start a second Scrapyd server, it will also appear on the Scrapyd server table.

Managing Multiple Scrapyd Servers on ScrapydWeb

Assuming you want to manage another Scrapy project called product_scraper on a different Scrapyd server, you'll need to create a new Scrapyd daemon for it on a separate port.

To start another Scrapyd server on a different port, open the second Scrapy project you want to manage and create a scrapyd.conf file in its root directory. Specify the new port in this file, as shown below. This configuration tells Scrapyd to run on an alternative port (6802) instead of the previous 6800:

scrapyd.conf
[scrapyd]
http_port = 6802

Then, ensure you point the new Scrapy project to this new Scrapyd port by modifying its scrapy.cfg file:

scrapy.cfg
[settings]
default = product_scraper.settings

[deploy:local]
url = http://localhost:6802/
project = product_scraper

Open your command line to the new project's root folder and run the scrapyd command to start the Scrapyd daemon on the new port (6802):
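
Terminal
scrapyd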

Next, open another command line in the new project's root folder and run the deployment command for the new project:

Terminal
scrapyd-deploy local -p product_scraper

The above command deploys the new Scrapy project to the new Scrapyd server. You can visit http://localhost:6802 to see the deployed project.

Stop the running ScrapydWeb server. Then, open the scrapydweb_settings_v11.py file in the first Scrapy project (scraper) and update the SCRAPYD_SERVERS list with the new Scrapyd server connected to the new Scrapy project. The SCRAPYD_SERVERS list becomes:

scrapydweb_settings_v11.py
# ...
SCRAPYD_SERVERS = [
    "127.0.0.1:6800",
    "127.0.0.1:6802",
]
# ...

Restart ScrapydWeb with the scrapydweb command and open its URL again, and you'll see the second Scrapyd server:


From here, you can switch between Scrapyd servers and run and schedule spiders.

You'll learn how to run tasks in the next step.

Step 4: Run Individual Spiders

To run a spider within a specific Scrapyd cluster:

  • Open the ScrapydWeb interface (http://localhost:5000/) and click "Servers" on the left sidebar.
  • Select the Scrapyd server containing the Scrapy project with the desired spider.
  • Select the "Run Spider" tab and click Multinode Runspider.
  • Select the desired Scrapyd server, Scrapy project, version number (go with the default option for auto-versioning), and the spider you wish to run.
  • To add more functionality, such as setting the User Agent, cookies, robots.txt rule, concurrency, and delay, toggle on the "settings & arguments" switch and set your preferences.
  • Scroll down and click "Check CMD." Then, click "Run Spider" to execute the spider immediately.
  • You'll now see the executed job with an "ok" status on the next page.

You just executed your first scraping job on ScrapydWeb. That's great!
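
For context, the "Run Spider" action wraps Scrapyd's schedule.json endpoint. If you ever need to trigger the same run from a terminal or a script, the equivalent request looks like this (the spider name is a placeholder; use the one defined in your project):

Terminal
curl http://localhost:6800/schedule.json -d project=scraper -d spider=<your_spider_name>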

Let's see other ScrapydWeb features.

Other ScrapydWeb Management Features

As mentioned earlier, ScrapydWeb supports other Scrapyd server management features, including scheduling, stats, logs, and more. Let's see how scheduling and logging work.

Schedule Spiders With ScrapydWeb

Here's how to schedule a scraping job in ScrapydWeb:

  • Go to "Servers" and select the Scrapyd server on which you want to schedule a spider.
  • Click the "Run Spider" tab ⇒ "Multinode Run Spider".
  • Follow the previous steps for selecting the target Scrapyd server, Scrapy project, version and spider.
  • Toggle on the "timer task" switch and set your timer preferences.
  • For more schedule options, toggle on "show more timer settings". For instance, use the "start_date" and "end_date" options to schedule the selected spider to run at intervals.
  • After setting your preferences, click "Check CMD" > "Add Task" to conclude the schedule.
  • This takes you to a dashboard showing current and past schedules for the selected Scrapyd server.

Well done! You now know how to schedule Scrapy scraping jobs with ScrapydWeb.

Depending on your project's requirements, you can schedule multiple spiders within the same Scrapyd server or across many Scrapyd servers.

View Spider Execution Logs and Stats

Logs and stats give you an overview of failed and successful spider runs, including the point of failure. This allows you to prioritize reruns and track progress for scheduled tasks.

You can view spider execution logs and stats across Scrapyd clusters.

Here's how to go about it:

  • Select the desired server from the server option dropdown at the top left.
  • Click "Logs" on the sidebar.
  • Select the project name from the logs table.
  • Then, select the spider you want to view.
  • On the next page, you'll see the log list with timestamps for each. Click "Log" to view the spider log or "Stats" to see the crawling statistics.

Great job! You've advanced your spider monitoring skills on ScrapydWeb.
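
ScrapydWeb reads this information from Scrapyd, which also serves raw log files over HTTP. If you ever need a log outside the UI, you can fetch it directly; the spider name and job ID below are placeholders (listjobs.json returns the real job IDs):

Terminal
# list pending, running, and finished jobs for a project
curl "http://localhost:6800/listjobs.json?project=scraper"

# download a raw log file
curl http://localhost:6800/logs/scraper/<spider_name>/<job_id>.log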

That said, despite the scaling infrastructure that ScrapydWeb offers, your Scrapy projects can still face the problem of getting blocked by anti-bot measures. There's a solution for that in the next part of this article.

Scale Up With ZenRows

Although ScrapydWeb allows you to schedule multiple batch scraping jobs and scale across several nodes, these jobs often fail due to anti-bot protections. When that happens, you risk losing money and wasting valuable time and human effort.

The best way to prevent getting blocked by anti-bots is to use a scraping solution like the ZenRows Universal Scraper API. ZenRows matches scalability with an impressive scraping success rate of up to 99.93%, ensuring you extract data without limitations. It also has headless browser features to automate human interactions and scrape dynamic content.

ZenRows integrates easily with Scrapy via the scrapy-zenrows middleware. This middleware brings all the functionality of the Universal Scraper API to Scrapy.

Let's see how it works by scraping a heavily protected website like the Anti-bot Challenge page.

Sign up and go to the Request Builder. Then, copy your ZenRows API key.

building a scraper with zenrows

Install the scrapy-zenrows middleware with pip:

Terminal
pip3 install scrapy-zenrows  

Add the middleware and your ZenRows API key to your Scrapy project's settings.py file, and set ROBOTSTXT_OBEY to False so Scrapy doesn't skip requests based on the target site's robots.txt rules:

settings.py
# ...
ROBOTSTXT_OBEY = False

DOWNLOADER_MIDDLEWARES = {
	# enable scrapy-zenrows middleware
	"scrapy_zenrows.middleware.ZenRowsMiddleware": 543,
}
# ZenRows API key
ZENROWS_API_KEY = "<YOUR_ZENROWS_API_KEY>"

Import ZenRowsRequest into your spider and pass ZenRows' parameters in your start_requests method, enabling the JS Rendering and Premium Proxy features:

Example
# pip3 install scrapy-zenrows
import scrapy
from scrapy_zenrows import ZenRowsRequest

class Scraper(scrapy.Spider):
    name = "scraper"
    allowed_domains = ["www.scrapingcourse.com"]
    start_urls = ["https://www.scrapingcourse.com/antibot-challenge"]

    def start_requests(self):
        # use ZenRowsRequest for customization
        for url in self.start_urls:
            yield ZenRowsRequest(
                url=url,
                params={
                    "js_render": "true",
                    "premium_proxy": "true",
                },
                callback=self.parse,
            )

    def parse(self, response):
        self.log(response.text)
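
Run the spider from the project's root folder as usual:

Terminal
scrapy crawl scraper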

Running the spider outputs the protected website's full-page HTML, confirming that you bypassed the anti-bot challenge:

Output
<html lang="en">
<head>
    <!-- ... -->
    <title>Antibot Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Antibot challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>

Congratulations! You just bypassed an anti-bot challenge using the scrapy-zenrows middleware. You can now reliably schedule scraping jobs at scale via ScrapydWeb.

Conclusion

ScrapydWeb is a valuable tool for monitoring your Scrapy projects at scale, providing a friendly user interface for running, scheduling, canceling, and deleting scraping jobs, viewing logs and stats, and more.

Despite the powerful features of ScrapydWeb, anti-bots can disrupt your spider schedules, causing abrupt scraper failures that result in low or zero data yields. We recommend integrating ZenRows with your Scrapy project upfront to mitigate these challenges. ZenRows lets you scrape at any scale without limitations.

Try ZenRows for free!
