A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | AA | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Features | ||||||||||||||||||||||||||
2 | # | Name | URL | Price | Decision | Notes | Column 15 | Crawling | Type | Scraping | Executes JS | Interface/Language | Schedulable | IP Proxies | Anti-blocking | ||||||||||||
3 | 1 | Diffbot | https://www.diffbot.com/ | Free Plan up to 10k pages; $299/month for 250k pages | Worth checking out. | Seems like one of the top products in the crawling and parsing space. Crawling not supported until the Plus plan which costs $899 per month. Provides nice JSON payloads for product pages but requires us to handle all site crawling, which significantly limits the value of this product. | Yes | SaaS | Yes | HTTP API | Partial on non-enterprise plans | ||||||||||||||||
4 | 2 | Apify | https://apify.com/ | Free; $49/month for Starter | Worth checking out. | Sounds like more than we need. Well rated on G2. The Actor templates made it extremely easy to set up a site crawler and get started. Feels like this would be extremely expensive to use at scale. Cost $0.286 to scrape just 100 items from the Midland product catalog. Took ~10 minutes to scrape just 100 items from the Midland product catalog, but I believe this was more due to the slow response times of Midland's site. | SaaS | Web UI | Yes | Yes | |||||||||||||||||
5 | 3 | Browse AI | https://www.browse.ai/ | Free trial; Starter plan is $20/month | Worth checking out. | Really slick and easy to use "Robot Training" workflow. Does a great job of making it easy to set up a basic scraper and see the data it will produce quickly and easily. No built-in crawling support. Recommends workarounds in https://help.browse.ai/article/192-one-robot-to-extract-them-all, but they all involve directly providing/configuring the set of URLs to be scraped. Traversing a large product catalog automatically is a critical requirement for our use case. Seems like a deal breaker. Also seems expensive. | SaaS | Yes | Yes | Web UI/API | Yes | Yes | |||||||||||||||
6 | 4 | ScrapingBee | https://www.scrapingbee.com/ | $49/month for 30k to 150k pages | Maybe worth a look if other options fall through | Full-featured and reasonably priced. No free trial/free tier. | Yes | SaaS | Yes | Yes | Web UI/API | Yes | |||||||||||||||
7 | 5 | Web Scraper | https://webscraper.io/ | Free browser extension; $50/month for cloud execution | Maybe worth a look if other options fall through | Simple to use but limited functionality. | Chrome Extension | Yes | |||||||||||||||||||
8 | 6 | Crawlee | https://crawlee.dev/ | Free; Open Source | Worth consideration if building a JS app is not a non-starter. | Can be built and deployed to Apify | Yes | Yes | Supported | Yes | |||||||||||||||||
9 | 7 | Cheerio.js | https://cheerio.js.org/ | Free; Open Source | Worth consideration if building a JS app is not a non-starter. | Does HTML parsing and data extraction very well. Doesn't do much else. | No | Library | Yes | No | Node.js | No | No | ||||||||||||||
10 | 8 | OxyLabs | https://oxylabs.io/products/scraper-api/web | $75/month for 5 GB | Worth checking out if blocked sites becomes an issue for us. | Provides IP addresses and other utilities to avoid blocked scraping along with their scraping support. | SaaS | Yes | Yes | Web UI | Yes | ||||||||||||||||
11 | 9 | Import.io | https://www.import.io/ | $399/month for Starter | No; Too expensive | Point-n-click interface. Sounds like its UI is not as good as its competitors (e.g. Apify). | Yes | SaaS | Yes | No/Not Well | Web UI | ||||||||||||||||
12 | 10 | Parsehub | https://www.parsehub.com/ | Limited free trial; Paid plans starting at $189/month | No; Too expensive | SaaS | Yes | Yes | Web UI/API | ||||||||||||||||||
13 | 11 | Clay.com | https://www.clay.com/ | Free trial; Starter plan is $134/month | No; Too expensive | Focused on non-technical users (point-n-click interface). | SaaS | Web UI | Yes | ||||||||||||||||||
14 | 12 | Scraping Pros | https://scrapingpros.com/ | $450/month with limited features | No; Too expensive | ||||||||||||||||||||||
15 | 13 | NetNut | https://netnut.io/ | $300/month for 20 GB | No; Too expensive | Focused on search engine and social media crawling. | |||||||||||||||||||||
16 | 14 | Octoparse | https://www.octoparse.com/ | Free plan; Then $99/month | No; Too expensive | Focused on non-technical users (point-n-click interface). | Yes | Yes | Yes | Yes | |||||||||||||||||
17 | 15 | Bright Data | https://brightdata.com/ | $10/month for micro package | No; Too limited focus | Focused on providing IP addresses and other utilities to avoid blocked scraping. | Yes | Yes | |||||||||||||||||||
18 | 16 | ScrapeHero Cloud | https://www.scrapehero.com/marketplace/ | $200 per month per site according to https://www.scrapehero.com/pricing/ | No; Not flexible enough | Offers a collection of out of the box crawlers. | Yes | SaaS | Yes | Yes | Web UI/API | Yes | Yes | ||||||||||||||
19 | 17 | Selenium | https://medium.com/@datajournal/web-scraping-with-selenium-955fbaae3421 | Free; Open Source | No; Only crawling/scraping HTML | Web automation library. Support for multiple languages. Steep learning curve. | Library | Yes | Multiple | ||||||||||||||||||
20 | 18 | Puppeteer | https://pptr.dev/ | Free; Open Source | No; Only crawling/scraping HTML | Node.js library. Web automation library. | Library | Yes | Node.js | ||||||||||||||||||
21 | 19 | Playwright | https://playwright.dev/ | Free; Open Source | No; Only crawling/scraping HTML | Web automation library. | Library | Node.js | |||||||||||||||||||
22 | 20 | MechanicalSoup | https://mechanicalsoup.readthedocs.io/en/stable/ | Free; Open Source | No; Only crawling/scraping HTML | Written in Python. Web automation library. | Library | No | No | Python | |||||||||||||||||
23 | 21 | Mozenda | https://www.mozenda.com/ | Limited free trial; Paid plans | No; Prices not listed | Yes | SaaS | Yes | Web UI | ||||||||||||||||||
24 | 22 | Scrapy | https://scrapy.org/ | Free; Open Source | No | Written in Python. | Library | Yes | No | Python | |||||||||||||||||
25 | 23 | Pyspider | https://github.com/binux/pyspider | Free; Open Source | No | Library | Python | ||||||||||||||||||||
26 | 24 | Beautiful Soup | https://realpython.com/beautiful-soup-web-scraper-python/ | Free; Open Source | No | Python library. Focuses exclusively on HTML parsing. | No | Library | Python | ||||||||||||||||||
27 | 25 | Apache Nutch | https://nutch.apache.org/ | Free; Open Source | No | Written in Java. Steep learning curve. | No | Library | Java | ||||||||||||||||||
28 | 26 | Heritrix | https://github.com/internetarchive/heritrix3 | Free; Open Source | No | Written in Java. | Yes | Library | Java | ||||||||||||||||||
29 | 27 | Web Harvest | https://github.com/janih/web-harvest | Free; Open Source | No | Looks antiquated. | Java | ||||||||||||||||||||
30 | 28 | Web Magic | https://webmagic.io/en/ | Free; Open Source | No | Yes | Java | ||||||||||||||||||||
31 | 29 | Common Crawl | https://commoncrawl.org/ | Free | No | A collection of pre-scraped sites made available for free. Looks like there is some data included in this collection for both https://www.worldwidefittings.com/ and https://midlandindustries.com/, but getting access to the HTML data from these sites is not a problem we need assistance with. | No | Dataset | No | No | HTTP API | No | No | No | |||||||||||||
32 | 30 | Web Robots | https://webrobots.io/ | $99/month/source; Free browser extension | No; Looks half-baked | Yes | SaaS | Web UI & JS | |||||||||||||||||||
33 | 31 | Priceva | https://priceva.com/ | Free; $99/month for more features | No; Only price tracking | Focused specifically on price tracking. | |||||||||||||||||||||
34 | 32 | Scrapebox | https://www.scrapebox.com/ | One time purchase | No; Too SEO focused | Focuses on SEO-related tasks. | Desktop App | Yes w/ added cost | |||||||||||||||||||
35 | 33 | ScreamingFrog | https://screamingfrog.co.uk/ | Yearly subscription | No; Too SEO focused | Focuses on SEO-related tasks. | Desktop App | ||||||||||||||||||||
36 | 34 | Web Content Extractor | https://www.webcontentextractor.com/ | One time purchase | No; Too antiquated | Desktop App | Yes w/ added cost |
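
The Apify note above reports $0.286 and roughly 10 minutes for 100 Midland catalog items and suggests this could get expensive at scale. A rough way to put numbers on that is to scale the trial linearly. This is only a sketch: linear scaling, serial execution, and the catalog sizes in the loop are all assumptions made for illustration, and the `extrapolate` helper is hypothetical. Actual cost depends on the plan, concurrency, and site response times.

```python
from __future__ import annotations

# Observed in the Apify trial noted in the table: 100 Midland catalog items
# cost $0.286 and took roughly 10 minutes.
OBSERVED_ITEMS = 100
OBSERVED_COST_USD = 0.286
OBSERVED_MINUTES = 10.0


def extrapolate(items: int) -> tuple[float, float]:
    """Linearly scale the observed trial to `items` items.

    Returns (cost in USD, duration in hours). Linear scaling and serial
    execution are assumptions, not measured behavior.
    """
    scale = items / OBSERVED_ITEMS
    return OBSERVED_COST_USD * scale, OBSERVED_MINUTES * scale / 60.0


if __name__ == "__main__":
    # Hypothetical catalog sizes, chosen only for illustration.
    for size in (1_000, 10_000, 100_000):
        cost, hours = extrapolate(size)
        print(f"{size:>7} items: ~${cost:,.2f}, ~{hours:,.1f} h if run serially")
```

Comparing the extrapolated figure against the flat monthly prices in the Price column gives a first-order sense of where usage-based pricing stops being competitive.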