|Feature||Explanation||Does OnCrawl support a feature?|
|Basic SEO reports|
|List of indexable/non-indexable pages|
It's necessary to view a list of indexable / non-indexable pages to make sure there are no mistakes. Perhaps some URLs that were intended to be indexable aren't?
|Yes. Go to "Indexability -> What is indexable". Alternatively, you can go to "Tools -> Data Explorer" and add a new filter: "Is indexable". You can easily filter the results by segments|
|Missing title tags|
Meta titles are an important part of SEO audits. A crawler should show you a list of pages that have missing tags.
|Yes. Go to HTML tags -> SEO tags -> and click on "Title: not set"|
|Filtering URLs by status code (3xx, 4xx, 5xx)|
When you perform an SEO audit, it's necessary to filter URLs by status code. How many URLs are not found (404)? How many URLs are redirected (301)?
|Yes. Go to Indexability -> Status Codes|
|List of Hx tags|
“Google looks at the Hx headers to understand the structure of the text on a page better.” - John Mueller
|Yes, go to HTML tags -> SEO tags. There is an entire section about the Hx tags|
|View internal nofollow links|
It's useful to view the list of internal nofollow links to make sure there aren't any mistakes.
|Yes. Go to "Links flow" -> "Links" and click on "Internal nofollow" on the "Links breakdown"|
|External links list (outbound external)|
A crawler should allow you to analyze both internal and external outbound links.
|Yes. Go to "Links flow" -> "Links" and click either on "External nofollow" or "External dofollow" on the "Links breakdown" section|
|Link rel="next" (to indicate a pagination series)|
When you perform an SEO audit, you should analyze if the pagination series are implemented properly.
|Yes. Go to Tools -> Data explorer and add a new filter: "Rel next has any value"|
|Hreflang tags|
Hreflang tags are the foundations of international SEO, so a crawler should recognize them to let you pinpoint hreflang-related issues.
|Yes. Go to Tools -> Data Explorer and add two additional columns to the report: "Hreflang langs" and "Hreflangs hrefs"|
|Canonical tags||Every SEO crawler should inform you about canonical tags to let you spot indexing issues.||Yes. Go to Summary -> Canonicalized pages.|
|Information about crawl depth - number of clicks from a homepage|
Additional information about crawl depth can give you an overview of the structure of your website. If an important page isn’t accessible within a few clicks from a homepage, it may indicate poor website structure.
|Yes. Go to Tools -> Data explorer -> Add a new column: "Depth"|
|List of empty / Thin pages|
A large number of thin pages can negatively affect your SEO efforts. A crawler should report them.
|Yes. Go to `Content` -> `Pages with thin content`|
|Duplicate content recognition|
A crawler should give you at least basic information on duplicates across your website.
|Yes. OnCrawl: We use the simhash method that we tweaked (explanation here: https://www.oncrawl.com/oncrawl-seo-thoughts/spot-near-duplicates-improve-seo/). We compute a fingerprint for each page based on our N-grams analysis. This way we spot navigation items and we focus our fingerprinting on the main content. Then we compare each fingerprint to group pages with similar content. At the end of the process we apply the Damerau-Levenshtein distance (https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance) to evaluate similarity. That's what makes OnCrawl unique in terms of content analysis and actionable insights for your near duplicates.|
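For readers curious how near-duplicate detection works under the hood, here is a minimal sketch of the general simhash technique. This is an illustration only, not OnCrawl's actual implementation (which also applies Damerau-Levenshtein distance as a final step), and the page texts are made up:

```python
import hashlib

def ngrams(text, n=3):
    # Word-level shingles of length n.
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def simhash(text, bits=64):
    # Charikar-style simhash: each shingle votes on every bit of the fingerprint.
    votes = [0] * bits
    for gram in ngrams(text):
        h = int(hashlib.md5(gram.encode()).hexdigest(), 16)
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if votes[i] > 0)

def hamming(a, b):
    # Near-duplicate pages have fingerprints with a small Hamming distance.
    return bin(a ^ b).count("1")

page_a = "our online store sells cheap red shoes and we ship them to every country in the world within two days"
page_b = "our online shop sells cheap red shoes and we ship them to every country in the world within two days"
page_c = "the quick brown fox jumps over the lazy dog while the cat sleeps on the warm windowsill near the door"

# page_a and page_b share almost all shingles, so their fingerprints are close;
# page_c is unrelated, so its fingerprint is far from both.
```

Pages whose fingerprints differ by only a few bits are grouped as near-duplicate candidates.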
|A detailed report for given URL|
It's a must-have! If you crawl a website, you may want to see the internal links pointing to a particular URL, its headers, canonical tags, etc.
|Advanced URL filtering for reporting - using regular expressions and modifiers like "contains," "starts with," "ends with"|
I can't imagine my SEO life without a feature like this. It’s common that I need to see only URLs that end with “.html” or those which contain a product ID. A crawler must allow for such filtering.
|Yes, but it doesn't offer support for regular expressions|
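As an illustration of what such filters do (the URLs below are hypothetical), the modifiers map to simple string and regex operations:

```python
import re

urls = [
    "https://example.com/products/123.html",
    "https://example.com/blog/post",
    "https://example.com/category/shoes?id=42",
]

# "ends with": only URLs ending in .html
html_pages = [u for u in urls if u.endswith(".html")]

# "contains": only URLs carrying query parameters
with_params = [u for u in urls if "?" in u]

# regular expression: URLs whose last path segment is a numeric product ID
with_product_id = [u for u in urls if re.search(r"/\d+\.html$", u)]
```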
|Adding additional columns to a report|
This is also a very important feature of crawlers. I simply can't live without it. When I view a single report, I want to add additional columns to get the most out of the data. Fortunately, most crawlers allow this.
Some crawlers offer the possibility to categorize crawled pages (e.g. blog, product pages, etc.) and to see reports dedicated to specific categories of pages.
|Filtering URLs by type (HTML, CSS, JS, PDF etc)|
Crawlers visit resources of various types (HTML, PDF, JPG), but usually you want to review only HTML files. A crawler should support this.
|Basic statistics about website structure - e.g. depth stats||Yes. Review the Crawl report -> Summary section. Basic information can be displayed by segments.|
|Overview - the list of all the issues listed on a single dashboard|
It's helpful if a crawler lists all the detected issues on a single dashboard. Of course, it will not do the job for you, but it can make SEO audits easier and more efficient.
|No, but OnCrawl has a nice custom dashboard builder.|
|Comparing to the previous crawl|
When you work on a website for a long time, it’s important to compare the crawls that were done before and after the changes.
|Yes. To do so, you have to set the option "Crawl over crawl" in the crawl settings|
|List mode - crawl just the listed URLs (helpful for a website migration)|
Sometimes you want to perform a quick audit of a specified set of URLs without crawling the whole website.
|Changing the user agent|
Sometimes it's necessary to change the user agent. For example, even when a website blocks a crawler like Ahrefs, you may still need to perform a crawl. Also, more and more websites detect Googlebot by its user agent and serve it a pre-rendered version of the page instead of the full JavaScript version.
|Yes, but it's not fully customizable (you can change only a part of the user agent string). However, if you need to fully customize the user agent, you can contact OnCrawl and they will set it for you|
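To show what changing the user agent means in practice, here is a sketch using Python's standard library (the target URL is a placeholder; the UA string is Googlebot's publicly documented one):

```python
from urllib.request import Request, urlopen

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Build a request that identifies itself with a custom user agent.
req = Request("https://example.com/", headers={"User-Agent": GOOGLEBOT_UA})
# html = urlopen(req).read()  # actual network call, left commented out
```

Comparing the response fetched this way with the one served to a regular browser UA reveals UA-based cloaking or pre-rendering.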
|Crawl speed adjusting |
You should be able to set a crawl speed, e.g. 1-3 URLs per second if a website can't handle the host load, while you may want to crawl much faster if a website is healthy.
|Yes; you need to connect OnCrawl to your Google Search Console to allow for speed adjusting or ask the Customer Success team to activate it. They do this kind of verification to make sure you don't DDoS your competitors ;-)|
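Under the hood, crawl speed limiting boils down to pacing requests. A minimal sketch (not OnCrawl's implementation; `fetch` is a stand-in for the real request function):

```python
import time

def throttled_crawl(urls, max_per_second=2.0, fetch=lambda url: None):
    """Call fetch(url) for each URL, never exceeding max_per_second requests."""
    min_interval = 1.0 / max_per_second
    for url in urls:
        started = time.monotonic()
        fetch(url)
        elapsed = time.monotonic() - started
        if elapsed < min_interval:
            # Wait out the remainder of the interval before the next request.
            time.sleep(min_interval - elapsed)
```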
|Can I limit crawling? Crawl depth, max number of URLs|
Many websites have millions of URLs. Sometimes it's good to limit the crawl depth or to specify a max number of URLs to crawl.
|Analyzing a domain protected by an htaccess login|
This is a helpful feature if you want to crawl a staging website.
|Can I exclude particular subdomains, include only specific directories?||Yes. You can include/exclude specific directories by setting a virtual robots.txt|
|Universal crawl -> crawl + list mode + sitemap||It's not enabled by default, however before performing a crawl, you can contact OnCrawl and they will enable it|
|Scheduling crawls|
It's handy to be able to schedule a crawl and to set up monthly/weekly crawls.
|Indicating the crawling progress|
If you deal with big websites, you should be able to see the current status of a crawl. Will you wait a few hours, or weeks, until a 1M+ URL crawl finishes?
|Detecting changes in robots.txt|
Accidental changes in robots.txt can prevent Google from reading and indexing your content. It's beneficial if a crawler detects changes in robots.txt and informs you.
|Crawl data retention|
It’s good if a crawler can store results for a long period of time.
|Yes, as long as you are subscribed. Crawls are archived after 3 months, but you can unarchive them within a few minutes.|
|Notifications - crawl finished|
A crawler should inform you when a crawl is done (desktop notification / email).
|Advanced SEO reports|
|List of pages with less than x links incoming|
If there are no internal links pointing to a page, it may signal to Google that the page is irrelevant. It's crucial to spot orphan URLs.
|Yes. Go to Tools -> Data Explorer -> add a new filter: "No. inlinks less than x"|
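Given the raw link data from a crawl, finding weakly linked and orphan pages is a simple aggregation. A sketch with made-up page names:

```python
from collections import Counter

# Hypothetical crawl output: (source, target) internal link pairs.
links = [
    ("home", "about"), ("home", "products"),
    ("products", "about"), ("about", "home"),
]
crawled_pages = {"home", "about", "products", "old-landing-page"}

inlink_counts = Counter(target for _, target in links)

# Orphans: pages with zero internal links pointing at them.
orphans = [p for p in crawled_pages if inlink_counts[p] == 0]

# "No. inlinks less than x" with x = 2:
fewer_than_two = [p for p in crawled_pages if inlink_counts[p] < 2]
```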
|Comparison of URLs found in sitemaps and in crawl.||Sitemaps should contain all the valuable URLs. If some pages are not included in a sitemap, it can cause issues with crawling and indexing by Google. |
If a URL appears in a sitemap but isn't accessible through the crawl, it may signal to Google that the page is not relevant.
|Yes. Go to Crawl Report -> Indexability -> Sitemaps and click on "Pages in structure not in sitemap"|
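Conceptually, this report is a set difference between the URLs discovered by the crawl and the URLs listed in the sitemap (the URLs below are made up):

```python
# Hypothetical URL sets from a crawl and from the sitemap files.
crawled = {"https://example.com/", "https://example.com/a", "https://example.com/b"}
in_sitemap = {"https://example.com/", "https://example.com/a", "https://example.com/c"}

# Pages in the structure but missing from the sitemap.
pages_in_structure_not_in_sitemap = crawled - in_sitemap

# Pages in the sitemap that the crawl never reached: orphan candidates.
pages_in_sitemap_not_crawled = in_sitemap - crawled
```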
|Internal PageRank value||Although no PageRank calculation can reflect Google's actual link graph, it's still a really important feature. Imagine you want to see the most important URLs based on links. Then you should sort URLs not only by simple metrics like the number of inlinks, but also by internal PageRank. You think Google doesn't use PageRank anymore? http://www.seobythesea.com/2018/04/pagerank-updated/||Yes. Go to Links flow -> Internal popularity.|
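For the curious, internal PageRank can be approximated with a few lines of power iteration over the internal link graph (a generic sketch with a toy three-page site, not OnCrawl's actual algorithm):

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each page splits its rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / len(pages)
        rank = new
    return rank

site = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "about"],
}
ranks = pagerank(site)
# "home" is linked from every other page, so it gets the highest internal PageRank.
```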
In mobile-first indexing, it's necessary to perform a content parity audit between the mobile and desktop versions of your website.
|Yes, you can visit "Rel Alternate" tab to check AMP. Also, you can use mobile user agent|
|Additional SEO reports|
|Malformed URLs (https://https://, https://example.com/tag/something/tag/tag/tag or https://www.example.com/first_part of URL)||You can do it partially. Go to URL explorer and try the following filters: "URL contains space", "URL contains https://https".|
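If your crawler only partially supports this, the same checks are easy to run yourself over an exported URL list. A sketch covering the three malformation patterns mentioned above:

```python
import re

def is_malformed(url):
    """Flag common URL malformations seen in crawls."""
    return bool(
        re.search(r"https?://https?://", url)      # doubled scheme
        or " " in url                              # unencoded space in the URL
        or re.search(r"/([^/]+)(?:/\1){2,}", url)  # same path segment repeated 3+ times
    )
```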
|List of URLs with parameters||Yes, Go to Tools -> Data explorer and add a new filter: Full URL contains `?`|
|Mixed content (some pages / resources are served via HTTPS, some via HTTP)||?|
|Redirect chains report|
Nobody likes redirect chains. Not users, not search engines. A crawler should report any redirect chains to let you decide if it's worth fixing.
|It's not enabled by default, however before performing a crawl, you can contact OnCrawl and they will enable it for the crawl|
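Given an export of URL-to-redirect-target pairs, redirect chains can also be reconstructed by hand. A generic sketch (the URLs are made up):

```python
def redirect_chains(redirects, max_hops=10):
    """redirects: dict mapping a URL to the URL it redirects to."""
    chains = []
    for start in redirects:
        chain = [start]
        current = start
        while current in redirects and len(chain) <= max_hops:
            current = redirects[current]
            if current in chain:  # redirect loop; stop following
                break
            chain.append(current)
        if len(chain) > 2:  # more than a single hop: a chain worth fixing
            chains.append(chain)
    return chains

crawl = {
    "http://example.com/a": "http://example.com/b",
    "http://example.com/b": "http://example.com/c",
    "http://example.com/x": "http://example.com/y",
}
# /a -> /b -> /c is a two-hop chain; /x -> /y is a single, acceptable redirect.
```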
|Website speed statistics|
Performance is becoming more and more important both for users and SEO. So crawlers should present reports related to performance.
|Yes. Go to Tools -> Data explorer and add a new column "Load time". Alternatively, you can go to "Crawl Report" -> "Payload"|
|List of URLs blocked by robots.txt|
It happens that a webmaster mistakenly prevents Google from crawling a particular set of pages. As an SEO, you should review the list of URLs blocked by robots.txt to make sure there are no mistakes.
|Yes. Go to Data explorer -> Add a new filter: Denied by robots.txt is: "true"|
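You can double-check such a report against the live robots.txt with Python's standard `urllib.robotparser` (the rules below are a made-up example):

```python
from urllib import robotparser

# Hypothetical robots.txt content, parsed the same way a crawler would.
rules = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

blocked = [
    url for url in [
        "https://example.com/private/report",
        "https://example.com/blog/post",
    ]
    if not parser.can_fetch("*", url)
]
```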
|Exporting to excel / CSV|
Sometimes a crawler alone isn't enough, and you need to export the data and edit it in Excel or other tools.
|Yes, you can export the data to CSV|
|Exporting to PDF||Yes, you can export the entire crawl report or only particular sections|
|Custom reports / dashboards||Yes. Go to Tools -> dashboard builder|
|Sharing individual reports|
Imagine that you want to share a report related to 404s with your developers. Does the crawler support it?
|Yes (for now, it's available for ultimate and superior subscriptions, but Oncrawl is going to release this feature for all their users shortly).|
|Granting access to a crawl for another person|
It's pretty common that two or more people work on the same SEO audit. Thanks to report sharing, you can work simultaneously.
|Explanation on the issues|
If you are new to SEO, you will appreciate the explanation of the issues that many crawlers provide.
|Custom extraction|
A crawler should let you perform a custom extraction to enrich your crawl. For instance, while auditing an e-commerce website, you should be able to scrape information about product availability and price.
|Yes. You can extract data using Regex/XPath selectors|
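As an illustration of what a regex-based custom extraction does (the HTML snippet and class names are made up), scraping price and availability might look like:

```python
import re

html = """
<div class="product">
  <span class="price">29.99</span>
  <span class="availability">In stock</span>
</div>
"""

# Regex-based extraction of the two fields (an XPath selector would target the same nodes).
price = re.search(r'class="price">([\d.]+)<', html)
availability = re.search(r'class="availability">([^<]+)<', html)

print(price.group(1))         # 29.99
print(availability.group(1))  # In stock
```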
|Can the crawler detect the unique part of a page - that is not a part of the template?||It's valuable if a crawler lets you analyze only the unique part of a page (excluding navigation links, sidebars and the footer).||Yes, it's built into our Near Duplicates Detector.|
|Ability to use the crawler's API||Yes|
|Supported operating systems||All - it's a web-based application|
|Integration with Google Analytics||Yes. OnCrawl: "Oncrawl Rankings combines your GSC data with your logs and crawl data. It literally opens the Google black box by pointing you to the factors that influence your rankings the most. So now you know how to prioritise your actions and evaluate their ROI. Google's behaviour may vary a lot from one topic to another, or depending on the query intent. We are working hard to help SEOs understand that and make better decisions"|
|Integration with Google Search Console||Yes. OnCrawl: "We compare your GSC data with Logs and Crawl data so that you can understand which ranking factor is really influencing the ranking of your pages. We allow you to build advanced queries into our Data Explorer like "give me all pages that rank on XXX with under 300 words and over page depth 7". |
You have a quick overview here: https://www.oncrawl.com/product-updates/introducing-search-console-integration-skyrocket-organic-search/
and a summary here: https://www.oncrawl.com/google-search-console-oncrawl/
The coolest thing is the CTR analysis, which is very helpful to win over the RankBrain algo. Indeed, in the RankBrain patent, the pages that are below your average industry CTR are in danger. Our CTR vs Positions chart helps you spot these pages: https://www.oncrawl.com/wp-content/uploads/2018/04/CTR-vs-positions.png"
|Integration with server logs||Yes|
|Integration with other tools||You can integrate OnCrawl with Majestic. Additionally, you can integrate OnCrawl with any data (if it's in CSV format and there is one common field: "URL").|
|Why should users use OnCrawl||"Scalability: Oncrawl was born with the input of one of the biggest e-commerce players in Europe (+20M pages), so Oncrawl has been designed to scale to very big websites. Then my co-founder Tanguy Moal (@Tuxnco) loves to empower people with technology, so he built an open-source log analyser to teach people how cool it is to combine crawl and log data, and then he built the cloud infrastructure to provide SEOs with an advanced technical SEO platform at a very competitive price. He likes to democratise big tech.
Semantic analysis: we have unrivalled features like content blocks analysis or near-duplicates detection.
Advanced data exploration: we do NOT truncate your data. You can explore and export everything. We built the Data Explorer to help SEOs forget about crunching CSVs and running weird "VLOOKUP()" formulas to manipulate their data. You can do it with just a few clicks with Oncrawl.
Our Data Platform helps you plug in any type of data, from your GA or GSC account to more advanced metrics like your revenues, your SEA data, even your CRM data. With that under your belt, you will make much better decisions for your SEO and for your business."|
|Free account - try||Yes, you can test a fully-featured OnCrawl for 14 days (1 project, 100k URLs)|
|OnCrawl has created a unique coupon for Elephate's readers, named "ElephateTR2018". This coupon will give users a 15% discount on any subscription and is valid until December 31, 2018|