|Feature||Explanation||Does Sitebulb support this feature?|
|Basic SEO reports|
|List of indexable/non-indexable pages||It's necessary to view a list of indexable/non-indexable pages to make sure there are no mistakes. Maybe some URLs were intended to be indexable?||Yes. Go to "Indexability" and click either "Indexable" or "Not Indexable".|
|Missing title tags||Title tags are an important part of SEO audits. A crawler should show you a list of pages with missing title tags.||Yes. Go to "On Page" -> "Hints" -> "Title tag missing".|
|Filtering URLs by status code (3xx, 4xx, 5xx)||When you perform an SEO audit, it's necessary to filter URLs by status code. How many URLs are not found (404)? How many URLs are redirected (301)?|
|List of Hx tags||"Google looks at the Hx headers to understand the structure of the text on a page better." - John Mueller||Yes. Go to "Internal" -> "All" and click "Add/Remove columns" -> add "H1".|
|View internal nofollow links||It's useful to see a list of internal nofollow links to make sure there aren't any mistakes.||Yes. Go to "Audit Overview" -> "Internal", then add a new column: "No. internal nofollow links".|
|External links list (outbound external)||A crawler should allow you to analyze both internal and external outbound links.||Yes. Go to "External" -> "All".|
|Link rel="next" (to indicate a pagination series)||When you perform an SEO audit, you should check whether pagination series are implemented properly.||Yes. Sitebulb: "we use certain factors (HTML elements, rel="next"/"prev", URL parameters) to detect paginated pages. Go to "Internal" and there is a pie chart for Paginated/Not Paginated - click on either area to see a list of URLs for each. Further, Sitebulb will check if paginated URLs are using rel="next"/"prev": go to "Indexability" -> "Canonical Hints" -> "Paginated URL missing next/prev canonicals".|
|Hreflang tags||Hreflang tags are the foundation of international SEO, so a crawler should recognize them to let you spot hreflang-related issues.||Yes. Go to "Audit Overview" -> "Internal" and select columns related to hreflang tags.|
|Canonical tags||Every SEO crawler should inform you about canonical tags to let you spot indexing issues.||Yes. Go to "Internal" -> "View All" and add columns related to canonicals. You can also see all canonical-related issues by going to "Indexability" -> "Canonical Hints".|
|Information about crawl depth - number of clicks from the homepage||Information about crawl depth can give you an overview of your website's structure. If an important page isn't accessible within a few clicks from the homepage, it may indicate poor website structure.||Yes. Go to "Audit Overview" -> "Crawled URLs" and review the "Crawl depth" column.|
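Under the hood, crawl depth is simply the breadth-first-search distance from the homepage over the internal link graph. A minimal sketch of that calculation (the link graph below is made up for illustration):

```python
from collections import deque

def crawl_depths(start, links):
    """BFS over an internal link graph; depth = clicks from the homepage."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in depths:  # first discovery = shortest click path
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths

# Hypothetical link graph: page -> list of pages it links to
links = {
    "/": ["/blog", "/products"],
    "/blog": ["/blog/post-1"],
    "/products": ["/products/widget"],
    "/blog/post-1": ["/products/widget"],
}
print(crawl_depths("/", links))
# {'/': 0, '/blog': 1, '/products': 1, '/blog/post-1': 2, '/products/widget': 2}
```

Pages missing from the result would be orphans: no click path reaches them from the homepage at all.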
|List of empty/thin pages||A large number of thin pages can negatively affect your SEO efforts, so a crawler should report them.||Yes (based on word count). Go to "On Page" and see the Word Counts graph.|
|Duplicate content recognition||A crawler should give you at least basic information on duplicates across your website.||Yes. Go to the "Duplicate Content" section. Sitebulb: we detect 5 different types of duplicate content: exact duplicate content (same HTML), duplicate page titles, duplicate meta descriptions, duplicate H1s, and duplicate URLs (e.g. same parameters in a different order). You can click through to the URL Details for each URL and see all the duplicates, or use the 'Export Duplicate Content' export to get all the duplicate content straight into Excel.|
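The "exact duplicate content (same HTML)" check above is typically implemented by hashing each page's body and grouping identical digests. A rough sketch (the page contents are hypothetical, and real crawlers usually normalize the HTML first):

```python
import hashlib
from collections import defaultdict

def find_exact_duplicates(pages):
    """Group URLs whose HTML bodies hash to the same digest."""
    groups = defaultdict(list)
    for url, html in pages.items():
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # keep only digests shared by two or more URLs
    return [urls for urls in groups.values() if len(urls) > 1]

pages = {
    "/a": "<html><body>Hello</body></html>",
    "/b": "<html><body>Hello</body></html>",  # byte-for-byte duplicate of /a
    "/c": "<html><body>World</body></html>",
}
print(find_exact_duplicates(pages))  # [['/a', '/b']]
```

Hashing keeps memory constant per page, which is what makes this check feasible on million-URL crawls.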
|A detailed report for a given URL||It's a must-have! When you crawl a website, you may want to see the internal links pointing to a particular URL, its headers, canonical tags, etc.|
|Advanced URL filtering for reporting - using regular expressions and modifiers like "contains," "starts with," "ends with"||I can't imagine my SEO life without a feature like this. It's common that I need to see only URLs that end with ".html" or contain a product ID. A crawler must allow for such filtering.||Yes + you can combine rules with OR/AND.|
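The "ends with" / "contains" modifiers map directly onto regular expressions, and the AND/OR combination is just boolean logic over the matches. A quick sketch (the URLs and patterns are made up):

```python
import re

urls = [
    "https://example.com/products/12345.html",
    "https://example.com/blog/how-to",
    "https://example.com/products/98765.html?ref=nav",
]

ends_with_html = re.compile(r"\.html$")             # "ends with .html"
contains_product_id = re.compile(r"/products/\d+")  # "contains a product ID"

# AND: both rules must match
both = [u for u in urls if ends_with_html.search(u) and contains_product_id.search(u)]
print(both)    # ['https://example.com/products/12345.html']

# OR: either rule may match
either = [u for u in urls if ends_with_html.search(u) or contains_product_id.search(u)]
print(either)  # the two /products/ URLs (the second fails ".html$" due to its query string)
```

Note how the query string on the third URL breaks the "ends with .html" rule, which is exactly the kind of edge case a good URL filter UI has to handle.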
|Adding additional columns to a report||This is a very important feature of crawlers - I simply can't live without it. When I view a report, I want to add additional columns to get the most out of the data. Fortunately, most crawlers allow this.|
|Categorizing crawled pages||Some crawlers offer the possibility to categorize crawled pages (e.g. blog, product pages) and see reports dedicated to specific categories of pages.|
|Filtering URLs by type (HTML, CSS, JS, PDF, etc.)||Crawlers visit resources of various types (HTML, PDF, JPG), but usually you want to review only HTML files. A crawler should support this.|
|Basic statistics about website structure - i.e. depth stats||Yes, go to "Audit overview" -> "Crawled URL by depth" http://take.ms/noZXy|
|Overview - a list of all detected issues on a single dashboard||It's a plus if a crawler lists all the detected issues on a single dashboard. Of course, it will not do the job for you, but it can make SEO audits easier and more efficient.||Yes, click on "All hints".|
|Comparing to the previous crawl||When you work on a website for a long time, it's important to compare crawls done before and after changes.||Yes. Go into a Project and select 2 audits for 'Crawl Comparison'. Additionally, all major datapoints and Hints have a sparkline trend graph alongside.|
|List mode - crawl just the listed URLs (helpful for a website migration)||Sometimes you want to perform a quick audit of a specified set of URLs without crawling the whole website.|
|Changing the user agent||Sometimes it's necessary to change the user agent. For example, a website may block Ahrefs, but you still need to perform a crawl. Also, more and more websites detect Googlebot by user agent and serve it a pre-rendered version instead of the full JavaScript version.|
|Adjusting crawl speed||You should be able to set a crawl speed, e.g. 1-3 URLs per second if a website can't handle the host load, while you may want to crawl much faster if a website is healthy.||Yes + you can change the crawling speed during a crawl.|
|Can I limit crawling? Crawl depth, max number of URLs||Many websites have millions of URLs. Sometimes it's good to limit the crawl depth or specify a max number of URLs to crawl.|
|Analyzing a domain protected by an htaccess login (helpful for analyzing staging websites)||This is a helpful feature if you want to crawl a staging website.|
|Can I exclude particular subdomains, or include only specific directories?||Yes, you can exclude URLs by using the robots.txt syntax.|
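Robots.txt-syntax exclusion rules like the ones mentioned above can be tried out with Python's standard library, which parses the same format. A sketch with hypothetical Disallow rules:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion rules, written in robots.txt syntax
rules = """
User-agent: *
Disallow: /internal-search/
Disallow: /tag/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for url in ["https://example.com/blog/post", "https://example.com/tag/seo"]:
    verdict = "crawl" if parser.can_fetch("*", url) else "skip"
    print(url, "->", verdict)
# https://example.com/blog/post -> crawl
# https://example.com/tag/seo -> skip
```

Reusing the robots.txt syntax for crawl scoping is a sensible design choice: SEOs already know it, and the same rules can later be moved into the live robots.txt if needed.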
|Universal crawl -> crawl + list mode + sitemap||Yes|
|Scheduling crawls||It's handy to be able to schedule a crawl and set up monthly/weekly crawls.|
|Indicating the crawling progress||If you deal with big websites, you should be able to see the current status of a crawl. Will you wait a few hours, or weeks, until a 1M+ URL crawl finishes?|
|Detecting changes in robots.txt||Accidental changes in robots.txt can prevent Google from reading and indexing your content. It's beneficial if a crawler detects changes in robots.txt and informs you.|
|Crawl data retention||It's good if a crawler can store results for a long period of time.||Forever (as long as you have an active licence).|
|Notifications - crawl finished||A crawler should inform you when a crawl is done (desktop notification / email).||Yes (desktop notifications).|
|Advanced SEO reports|
|List of pages with fewer than x incoming links||If there are no internal links pointing to a page, it may signal to Google that the page is irrelevant. It's crucial to spot orphan URLs.||Yes. Click on "Links" and view the bar chart for "Incoming Internal Followed Links". Click to view URLs.|
|Comparison of URLs found in sitemaps and in the crawl||Sitemaps should contain all the valuable URLs. If some pages are not included in a sitemap, it can cause issues with crawling and indexing by Google. If a URL appears in a sitemap but isn't accessible through the crawl, it may signal to Google that the page is not relevant.||Yes. Go to "XML Sitemaps" -> click either "Not in Sitemaps" or "Only in Sitemaps".|
|Internal PageRank value||Although no PageRank calculation can reflect Google's actual link graph, it's still a really important feature. Imagine you want to see the most important URLs based on links: you should sort URLs not only by simple metrics like the number of inlinks, but also by internal PageRank. You think Google doesn't use PageRank anymore? http://www.seobythesea.com/2018/04/pagerank-updated/||Yes. It's called "Link Equity Score."|
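Sitebulb's "Link Equity Score" is its own proprietary metric; the sketch below is just the textbook PageRank power iteration applied to an internal link graph, to show why it ranks pages differently from a raw inlink count (the graph is made up):

```python
def internal_pagerank(links, damping=0.85, iterations=50):
    """Power iteration over an internal link graph (dict: page -> outlinks)."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
            else:  # dangling page: spread its rank evenly across all pages
                for p in pages:
                    new[p] += damping * rank[page] / len(pages)
        rank = new
    return rank

links = {
    "/": ["/a", "/b"],
    "/a": ["/"],
    "/b": ["/a"],
}
ranks = internal_pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))  # pages ordered by internal PageRank
```

A page linked from one strong page can outrank a page with many weak inlinks, which is exactly the nuance a plain "number of inlinks" column misses.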
|Mobile vs desktop content parity||In mobile-first indexing, it's necessary to perform a content parity audit between the mobile and desktop versions of your website.||Yes, you have a few possibilities here: performing AMP checking; or performing two crawls (first with a mobile UA, second with a desktop UA) and then comparing the crawl results. Also, while setting up a crawl, you can tick the following option: "Page Speed, Mobile Friendly and Front-end".|
|Additional SEO reports|
|Malformed URLs (https://https://, https://example.com/tag/something/tag/tag/tag, or https://www.example.com/first_part of URL)||Yes. Go to "Internal URLs" -> "Hints"; almost all of the Hints relate to malformed URLs.|
|List of URLs with parameters||Yes. Go to "Audit overview" -> "Internal" and add a new filter for the URL column: URL contains "?"|
|Mixed content (some pages / resources are served via HTTPS, some via HTTP)||Yes. Go to "All hints" -> "Mixed content (loads HTTP resources on HTTPS URL)"|
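A mixed-content check boils down to finding http:// resource references inside a page served over HTTPS. A rough sketch (the HTML is made up, and a real crawler parses the DOM rather than regexing raw markup):

```python
import re

def find_mixed_content(page_url, html):
    """Return http:// resource URLs referenced from an HTTPS page."""
    if not page_url.startswith("https://"):
        return []
    # crude: inspect src/href attribute values only
    return re.findall(r'(?:src|href)="(http://[^"]+)"', html)

html = '''
<img src="http://example.com/logo.png">
<link rel="stylesheet" href="https://example.com/main.css">
<script src="http://cdn.example.com/app.js"></script>
'''
print(find_mixed_content("https://example.com/", html))
# ['http://example.com/logo.png', 'http://cdn.example.com/app.js']
```

Browsers block or warn about such resources, so every hit in a report like this is worth fixing.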
|Redirect chains report||Nobody likes redirect chains: not users, not search engines. A crawler should report any redirect chains to let you decide if they're worth fixing.||Yes. Go to "Redirects" -> "Internal redirects" and add a new column: "Final URL".|
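Resolving a "Final URL" like the column above is a follow-the-pointer loop over the redirects discovered during the crawl, with guards for loops and overly long chains. A sketch over a hypothetical 301 map:

```python
def resolve_chain(url, redirects, max_hops=10):
    """Follow a redirect map (url -> target) and return the full chain."""
    chain = [url]
    while url in redirects and len(chain) <= max_hops:
        url = redirects[url]
        chain.append(url)
        if chain.count(url) > 1:  # redirect loop detected
            break
    return chain

# Hypothetical 301 map discovered during a crawl
redirects = {
    "/old-page": "/new-page",
    "/new-page": "/final-page",
}
print(resolve_chain("/old-page", redirects))
# ['/old-page', '/new-page', '/final-page']
```

Anything longer than two entries is a chain worth flattening: each extra hop costs users latency and dilutes the redirect for search engines.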
|Website speed statistics||Performance is becoming more and more important, both for users and for SEO, so crawlers should present performance-related reports.||Yes. Go to the "Site Speed" section.|
|List of URLs blocked by robots.txt||Webmasters sometimes mistakenly prevent Google from crawling a particular set of pages. As an SEO, you should review the list of URLs blocked by robots.txt to make sure there are no mistakes.||Yes. Go to "Internal" and click "Add/Remove columns" -> "Robots.txt Disallowed directive".|
|Exporting to Excel / CSV||Sometimes a crawler has no power here, and you need to export the data and edit it in Excel or other tools.||Yes, you can export the data to an XLSX file. There are some pre-built exports which you can access via 'Bulk Exports.' One thing you can't do is export a raw list of all links.|
|Exporting to PDF||Yes|
|Custom reports / dashboards||No|
|Sharing individual reports||Imagine that you want to share a report about 404s with your developers. Does the crawler support this?||No (but you can generate a PDF that contains the most important insights).|
|Granting access to a crawl for another person||It's pretty common that two or more people work on the same SEO audit. Thanks to report sharing, you can work simultaneously.||No (but you can send a Sitebulb crawl file to colleagues so they can open it in their own copy of Sitebulb).|
|Explanation of the issues||If you are new to SEO, you will appreciate the explanations of issues that many crawlers provide.||Yes, you can see an explanation for each Hint. For some Hints, you can click "Learn more" to view details on the Sitebulb website.|
|Custom extraction||A crawler should let you perform a custom extraction to enrich your crawl. For instance, while auditing an e-commerce website, you should be able to scrape information about product availability and price.|
|Can the crawler detect the unique part of a page - the part that is not in the template?||It's valuable if a crawler lets you analyse only the unique part of a page (excluding navigation links, sidebars and footers).||No|
|Ability to use the crawler's API||No|
|Supported operating systems||Windows, Mac|
|Integration with Google Analytics||Yes. "Sitebulb integrates with Google Analytics and pulls traffic, engagement and conversion data via the API. It will automatically identify pages with no traffic and low dwell time, and alert you to potential issues, such as pages with high bounce rate or low time on page"|
|Integration with Google Search Console||Yes. "Sitebulb has a twofold integration with Google Search Console. At the basic level, it can pull out page level data for clicks and impressions, but if you also select 'Keyword Analysis' during the audit setup, it will collect all ranking keywords tracked in Search Analytics, and present rankings, clicks, impressions and CTR for every keyword"|
|Integration with server logs||No|
|Integration with other tools||No|
|JavaScript rendering||Yes, Sitebulb uses Chrome Headless|
|Why should users use your crawler?||"One of the things that SEOs need to do is influence people. You need to convince your client that they have a problem with canonicals, or you need to convince your boss that it's worth spending time writing unique copy, or you need to convince YOURSELF that the website architecture really is screwed up. Sitebulb helps you understand problems and opportunities with a website, and gives you the tools to influence people, to help them understand technical issues and get buy-in for further work. It does this with clarity, through hundreds of detailed and comprehensive 'Hints', and a wide range of charts and data tables that allow you to support your arguments visually. The spectacular Crawl Maps illustrate architecture issues in a way that your clients have never seen before, simplifying a complex topic and making it accessible to all."|
|Free trial||Yes, you can use a fully-featured 14-day trial. If you go to Sitebulb.com/elephate, you will get a 60-day free trial.|