The Bug Hunter’s Methodology v4.01
Recon
Application Analysis
TBHM v4: Recon
About Me
Project Tracking
In many parts of the workshop we will need to keep track of site hierarchy, tool output, interesting notes, etc.
I use mindmaps with Xmind but the same effect can be achieved through a lot of different programs.
Mindmaps allow me to visualize large scope bug hunting targets and also allow me to break up methodology for in-depth bug hunting as well.
Mission
Wide Recon is the art of discovering as many assets related to a target as possible. Make sure your scope permits testing these sites.
Scope Domains
Acquisitions
ASN Enumeration
Subdomain Enumeration
Other
Vulns:
Subdomain Takeover
Buckets
Github leaks
Automation/Helper:
Interlace
Screenshotting
Frameworks
Reverse WHOIS
Port Analysis
Finding Seeds/Roots
Scope Domains (Bugcrowd)
Scope Domains (HackerOne)
Acquisitions (Crunchbase)
We want to continue to gather seed/root domains. Acquisitions are often a new way to expand our available assets if they are in scope. We can investigate a company’s acquisitions on sites like https://crunchbase.com, Wikipedia, and Google.
Acquisitions (Crunchbase)
Here we can possibly drill down into old domains related to Revlo, ClipMine, Curse, and GoodGame.
Remember to do some Googling on these acquisitions to see if they are still owned by the parent company. Many times acquisitions will split back out or get sold to another company.
ASN Enumeration (bgp.he.net)
Autonomous System Numbers are given to sufficiently large networks. These ASNs will help us track down some semblance of an entity’s IT infrastructure. The most reliable way to get them is manually, through Hurricane Electric’s free-form search:
Because of the advent of cloud infrastructure, ASNs aren't always a complete picture of a network. Rogue assets could exist on cloud environments like AWS and Azure. Here we can see several IP ranges.
ASN Enumeration (cmd line)
Some automation is available to get ASNs. One such tool is the ‘net’ module of Metabigor by j3ssiejjj, which will fetch ASN data for a keyword from bgp.he.net and asnlookup.com.
Another is ASNLookup by Yassine Aboukir, which utilizes the maxmind.com dataset.
One problem with command-line enumeration is that you could accidentally return records from another org that happens to contain your keyword (e.g. ‘tesla’).
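A minimal sketch of the Metabigor step (flags taken from my reading of its README; verify against the current version):

# fetch ASNs/network ranges for an org keyword, read from stdin
echo "tesla" | metabigor net --org -o tesla-ranges.txt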
ASN Enumeration (with Amass)
For discovering more seed domains we want to scan the whole ASN with a port scanner and return any root domains we see in SSL certificates, etc.
We can do this with Amass intel
Amass is written by Jeff Foley and the Amass team.
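For example, a sketch of the intel workflow (the ASN shown is illustrative; check amass intel -h for current flags):

# list candidate ASNs for an org keyword
amass intel -org "Tesla"
# scan an ASN and print any root domains found in certs, WHOIS, etc.
amass intel -asn 394161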
Reverse WHOIS (with Whoxy.com)
Every website has some registration info on file with the registrars. Two key pieces of data we can use are Organization name and any emails in the WHOIS data. To do this you need access to a large WHOIS database. WHOXY.com is one such database.
You can use whoxy.com in this fashion after you register and obtain your free API key:
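Assuming the API shape from WHOXY’s documentation (verify the parameters there; the key and company are placeholders), a reverse WHOIS query might look like:

# reverse WHOIS by organization name
curl "https://api.whoxy.com/?key=YOUR_API_KEY&reverse=whois&company=Twitch+Interactive"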
Be careful with reverse WHOIS data, as it is the lowest-fidelity source of new root/seed domains. It might include many parked domains or redirects to out-of-scope assets.
Reverse WHOIS (with DOMLink)
DOMLink is a tool written by Vincent Yiu (@vysecurity) which will recursively query the WHOXY WHOIS API. It starts by querying our target’s WHOIS record, then analyzes the data, looking for other records which contain the organization name or are registered to emails in the record. It does this recursively until it finds no more matching records.
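A sketch of a DOMLink run (flags recalled from the README; treat them as an assumption and check the repo):

# recursively walk WHOXY records starting from a target domain
python domLink.py -D target.com -o target.out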
Ad/Analytics Relationships (builtwith.com)
You can also glean related domains and subdomains by looking at a target’s ad/analytics tracker codes. Many sites use the same codes across all their domains. Google analytics and New Relic codes are the most common. We can look at these “relationships” via a site called BuiltWith. Builtwith also has a Chrome and Firefox extension to do this on the fly.
BuiltWith is also a tool we’ll use to profile the technology stack of a target in later slides.
Ad/Analytics Relationships (getrelationship.py)
Want to do it on the command line?
@M4ll0k has you covered!
https://raw.githubusercontent.com/m4ll0k/Bug-Bounty-Toolz/master/getrelationship.py
Google-Fu
You can Google the copyright text, terms of service text, or privacy policy text from a main target to glean related hosts on Google.
Shodan
Shodan is a tool that continuously spiders infrastructure on the internet. It is much more verbose than regular spiders: it captures response data, cert data, stack profiling data, and more. It requires registration.
Example:
https://www.shodan.io/search?query=twitch.tv
This raises a valuable question: is twitch.amazon.eu relevant to our testing?
Finding Subdomains
Subdomain Enumeration
Subdomain Scraping
Subdomain Bruteforce
Linked and JS Discovery
++
Linked and JS Discovery
Linked Discovery (with Burp Suite Pro)
Another way to widen our scope is to examine all the links of our main target. We can do this using Burp Suite Pro.
We can visit a seed/root and recursively spider all the links for a term with regex, examining those links... and their links, and so on... until we have found all sites that could be in our scope.
This is a hybrid technique that will find both roots/seeds and subdomains.
Linked Discovery (with Burp Suite Pro)
Burp after requesting one site:
Linked Discovery (with Burp Suite Pro)
Linked Discovery (with Burp Suite Pro)
After the 1st spider run we’ve now discovered a ton of linked URLs that belong to our project.
Not only subdomains, but NEW seeds/roots (twtchapp.net, ext-twitch.tv, twitchsvc.net).
We can also now spider these new hosts and repeat until we have Burp Spider fatigue.
Linked Discovery (with Burp Suite Pro)
Now that we have this data, how do we export it?
Clumsily =(
Linked Discovery (with GoSpider or hakrawler)
Linked discovery really just counts on using a spider recursively.
One of the most extensible spiders for general automation is GoSpider, written by j3ssiejjj, which can be used for many things and supports parsing JS very well.
In addition, hakrawler by hakluke has many parsing strategies of interest to bug hunters.
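Sketches of both (flags vary across versions, so treat these as assumptions and check each tool’s help):

# GoSpider: spider a seed 3 levels deep, 10 concurrent requests
gospider -s "https://twitch.tv" -d 3 -c 10 -o gospider-out
# hakrawler: recent versions read URLs from stdin
echo "https://twitch.tv" | hakrawler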
Subdomain Enumeration (with SubDomainizer)
SubDomainizer by Neeraj Edwards is a tool with three purposes in analyzing JavaScript. It will take a single page, scan it for JS files, and analyze them for subdomains, cloud service URLs, and secrets.
If you are just looking for subdomains, subscraper by Cillian-Collins might be better because it supports recursion.
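A typical SubDomainizer invocation might look like this (a sketch based on its README):

# scan a page's JS for subdomains, cloud URLs, and secrets
python3 SubDomainizer.py -u https://twitch.tv -o subdomainizer-out.txt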
Subdomain Scraping
Subdomain Scraping Sources
The next set of tools scrape domain information from all sorts of projects that expose databases of URLs or domains.
New sources are coming out all the time so the tools must evolve constantly.
This is only a small list of sources. Many more exist.
Security Sources
Certificate Sources
Infrastructure Sources
Search Sources
Subdomain Scraping Example (Google)
Subdomain Scraping (Amass)
For scraping subdomain data there are two industry-leading tools at the moment: Amass and Subfinder. They parse all the “sources” referenced in the previous slide, and more.
Amass has the most sources, extensible output, bruteforcing, permutation scanning, and a ton of other modes to do additional analysis of attack surfaces.
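A minimal passive run might look like this (a sketch; see amass enum -h for current flags):

# scrape passive sources only, no resolution or bruteforce
amass enum -passive -d twitch.tv -o amass-passive.txt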
Subdomain Scraping (Amass)
Amass also correlates these scraped domains to ASNs and lists what network ranges they appeared in.
Useful.
If a new ASN is discovered you can feed it back to amass intel.
Subdomain Scraping (Subfinder v2)
Subfinder is another best-in-breed tool, originally written by ice3man and Michael Skelton.
It is now maintained by a larger group called projectdiscovery.io.
It incorporates multiple sources, has extensible output, and more.
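Sketch of a basic run:

# scrape configured sources for subdomains of a root
subfinder -d twitch.tv -o subfinder-out.txt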
Subdomain Scraping (github-subdomains.py)
A new addition to my subdomain enumeration is scraping GitHub. github-subdomains.py is a script written by Gwendal Le Coguic as part of his epic GitHub enumeration repo called “github-search”. It will query the GitHub API for references to a root and pull out subdomains.
Note: the GitHub API returns somewhat random results and is rate-limited. In my automation I run 5 iterations of this script: four with a six-second sleep between them, and the last with a 10-second sleep, to get some consistency (see the sketch below).
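A rough sketch of that loop (script flags assumed from the github-search README; $GITHUB_TOKEN is a placeholder):

# four passes with a 6s sleep, a final pass after 10s, then dedupe
for i in 1 2 3 4; do
  python3 github-subdomains.py -t "$GITHUB_TOKEN" -d twitch.tv >> github-subs.txt
  sleep 6
done
sleep 10
python3 github-subdomains.py -t "$GITHUB_TOKEN" -d twitch.tv >> github-subs.txt
sort -u github-subs.txt -o github-subs.txt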
Subdomain Scraping (shosubgo)
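shosubgo by @incogbyte pulls subdomains for a target domain out of the Shodan API. A sketch (flags recalled from the README; treat as an assumption):

# query Shodan for hosts/certs mentioning the domain
shosubgo -d twitch.tv -s YOUR_SHODAN_API_KEY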
Subdomain Scraping (Cloud Ranges)
A highly valuable technique is to monitor whole cloud ranges of AWS, GCP, and Azure for SSL sites, and parse their certificates to match your target.
Doing this on your own is cumbersome, but possible with something like masscan. Daehee Park outlines it here.
Luckily Sam Erb did a wonderful DEF CON talk on this and created a service which scans every week.
Some bash scripting required ;)
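A rough sketch of the do-it-yourself approach from the write-up above (the range, rate, and grep term are illustrative):

# scan a cloud range for TLS, then pull subjects from the certs
masscan -p443 52.32.0.0/11 --rate 100000 -oL masscan-443.txt
awk '/open/ {print $4}' masscan-443.txt | while read ip; do
  echo | timeout 3 openssl s_client -connect "$ip:443" 2>/dev/null \
    | openssl x509 -noout -subject 2>/dev/null | grep -i "twitch"
done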
Subdomain Bruteforce
Subdomain Bruting
At this point we move into guessing for live subdomains.
If we try to resolve thistotallydoesntexist.company.com we will *usually* not get a record.
So we can use a large list of common subdomain names, try to resolve them, and analyze which succeed.
The problem with this method is that using only one DNS server will take forever. Some tools have come out that are both threaded and use multiple DNS resolvers simultaneously, which speeds this process up significantly. Massdns by @blechschmidt pioneered this idea (example below).
Amass (8 resolvers by default) does this too; you can supply your own resolvers with the -rf flag.
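Massdns usage, per its README:

# resolve a candidate list against many resolvers, simple-text output
massdns -r resolvers.txt -t A -o S -w massdns-results.txt candidates.txt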
Subdomain Bruting (Amass)
Amass offers bruteforcing via the “enum” subcommand using the “-brute” switch.
It has a built-in list, but you can specify your own lists.
You can also specify any number of resolvers.
Doing this with amass also gives us the opportunity to resolve the found domains.
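For example, a sketch of a brute run:

# bruteforce with a custom wordlist and resolver list
amass enum -brute -d twitch.tv -w all.txt -rf resolvers.txt -o amass-brute.txt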
I haven’t checked out aiodnsbrute yet, but I’ve heard it’s fast.
Subdomain Bruting (shuffleDNS)
If you like separating the work into different tools, or prefer the massdns core, you can use shuffleDNS by the ProjectDiscovery team.
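Sketch of a shuffleDNS bruteforce run:

# massdns-backed bruteforce; requires a resolvers file
shuffledns -d twitch.tv -w all.txt -r resolvers.txt -o shuffledns-out.txt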
Subdomain Bruting Lists
A multi-resolver, threaded subdomain bruter is only as good as its wordlist.
There are two trains of thought here: use one massive combined list, or build smaller lists customized to the target.
Both have advantages.
My all.txt file is still what I use on a regular basis. It combines 7 years of DNS bruteforce lists into one.
But you can also make customized wordlists with something like what TomNomNom talked about yesterday.
I also think there was a tool for this recently...
Subdomain Bruting Lists
New lists for subdomain bruteforce are relatively the same nowadays, but the first team to really iterate on this was the AssetNote team with their Commonspeak data collection.
The all.txt file includes commonspeak v1 data but there is also a second version of commonspeak data out:
Alteration Scanning
When bruteforcing or gathering subdomains via scraping, you may come across naming patterns in these subdomains.
Even though you may not have found them yet, there may be other hosts that conform to those naming conventions.
In addition, sometimes targets are not explicitly protected across naming variations.
The first tool to attempt to recognize these patterns and bruteforce for them was altdns, written by Naffy and Shubs.
Now Amass contains logic to check for these “permutations” and includes the analysis in a default run. Some personal experience is cited on the next page. An example pattern (with an altdns sketch after the list):
dev.company.com
dev1.company.com
dev2.company.com
dev-1.company.com
dev-2.company.com
dev.1.company.com
dev.2.company.com
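A sketch of generating and resolving permutations with altdns (flags from its README):

# generate permutations from found subs + a words list, then resolve them
altdns -i found-subs.txt -w words.txt -o perms.txt -r -s resolved-perms.txt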
Alteration Scanning (WAF Bypass)
Other
Favicon Analysis (favfreak)
A more fringe technique for discovering a brand’s assets is taking their favicon and hashing it. You can then search Shodan for this hash. You can also scan IP ranges and cloud blocks to find assets with the same hash.
In addition, you can make hashes of commonly used admin portals or framework logins. This method is useful when an org has modified URL paths and your scanners are only looking for a specific path; if the org has changed it, you won’t find it, but orgs rarely change the favicon of a framework. Check out FavFreak by Devansh Batham (0xAsm0d3us).
shodan search org:"Target" http.favicon.hash:116323821 --fields ip_str,port --separator " " | awk '{print $1":"$2}'
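FavFreak itself consumes a list of URLs on stdin (a sketch; check the README for current flags):

# hash each URL's favicon and flag known framework hashes
cat urls.txt | python3 favfreak.py -o favfreak-out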
Port Analysis (masscan)
Most hacker education would have you use nmap here, but masscan by Robert Graham is much faster for general “finding open ports on TCP”. Chaining masscan’s output into nmap can save a lot of time.
Masscan achieves this speed with a rewritten TCP/IP stack, true multi-threading, and a C implementation.
Sample syntax for scanning a list of IPs:
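Something like the following (the rate is illustrative; tune --rate to your bandwidth):

# full TCP sweep of every IP in the list
masscan -p1-65535 -iL ips.txt --rate 10000 -oG masscan.out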
A full syntax guide of masscan (authored by Daniel Miessler) can be found here: https://danielmiessler.com/study/masscan/
Port Analysis (dnmasscan)
One limitation of masscan is that it only scans IP addresses. You can write your own simple converter script, or you can use something like dnmasscan by @rastating.
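The README example looks like this (dnmasscan resolves the domain list, logs the DNS answers, and passes the remaining arguments through to masscan):

# resolve domains to IPs, then masscan ports 80/443 on the results
./dnmasscan domains.txt dns.log -p80,443 -oG masscan.log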
Service Scanning (brutespray)
Once we have this service/port information, we can feed it to nmap to get an -oG output file.
We can then scan the remote administration protocols for default passwords with a tool called Brutespray by @x90skysn3k, which takes the nmap -oG file format.
masscan → nmap service scan (-oG) → Brutespray credential bruteforce
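A sketch of the Brutespray step (wordlist paths are placeholders):

# try listed creds against services found in the nmap -oG file
python brutespray.py --file nmap.gnmap -U users.txt -P passwords.txt --threads 5 --hosts 5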
Github Dorking (manual)
Many organizations quickly grow in their engineering teams. Sooner or later a new developer, intern, contractor, or other staff will leak source code online, usually through a public Github.com repo that they mistakenly thought they had set private.
Enjoy my quick github dork collection:
** Helps if your console supports clickable hyperlinks
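A few illustrative examples of the kinds of dorks such a collection contains (search these on github.com; “target” is a placeholder):

"target.com" password
"target.com" api_key
org:target filename:.env
org:target filename:id_rsa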
The repo mentioned earlier by Gwendal Le Coguic, “github-search”, has some automated GitHub tools for this as well.
Also check out @th3g3ntelman’s full module on GitHub and sensitive data exposure.
Screenshotting (Eyewitness, Aquatone, httpscreenshot)
At this point we have a lot of attack surface. We can feed possible domains to a tool and attempt to screenshot the results. This will allow us to “eye-ball” things that might be interesting.
There are many tools for this: Aquatone (a wider recon framework that includes screenshotting), HTTPScreenshot, and EyeWitness. I use EyeWitness because it will prepend both the http and https protocols for each domain we have observed. I’m not highly tied to this tool though; find one that works for you.
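A minimal EyeWitness run (a sketch):

# screenshot http+https for every domain in the list
./EyeWitness.py -f domains.txt --web -d eyewitness-report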
Also, check out WitnessMe
Subdomain takeover (can i take over xyz)
“Subdomain takeover vulnerabilities occur when a subdomain (subdomain.example.com) is pointing to a service (e.g. GitHub pages, Heroku, etc.) that has been removed or deleted. This allows an attacker to set up a page on the service that was being used and point their page to that subdomain. For example, if subdomain.example.com was pointing to a GitHub page and the user decided to delete their GitHub page, an attacker can now create a GitHub page, add a CNAME file containing subdomain.example.com, and claim subdomain.example.com.”
A great resource for subdomain takeover is EdOverflow’s repo can-i-take-over-xyz.
Subdomain takeover (SubOver & nuclei)
To find subdomain takeovers we can use a few tools.
SubOver is a discontinued standalone tool by Ice3man that has since been incorporated into Project Discovery’s nuclei scanner.
Nuclei is part of a larger scanning framework but boasts the most takeover checks of any tool I’ve seen.
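A sketch of a takeover scan with nuclei (the template path is assumed from the nuclei-templates repo layout):

# run only the takeover templates against candidate hosts
nuclei -l hosts.txt -t takeovers/ -o takeover-results.txt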
Automation++
Extending tools (interlace)
Eventually you will want to make a script or recon framework of your own. Quickly you will come up against some problems: many tools don’t support CIDR input, globbing, threading, or proxying.
You can rewrite a tool yourself to handle these issues, but some help does exist here.
Interlace by Michael Skelton (aka Codingo) is an awesome tool that helps glue together a recon framework.
Interlace can take these tools and add support for CIDR input, glob input, threading, proxying, queued commands, and more.
Hakluke wrote a great guide on it here.
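The canonical example from the Interlace README (the _target_ token is substituted per host):

# run nikto across every target, 5 threads, one output file per host
interlace -tL targets.txt -threads 5 -c "nikto --host _target_ > _target_-nikto.txt" -v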
Extending tools (anything TomNomNom writes)
TomNomNom has an extensive repo of tools which are awesome. I highly suggest you check them all out.
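They compose well on the command line; for example, a common pipeline (a sketch using his assetfinder, httprobe, and anew):

# find subdomains, probe for live http(s), keep only newly seen hosts
cat roots.txt | assetfinder --subs-only | httprobe | anew live-hosts.txt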
Frameworks
It could be that recon is not really your thing. That’s all right.
Several hunters have open sourced their automation at this point and you can choose one that fits you and use it without worrying too much. I usually classify recon frameworks in rough tiers:
C-Tier: automation built around scripting up other tools in bash or python. Step based, no workflow. Few techniques. Little extensibility.
B-Tier: automation writing a few of their own modules. Some GUI or advanced workflow. Medium number of techniques. Runs point-in-time. Flat files.
A-Tier: (maybe) automation writing all their own modules. Has GUI. Runs iteratively. Manages data via db.
S-Tier: automation writing their own modules. Has GUI. Runs iteratively. Manages data via db. Scales across multiple boxes. Sends alerts to user. Uses novel techniques and iterates quickly. ML + AI.
Frameworks
Warning: my scale is mostly subjective and may be missing factors. It is also based on my rough experience and gut feel.
Frameworks (C-Tier): ultimate_recon.sh
Frameworks (B-Tier): LazyRecon
Frameworks (A-Tier): LazyRecon
Frameworks (S-Tier): Project Discovery Framework (unreleased)
Frameworks (S-Tier): an unreleased scanner written by j3ssiejjj and team
Frameworks (S-Tier): Bounty.offensiveai.com (paid) by @ghostlulz1337
Honorable Mention: Nuclei
The End