Open Source Intelligence Gathering and Social Engineering


© All rights reserved.

This lab for Leeds Metropolitan University is based on material that was developed in collaboration with sec-1, a local security specialist, see for more details.

Authors include: Z. Cliffe Schreuders, Emlyn Butterfield, Julian Old, and sec-1.



Open source intelligence gathering

Google “Hacking” and passive information gathering

Google search terms

Google search examples

Google Hacking Database (GHDB)


Starting Maltego


Maltego machines

Navigation and views

Performing a Person search

Domain based search

Social engineering using Social Engineering Toolkit

Stealing passwords using cloned sites



As with all of the labs in this module, start by loading the latest version of the LinuxZ template from the IMS system. If you have access to this lab sheet, you can read ahead while you wait for the image to load.

To load the image: press F12 during startup (on the blue boot screen) to access the IMS system, then login to IMS using your university password. Load the template image: LinuxZ.

Once your LinuxZ image has loaded, log in using the username and password allocated to you by your tutor.

The root password -- which should NOT be used to log in graphically -- is “tiaspbiqe2r” (this is a secure password but is quite easy 2 remember). Again, never log in to the desktop environment using the root account -- that is bad practice, and should always be avoided.

Using the VM download script (as described in the previous lab), download these VMs:

Feel free to read ahead while the VMs are downloading.

The first part can be completed from any OS/VM with a Web browser; the Maltego and SET tasks can be completed from within Kali Linux.

Open source intelligence gathering

Open source intelligence refers to information that is publicly available[1]. Lots of information about a target of attack can often be found online, in openly available places. Mining these sources can yield details that help in an attack, including technical information about the target's infrastructure, or personal information such as employee names and interests, which can be used in social engineering attacks.

Some examples of sources of information:

Google “Hacking” and passive information gathering

Google hacking is a term that refers to the art of creating complex search engine queries in order to use Google to find information related to computer security: for example, to find a file containing usernames and passwords accidentally left open to be indexed by Google.

Google does a very good job of retrieving information for us. The key is our ability to use meaningful search criteria and to interpret the returned information. Google can be an important source of information during reconnaissance for any attack – it can also be used to identify the weaknesses of an organisation, which can be of use to malicious attackers and security assessors. Google and other search engines index and cache anything they find – they do not care about what it is.

This information can be used for identifying known vulnerabilities and it can also be used for social engineering activities. In particular, you will find authors attached to documents, which can give you an idea of the usernames used within an organisation.

The information gathered from Google can be considered as a form of passive information gathering.

What is passive information gathering? What is the advantage of this approach to an attacker?

Although we may find information this way we must treat anything we find in accordance with the Data Protection Act, as it may not be intentionally available.

Google search terms

The main terms of interest from Google, to you, are:


The “related:” will list web pages that are "similar" to a specified web page. For Example: “” will list web pages that are similar to the Securityfocus homepage. Note there can be no space between the "related:" and the web page url.


The query “cache:” will show the version of the web page that Google has in its cache. For Example: “” will show Google's cache of the Google homepage. Note there can be no space between the "cache:" and the web page url.

If you include other words in the query, Google will highlight those words within the cached document. For Example: “ guest” will show the cached content with the word "guest" highlighted.


The “intext:” syntax searches for words in the body text of web pages. It ignores links, URLs and page titles. For example: “intext:exploits” (without quotes) will return only links to those web pages that have the search keyword "exploits" in their body text.


The “intitle:” syntax helps Google restrict the search results to pages containing that word in the title. For example, “intitle:login password” (without quotes) will return links to those pages that have the word "login" in their title, and the word "password" anywhere in the page.

Similarly, to query for more than one word in the page title, “allintitle:” can be used instead of “intitle:” to get the list of pages containing all those words in the title. For example, using “intitle:login intitle:password” is the same as querying “allintitle:login password”.


The “inurl:” syntax restricts the search results to those URLs containing the search keyword. For example: “inurl:passwd” (without quotes) will return only links to those pages that have "passwd" in the URL. Similarly, to query for more than one word in a URL, “allinurl:” can be used instead of “inurl:” to get the list of URLs containing all those search keywords. For example: “allinurl:etc/passwd” will look for URLs containing both “etc” and “passwd”. The slash (“/”) between the words will be ignored by Google.


The “site:” syntax restricts Google to query for certain keywords in a particular site or domain. For example: “exploits” (without quotes) will look for the keyword “exploits” in those pages present in all the links of the domain “”. There should not be any space between “site:” and the “domain name”.


The “filetype:” syntax restricts a Google search to files on the Internet with a particular extension (e.g. doc, pdf or ppt). For example: “filetype:doc site:gov confidential” (without quotes) will look for files with the “.doc” extension in all government domains ending in “.gov” and containing the word “confidential” either in the pages or in the “.doc” file; i.e. the results will contain links to Word documents on government sites that mention “confidential”.


The “link:” syntax will list web pages that have links to the specified web page. For example: “” will list web pages that have links pointing to the SecurityFocus homepage. Note there can be no space between the "link:" and the web page URL.
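The operators above can be combined into a single query. The following is a minimal Python sketch of how such a query might be assembled and turned into a search URL; the helper names `build_query` and `search_url` and the domain `example.com` are illustrative assumptions, not part of the lab materials.

```python
from urllib.parse import urlencode


def build_query(keywords="", **operators):
    """Join operator:value pairs (no space after the colon) with free keywords."""
    parts = [f"{name}:{value}" for name, value in operators.items()]
    if keywords:
        parts.append(keywords)
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL with the query URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# example.com is a placeholder domain used purely for illustration.
query = build_query("confidential", filetype="doc", site="example.com")
print(query)
print(search_url(query))
```

Building the query as text first makes it easy to record the exact search string in your log book before running it.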

Google search examples

You can conduct these searches from the OS/VM of your choice.

For each of the following, record the answer and search query you used in your log book.

Identify the number of MS Word documents (.doc and .docx) present within the domain.  

Find the head of school’s name and email address for cultural studies of Leeds Met.

Try to find information about yourself and friends. You will be amazed at what you find.

Do some searches to answer the following:

This information can be used to create an entire profile of an individual, including a biographical dictionary that could be used to identify passwords, or for social engineering, or identity theft.

You are given the term “Nmap” as your only reference point for this exercise. Conduct a search for “Nmap”; the aim is to identify websites directly related to the tool, not just referencing it. You should attempt to do this using passive information gathering: do not click links to websites. Clicking a link leaves evidence that you have accessed that site, because your IP address is logged. Aim to leave no trace whatsoever.

There are two key ones, record them in your log book.

Website A


Website B



To assist in passive information gathering we can use web archive services; these provide archived historic versions of websites. An archive may reveal information that is no longer present on the live website, or that has since been changed by a security-conscious professional.

Visit the Internet Archive Wayback Machine and search for each website identified above in turn.

You will see that snapshots have occurred for a number of years. It is a good idea to review each of these in turn to identify sensitive information. You may get lucky and find sensitive server or network information.
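Snapshot lookups can also be scripted, which keeps the request directed at the archive rather than the target. The sketch below assumes the Internet Archive's availability endpoint and its JSON response shape as commonly documented; check both against the archive's own documentation before relying on them. The sample response is invented for illustration and is not real data.

```python
import json
from urllib.parse import urlencode


def availability_url(site, timestamp=None):
    """Build a lookup URL for the closest archived snapshot of a site."""
    params = {"url": site}
    if timestamp:
        params["timestamp"] = timestamp  # assumed format: YYYYMMDD
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(response_text):
    """Return (snapshot_url, timestamp) from a JSON response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


# Hypothetical response body, shaped as the endpoint is assumed to reply.
sample = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20060101000000/http://example.com/",
            "timestamp": "20060101000000",
            "status": "200",
        }
    }
})

print(availability_url("example.com", "20060101"))
print(closest_snapshot(sample))
```

The sketch deliberately makes no network request; fetching the URL it builds is left to the reader.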

Warning: Although these may be archives they may contain active links that will connect back to the target web server. We can restrict this from happening by modifying Firefox settings.

Select Edit | Preferences | Content and select Exceptions, located next to “Load images automatically”. Enter the URL and select Block.

You can restrict access further by adding the target's address to the “Restricted Sites” zone in the Internet Properties menu.

Find the mailing list archive for nmap using – what is the website used for the archive of this?


This gives us an additional website of interest – add this to your list.

Identify the username and email address of the individual who posts regarding new releases of nmap.

User name


Email address



Look at the source of the web page for the subscription form for the mailing list. What method is used for this action?

You should find that this is the HTTP “POST” method. This is not very exciting, but if you check such details as a matter of habit you may find something with a known vulnerability.
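Reading the form method from page source can be automated. The following is a minimal sketch using Python's standard `html.parser` module; the class name `FormFinder`, the helper `find_forms` and the sample page are illustrative assumptions, and the sample is not the real subscription page.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collect the method and action of every <form> tag in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            attributes = dict(attrs)
            # HTML forms default to GET when no method attribute is given.
            method = (attributes.get("method") or "get").upper()
            self.forms.append((method, attributes.get("action") or ""))


def find_forms(html):
    """Return a list of (method, action) pairs found in the HTML text."""
    finder = FormFinder()
    finder.feed(html)
    return finder.forms


# Made-up page source standing in for a saved copy of a subscription form.
sample_page = """
<html><body>
<form method="post" action="/subscribe">
  <input type="text" name="email">
  <input type="submit" value="Subscribe">
</form>
</body></html>
"""

print(find_forms(sample_page))
```

Running this against page source you have already saved avoids making any further requests to the site.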

Google Hacking Database (GHDB)

An excellent resource for “Google Hacking” is the Google Hacking Database (GHDB) - this was previously referred to as – you’ll still find numerous references to this around the Internet.

Browse this website and run some of the examples; the ability to identify publicly available information is invaluable to you as a professional (and as a security tester).

Search the GHDB for a way of enumerating “exported email addresses”.

Hint: look under the “Files containing juicy info” section, look for the above quote, then click the information icon.

Search for phpMyAdmin dumps.

Search for accidentally released results from vulnerability scans.

Can you use GHDB to find vulnerabilities in websites? How?

Clearly Google hacking can yield highly sensitive results.

Note that the fact that Google has indexed the information does not make it “released to the public domain”, so if you don’t have permission to test the security of an organisation, don’t.

What other kinds of sensitive data can Google hacking reveal?


Maltego is an Information Relationship Framework tool developed by the Paterva Group, which uses various plugins known as transforms to gather information about a target using various open source intelligence gathering methods and identifies relationships between this information.

Maltego operates on a client-server basis. Search parameters known as entities are issued at the client, with information gathering and processing done at the server. The results are then passed back to the client to display. This model allows expansion with the development of custom transforms held on a remote server and pointed to by the client software.

Maltego transforms will expand a standard query into numerous others, each with a "confidence index". For example, a search on the name ‘John Smith’ will also search on ‘J Smith’ and ‘Smith J’, but with lower confidence levels than that of ‘John Smith’. This process is explained in more detail under the transforms description.
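The idea of expanding one query into several weighted variants can be sketched in a few lines of Python. This is only an illustration of the concept: the function `name_variants` and the confidence numbers are invented for this example and do not reflect how Maltego actually generates or scores its searches.

```python
def name_variants(full_name):
    """Expand a full name into search variants with made-up confidence scores."""
    words = full_name.split()
    first, last = words[0], words[-1]
    initial = first[0]
    return [
        (f"{first} {last}", 100),   # exact name: highest confidence
        (f"{initial} {last}", 70),  # illustrative lower score
        (f"{last} {initial}", 50),  # illustrative lower score
    ]


for variant, confidence in name_variants("John Smith"):
    print(f"{variant}: {confidence}")
```

Looser variants match more people, which is why results found through them deserve less trust than an exact match.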

Maltego would normally be used after basic DNS and IP address information has been gathered and will be used to query that information to make relationships between it and the following data:

One of Maltego’s strengths is its use of a clever GUI to display the data returned by the application. Maltego also supports a number of export methods to output the results in various formats.

Maltego Community Edition v2.0

Starting Maltego

Maltego is a Java application, so it can be installed and run on any system with a Java VM, including MS Windows, Linux and Mac OS X. Maltego has a free, limited Community Edition and a paid-for Commercial version.

Kali Linux includes Maltego.

On the Kali Linux VM, start Maltego via the Applications menu: Kali Linux, Information Gathering, OSINT Analysis, maltego.

Starting Maltego in Kali Linux

If you are in the IMS labs, you may need to set Maltego to use a proxy. Note that this has already been set in the VM noted above.

You may need to register an account or retrieve a new API key, if the current one has expired.


Before a scan can commence for the first time in Maltego, a transform server must be selected and the transforms discovered and activated. Even if the transforms have already been discovered in your version of Maltego, it is considered good practice to discover new transforms to maximise the success of any searches.

In Maltego, the term transform is the name given to the various plugins that Maltego uses to configure how to search for information. The transform takes the highest scoring search result (i.e. the result that matched closest to the search parameter), records this with a high confidence value, and then performs additional searches based on amalgamations of the original search parameter.

For example:

A transform that searches on the email address: would classify any results returned containing that exact email address to be the highest and most confident returned data. The transform would then try searches for:

and so on, to maximise the results, and would classify these results with a lower confidence rating.

Maltego shows this hierarchy when it produces the results, with the top of the tree being the most accurate data returned.

Note that the initial installation and registration of Maltego will update the available transforms.

Transforms can be conducted by a Transform Distribution Server, meaning your request is sent to the server which does the transform and sends it back, or it can be conducted locally.

Maltego Transform Distribution Servers. Image:

Maltego machines

Maltego includes the concept of a “machine”, which automates the selection and use of transforms to conduct targeted searches. When Maltego starts you may see the Start a Machine dialog. If not, start it

Maltego Start a Machine

The “Company Stalker” machine takes a domain name, searches for documents and email addresses of employees, then searches for those people on social networks.
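One step of this kind of machine, pulling email addresses for a given domain out of document text, can be sketched in Python as follows. This is not Maltego's implementation; the pattern `EMAIL_PATTERN`, the function `emails_for_domain`, the sample text and the domain `example.org` are all assumptions made for illustration.

```python
import re

# Deliberately simple pattern; real-world addresses can be more varied.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_for_domain(text, domain):
    """Return sorted, de-duplicated addresses at the domain or its subdomains."""
    domain = domain.lower()
    found = set()
    for address in EMAIL_PATTERN.findall(text):
        host = address.rsplit("@", 1)[1].lower()
        if host == domain or host.endswith("." + domain):
            found.add(address.lower())
    return sorted(found)


# Invented text standing in for the contents of a harvested document.
sample_text = """
Contact: Alice.Example@example.org, or the help desk at helpdesk@it.example.org.
Unrelated address: someone@elsewhere.test
"""

print(emails_for_domain(sample_text, "example.org"))
```

Filtering by domain keeps the output focused on the organisation under assessment rather than every address that appears in a document.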

Select the “Company Stalker” machine and click Next.

Enter the domain “”

Watch Maltego aggregate information from the Internet.

Next, click “Proceed with selected.”

Maltego email filter prompt

Accept the terms of service, and click “Run”.

Accepting terms of service

As the information is returned, it will be displayed in Maltego’s graph view.

Note that this will determine email addresses for Leeds Met staff.

How could this information be helpful to an attacker?

Maltego Interface

Zoom out using the mouse scroll wheel.

Maltego zooming out

Choose an item on the graph, and apply a transform (right click, Run Transform) to find more information.

For example, right click an email address, and choose an individual transform from the list.

If you like, you could run “All transforms” to cast the net further.

Running transforms

Can you find anyone’s Facebook profiles?

In its most basic form, Maltego can simply be run with all transforms activated at once. This will return a wealth of information but, like any scanner, it also produces a large number of false positives. For example, searching on the name ‘John Smith’ will produce many results, as there is likely to be evidence of many people named ‘John Smith’ on the Internet. The results therefore need to be filtered manually after each search, or more starting data supplied, to obtain the most accurate results. New searches can then be run on specific areas of the data returned by a previous scan.

Try selecting multiple entities (drag with the left mouse button, or Ctrl + left click for individual entities), and running transforms on them all.

What is the advantage of using a single transform or a small group, rather than all transforms?

Navigation and views

Experiment with navigation: zoom in and out with the scroll wheel.

Pan using the Graph Navigator, or right click and drag.

Switch between “Main View”, “Bubble View”, and “Entity List”.

Bubble view

Performing a Person search

Entities are located in the 'palette' section - drag them onto the graph and double click on the text to edit their properties – this will form the basis of the search parameters that will be used.

Under “Personal” drag “Person” into the graph.

Palette pane

In “Main View”, double click the name text (John Doe), and enter your own name.

Person node

Right click on the entity to see the transforms which are compatible with that entity. If multiple TAS-es (Transform Servers) exist with the same transform you will see all. Select one to use.

Run “All transforms” on yourself.

When prompted, accept each set of terms and conditions, and enter “ “ (space) for the text inputs.

How could this information be used as part of social engineering?

Is there any information here you are surprised to find public?

Right click on a resulting node, and run some further transforms.

Do the same for someone else you know (create the person and run transforms).

Starting with a person's name, or perhaps their email address(es) from a previous Maltego scan, as the entity to query, it is possible to use Maltego to find information specifically about those individuals. This would prove useful in social engineering attacks, or as a means of identifying what that individual may have posted to the Internet about the target organisation.

Domain based search

Often a search preceding an attack or for a security audit would start at a domain name, such as “”.

Select the “” entity (from earlier).

Starting with a domain name or website as the initial entity, instruct Maltego to query using the following transform set. This is an excellent way of finding what data an organisation is making available over the Internet.

This can help identify malicious files that have been uploaded by an attacker, or sensitive files accidentally hosted by the organisation.

Run transforms to find “interesting documents”.

Can you find the IP addresses of servers?

Try running one of the footprinting machines.


Social engineering using Social Engineering Toolkit

So far you have learned to gather information about the people employed by a company, including harvesting email addresses. What could an attacker do with this information? Given someone's email address and some personal information, an attacker can attempt to social engineer the victim in various ways.

For example, they can try to trick the user into following a malicious link, to visit a page they control.

Start Social Engineering Toolkit (Applications menu, Kali Linux, Exploitation Tools, Social Engineering Toolkit, setoolkit).

Starting SET on Kali Linux

Read, and agree to the terms of use.

You will be greeted by a console menu driven program.

SET main menu

Stealing passwords using cloned sites

Select “1) Social-Engineering Attacks”.

SET social engineering attacks

Select “2) Website Attack Vectors”.

SET website attack vectors

Select “3) Credential Harvester Attack Method”.

SET cred harvester methods

Select “1) Web Templates”.

Note that you could instead build a fake website based on any URL you like, but we will use the ready-made Gmail template.

IP address prompt

Enter your own IP address for the Kali Linux VM.

You can find your Kali IP address by running “ifconfig” in another terminal window. (Use the address starting with 192).
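If the “ifconfig” output is long, the relevant address can be picked out with a short script. The sketch below parses text in the style of “ifconfig” output; the sample output, the pattern `INET_PATTERN` and the function `addresses_starting_with` are invented for illustration, and the exact output layout varies between versions of the tool.

```python
import re

# Matches both "inet addr:192.168.0.1" and "inet 192.168.0.1" layouts.
INET_PATTERN = re.compile(r"inet (?:addr:)?(\d{1,3}(?:\.\d{1,3}){3})")


def addresses_starting_with(ifconfig_output, prefix="192."):
    """Return the IPv4 addresses in the output that begin with the prefix."""
    return [
        ip for ip in INET_PATTERN.findall(ifconfig_output)
        if ip.startswith(prefix)
    ]


# Made-up output for illustration; your interface names and addresses will differ.
sample_output = """
eth0      Link encap:Ethernet  HWaddr 00:11:22:33:44:55
          inet addr:192.168.56.101  Bcast:192.168.56.255  Mask:255.255.255.0
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
"""

print(addresses_starting_with(sample_output))
```

To use it on a live system, the text could be captured with `subprocess.run(["ifconfig"], capture_output=True, text=True).stdout`, assuming “ifconfig” is installed and on the path.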

Selecting a page to clone

Select “2) Gmail”.

The password harvester is now running.

Ask a classmate to visit your IP address from a Web browser, such as Firefox/IceWeasel.

From your system start a Web browser, and type a classmate’s IP address (or your own, if you are working by yourself) into the address bar. You will be greeted with something that looks a lot like the Gmail login page.

Tip: if you get a proxy error message, edit your browser’s proxy settings, to not use the proxy for this IP address (in Firefox: Edit, Preferences, Advanced, Network, Settings, Proxy).

SET password stealing fake Website

Enter a username and password, BUT NOT YOUR ACTUAL USERNAME AND PASSWORD!

Once someone has visited your cloned site, you will get any passwords anyone enters. Obtain a password entered into your fake site.

Fooling someone out of a password!

Experiment with SET’s website cloning feature. Can you make a credential stealing clone of an Internet banking login page?

Note: the proxy in the IMS labs may make this difficult.

SET also contains features to send out targeted “spearphishing” email-based attacks, attacks that target vulnerable web browsers, and many other kinds of social engineering attacks. You may wish to experiment with the software further.


As you have seen, a lot of personal information that can be useful to attackers makes its way onto the Internet. Sometimes it is due to accidental leaks of documents onto public websites, but attackers can also make use of seemingly innocuous information to social engineer people. For example, knowing that an employee has a specific hobby can help an attacker craft a more convincing email.

You have:

Well done!

[1] Despite the similar name, this is not related to open source software, where the source code is released under an open license.