Google Hacking
Search engines existed well before Google started. However, Google changed the way search engines worked and, as a result, overtook the existing popular search sites like AltaVista, InfoSeek, and Inktomi, all of which have since been acquired or put out of business. Many other search engines have become defunct. Google not only created a search engine that was useful but also found a unique way to monetize that search engine, allowing the company to remain profitable and stay in business.
One feature that Google introduced is a set of keywords that users can add to their search requests to narrow the results to a tighter set of pages. Searches that use these keywords are sometimes called Google Dorks, and the entire process of using keywords to identify highly specific pages is called Google Hacking. This can be especially powerful knowledge to have when you are trying to gather information about your target.
One of the most important keywords when it comes to isolating information related to a specific target is the site: keyword. When you use this, you are telling Google that you want only results that match a specific site or domain. If I were to use site:oreilly.com, I would be indicating that I want to look only for pages that belong to any site that ends in oreilly.com. This could include sites like blogs.oreilly.com or www.oreilly.com. This allows you to essentially act as though every organization has a Google search engine embedded in its own site architecture, except that you can use Google to search across multiple sites that belong to a domain.
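To make the "ends in" rule concrete, here is a minimal Python sketch of which hostnames a site: filter would be expected to cover. It only approximates the behavior described above and is not Google's actual matching logic. The function name matches_site and the hostname notoreilly.com are hypothetical, chosen for illustration.

```python
def matches_site(hostname: str, domain: str) -> bool:
    """Approximate whether a hostname falls under site:<domain>.

    Assumption: a host matches when it is the domain itself or a
    subdomain of it. This mirrors the description in the text, not
    any documented Google algorithm.
    """
    hostname = hostname.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    return hostname == domain or hostname.endswith("." + domain)


# notoreilly.com is a made-up host that shares the trailing characters
# but is a different domain, so it should not match.
for host in ["www.oreilly.com", "blogs.oreilly.com", "oreilly.com", "notoreilly.com"]:
    print(host, matches_site(host, "oreilly.com"))
```

Matching on "." plus the domain, rather than on the bare string, keeps an unrelated domain that merely ends with the same characters from being counted.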
Although you can act as though an organization has its own search engine, it’s important to note that when using this sort of technique, you will find only pages and sites that are reachable from the internet. You also won’t get sites that are reachable from the internet but are not referenced anywhere else on the internet. And you won’t get any intranet sites or pages. Typically, you would have to be inside an organization to be able to reach those sites.
You may want to limit yourself to specific file types. You may be looking for a spreadsheet or a PDF document. You can use the filetype: keyword to limit your results to only files of that type. As an example, we could use two keywords together to get more specific results. You can see in Figure 3-1 that the search is for site:oreilly.com filetype:pdf. This will get us PDF documents that Google has identified on all sites that end in oreilly.com, and you can see two websites listed in the first two results.
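If you build these searches often, it can help to assemble them programmatically. The following Python sketch combines keywords into a single query string and produces a search URL you could paste into a browser. The helper names build_dork and search_url are hypothetical, and the sketch only constructs the URL; it does not send any request.

```python
from urllib.parse import urlencode


def build_dork(**operators: str) -> str:
    """Join keyword/value pairs into one query, such as site:... filetype:..."""
    return " ".join(f"{name}:{value}" for name, value in operators.items())


def search_url(query: str) -> str:
    """Return a Google search URL with the query percent-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# The same combination used in the text: PDFs on sites ending in oreilly.com.
query = build_dork(site="oreilly.com", filetype="pdf")
print(query)
print(search_url(query))
```

Keeping the query builder separate from the URL builder means the same query string can be reused with another keyword added, for example a different file type.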