This page considers search strategies and online search behaviour,
including questions about how users navigate and their
assessment of 'good enough' information retrieval.
Search behaviour metrics are explored in the separate
guide on Metrics & Statistics.
Some myths about what is online (and whether that content
is accessible) are explored
in a supplementary profile.
Online search behaviour is of interest for a range of reasons.
They include what might be characterised as the politics
of information, encompassing questions about -
- being found on the net has a commercial value (whether
direct or indirect) and much electronic commerce is
built on visibility, competing for audiences in the
'attention economy' or providing tools (such as advertising)
that channel traffic to particular locations
values - domain spaces
(eg dot au versus dot com) and restrictions on domain
names (eg trademarks, profanity) embody particular expectations;
the categorisation of directories and blind spots in
search engines are weighted to commerce or against endorsement
of value by a grand cataloguer
what is available online and how easily it can be found,
with for example claims that many search tools are biased
towards the 'anglosphere' and thus discriminate against
non-English speakers or against 'alternative' lifestyles
- with perceived biases against searching by users with
visual, motor or cognitive impediments
- making sure that users do not find what they are seeking
(eg adult content filters
or government blocking of dissident and news sites)
or muddying the search (eg action by some record/film
companies to seed P2P exchanges with mislabelled and
More broadly, how people conceptualise
cyberspace and navigate it offers insights of value for
the cognitive sciences, whether you are an adherent of
Chomsky's views on language or Shneiderman on computer-human
interaction.
Online resource identification is also of interest because
of search patterns that some specialists describe as "inept"
and others as "good enough". It is clear that
many users - including those who have been online for
several years - misread navigational clues and conduct
rather shallow searches. Persistence and fine-tuning of
queries in search engines would often produce results
that better meet their stated needs.
A range of academic and industry studies have thus shown
that some people still expect to intuit a search by deconstructing
domain names and that when using search engines the majority
of people/searches (eg a claimed 85% of around a billion
Altavista queries in 1998) do not progress beyond the
first screen of search results.
Web Search: Public Searching of the Web (London:
Springer 2004) by Amanda Spink & Bernard Jansen similarly
reports little evolution over time in search behaviour.
Users typically conduct a handful of simple short searches
with one to two words per search (two searches per session)
and examine only the first page of results.
Much searching through hyperlinks - pointers from one
site to another (or merely from one page to another) - is
serendipitous ... going for a random walk. That is not
necessarily a bad thing, as anyone who has contrasted
reliance on a catalogue with grazing the stacks in a library
will know.
how we know
Knowledge about search objectives, search strategies and
impediments to successful online navigation comes from
a range of sources.
One source - still of major value - is observation of
how people interact with information
devices and questioning about what they were trying
to do, what they achieved and how they felt. That observation
might simply involve a human observer watching a user
or employment of technology that monitors keystrokes or
maps eye movement (one example is research under Poynter
auspices, criticised by Jakob Nielsen).
Another source is site-specific examination of server
logs, identifying the points of entry (home page or subsidiary
pages?), how users moved through the site (what path did
they follow, how long did they stay) and where/why they
departed (eg because a link was broken or page loading
was too slow). A corollary is examination of logs provided
by site-specific search engines, illustrating what users
were seeking (or merely appeared to be seeking) and what
response they received.
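The kind of log examination described above can be sketched in a few lines of Python. This is a minimal illustration only: the record layout (visitor id, timestamp in seconds, page path), the sample data and the function name summarise_visits are all assumptions made for the example, and real server logs need parsing and grouping into sessions before anything like this is possible.

```python
from collections import Counter

# Hypothetical, simplified log records: (visitor id, timestamp in seconds, page path).
# These values are invented for illustration and do not come from any real log.
SAMPLE_LOG = [
    ("a", 0, "/"),
    ("a", 30, "/guides"),
    ("a", 95, "/guides/search"),
    ("b", 10, "/guides/search"),
    ("b", 14, "/missing"),
    ("c", 20, "/"),
]


def summarise_visits(records):
    """Return entry-page counts, exit-page counts and a rough stay per visitor."""
    # Group hits by visitor, in time order, so each visitor has one path.
    paths = {}
    for visitor, timestamp, page in sorted(records, key=lambda r: (r[0], r[1])):
        paths.setdefault(visitor, []).append((timestamp, page))

    entries, exits, stays = Counter(), Counter(), {}
    for visitor, hits in paths.items():
        entries[hits[0][1]] += 1   # point of entry (home page or subsidiary page?)
        exits[hits[-1][1]] += 1    # last page seen before departure
        # Time between first and last request; time spent on the final
        # page cannot be known from the log, so this is a lower bound.
        stays[visitor] = hits[-1][0] - hits[0][0]
    return entries, exits, stays


entries, exits, stays = summarise_visits(SAMPLE_LOG)
```

The exit counts show where visitors left, not why; a departure from a broken or slow page looks the same in the log as one from a page that answered the question.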
At a broader level insights are provided by logs maintained
by 'whole of web' directory and search engines. Much of
that information is closely guarded as a commercial asset
but some data is commoditised by the operators or third
parties or released for promotional purposes, for example the 'hottest
search terms' of the year/quarter.
It complements information collected by metrics companies
through manual questionnaires or logging traffic going
through major gateways (eg selected ISPs) or through selected
personal computers (the user agrees to install software
that reports to the metrics aggregator about navigation
by those lab rats). As we have noted in the more detailed
discussion of the metrics
(and online polling)
industries, extrapolation from those figures is contentious
because of disagreements about the accuracy of data collection
and whether the sample is truly representative of national/global
populations.
finding resources online
Users find online resources - including web sites, music
files, embedded graphics, PDF or Excel documents - in
a range of ways that include -
previous exposure to the resource
following hyperlinks from another resource
reference from an email or chat message
targeted or unstructured use of a search engine
grazing a large scale or specialist directory
following a link from an online advertisement
intuiting or deconstructing an address
Normalisation of the internet - and opportunities for
exposure offline - has seen increasing understanding of
domain names in the general community (most Australians,
for example, have been exposed to and appear to have some
understanding of a URL) and widespread adoption of coffee
cups, caps, billboards, posters, restaurant menus, invoices,
business cards, newspaper/magazine advertisements, movie
trailers and vehicle signage for pointing people to particular
locations in cyberspace.
There has been surprisingly little research on the ubiquity
and effectiveness of such offline signalling. However
for particular demographics it appears to be more effective
(in terms of basic recognition and cost) than much online
advertising.
Previous exposure to the resource
Paradoxically, the best way for many users to find a resource
is to have been there previously.
Some users enter URLs into a bookmarks tool on a systematic
or random basis (we confess that much of our bookmarking
is unstructured, with this site being used as a surrogate
for a truly coherent set of marks).
Other users do not bookmark, instead relying on their
browser to provide a prompt when they start to enter a
similar address into the location bar or look at the 'history'
of past surfing.
Some browsers and search engines offer 'predictive searching',
suggesting addresses on the basis of information gained
from past searches or past navigation. Prediction is contentious,
given the difficulty of matching past navigation with
other resources (most prediction algorithms are simplistic
and are inhibited by poor information) and claims that
some services are biased towards particular addresses
(eg a site owner has paid to optimise the likelihood that
the user will find that site when conducting a search).
Following hyperlinks from another resource
The web is built around hyperlinks. As this site demonstrates,
one mechanism for effective identification of online resources
is to follow menus and other links from one page to another
within a specific site or to move from that site to external
resources using such links.
Such linkage can be particularly valuable if the referring
site is based on deep understanding of a subject, has
a better awareness of what is available online or is updated
more frequently than most search engines, which, as pointed
out earlier in this profile, do not cover all of the web
and may have latency periods of around six months.
Reference from an email or chat message
Many people use addresses to which they are pointed through
email or chat messages, either copying the address and
then pasting it into the address bar on their browser
or clicking on a hyperlink within the message. That method
embodies what is arguably the best and worst of searching.
At its best the user is relying on an endorsement of quality
or interest by a colleague, friend or contact with some
expertise.
At its worst the link appears in spam
... sufficient people make the mistake of responding to
unsolicited bulk messages (clicking on the link or naively
confirming their address through an unsubscribe action)
to make spamming commercially worthwhile and thereby pervasive.
Targeted or unstructured use of a search engine
There is disagreement about non-specialist user reliance
on 'whole of web' and specialist search engines. Some
authorities claim that over 70% of resources are found
using search engines (and that questions of keywords,
for example, are of commercial significance). Others claim
that engines are far less important, with users identifying
sites and individual files through a range of means.
The truth probably lies somewhere in between, given different
experience, objectives and patience of users.
Some clearly start and stop with a single search. Others,
as noted in preceding pages of this profile, may systematically
work through selected entries on a succession of search
screens, conduct multiple searches (sometimes using different
search engines), or use initial results from a search
engine as a point of departure for more extended navigation
through hyperlinks from one site to another.
It is clear that many users - particularly those without
a detailed search strategy or tight objectives - conduct
shallow and unstructured searches, typically entering
a single search term, avoiding 'advanced search' features
(eg boolean text searching and date or other delimiters)
and not progressing beyond the first two screens of search
results. Accurately Interpreting Clickthrough Data
as Implicit Feedback
by Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke
& Geri Gay, for example, comments that 42% of users
clicked the first item in the listing on a search engine
results page (SERP), with 8% of users selecting the second.
Grazing a large or specialist directory
As preceding pages have noted, prior to emergence of large-scale
search engines most people relied on directories for identifying
internet resources. Those directories provide categorised
listings of selected web sites or other resources.
Their value for searches is affected by the latency of
the information (most are manually compiled, with delays
in the entry of new information and deletion of superseded
information), the basis of selection and difficulties
encountered by users in navigating through the categorisation.
As with search engines, research suggests that non-specialist
users of the 'whole of web' directories such as Yahoo!
tend not to systematically graze a hierarchy of categories.
Much searching accordingly ceases on the first or second
page. The big directories have accordingly emphasised
'regionalisation', with discrete versions for different
regions.
In practice such directories are now significantly less
important for many users, who instead rely on other search
mechanisms identified on this page and on specialist directories
that are often small scale and subject specific (eg a
directory of Mahler manuscripts or papers on the blue-ringed
octopus). Whole of web directories continue to garner
much traffic because they have morphed into broader portals
(eg including webmail access and news) and because they
serve as the default entry pages from many private and
commercial machines, eg in cybercafes.
Following a link from an online advertisement
The online advertising
industry is based on the notion that users
gain some awareness of a product/service through a banner/pop-up
advertisement encountered while surfing, or that by
clicking on such an ad they will pass to a discrete site,
an animation or other advertising content.
Enough people report encountering online advertisements, or click
through, to make ads viable. There is disagreement about
their value as a search mechanism, reflected in changing
user responses to online ads (eg adoption by some demographics
of 'ad washer' or anti-popup software) and the evolution
of 'paid placement' - banners or other strategically positioned
links that may not appear to be ads but if clicked take
the user to a location chosen by the advertiser.
Intuiting or deconstructing an address
Many users attempt to intuit the address of an online
resource by assuming that there is a close match between
a brand/corporate name and the online address, for example
simply adding 'www' and 'com' (or the appropriate ccTLD/gTLD)
on either side of an offline name. The shape of the domain
name system and factors such as trademarks mean that
such a match is not always correct.
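The guess described above can be written out as a small sketch. The function name guess_addresses and the default suffixes are assumptions chosen for illustration, and the output is only a list of candidate strings: nothing here checks whether an address exists or belongs to the organisation in question.

```python
def guess_addresses(name, suffixes=("com", "com.au")):
    """Build candidate hostnames from an offline name (illustrative only)."""
    # Keep letters and digits only, lower-cased: "Acme Widgets" -> "acmewidgets".
    # This naive step also drops hyphens and punctuation that a real name may use.
    label = "".join(ch for ch in name.lower() if ch.isalnum())
    # Wrap the label with 'www' and each suffix, as many users do by hand.
    return [f"www.{label}.{suffix}" for suffix in suffixes]


candidates = guess_addresses("Acme Widgets")
```

Every candidate is a guess. Trademark disputes, prior registration by another party or a different naming choice by the organisation can each make the guess wrong.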
Others assume that there will be an appropriate match
between the domain name and the type of service/commodity
of interest (or even the person of interest). Users thus
sometimes resort to a generic domain name (such as books.com,
cars.com or flowers.com), which may lead them to an appropriate
site or merely to a site that features advertising for
an unrelated service/product. Such behaviour was common
during the early years of the web, with some users assuming
that domain names embodied a subject directory. More recently
it has been the basis for development of commercial 'keyword'
portfolios, large collections of sites with subject
names and misspelled brand/subject names that feature
advertising.
Eszter Hargittai sagely remarks that the
most straightforward way of getting to a page is by
having it as the default page on one's browser. Although
the user may change the original default page, it is
often a page specified by the browser's manufacturer,
the user's internet service provider, or the institution
where the machine is operated.
efficiency in searching
Online resource identification is a competition for the
user's attention, complicated by -
the large number (and volatility)
of sites, documents and other resources
the impatience or lack of expertise of many users
the willingness of users to accept search results that are
'good enough' rather than true 'best fit'
It is clear that many users -
conduct simple searches (eg one or two search terms) and avoid
'advanced' search features such as date delimiters
do not systematically work through a large number of search
results (eg move beyond the first two screens of results
from a search engine)
have difficulty distinguishing between sponsored and unsponsored
results
rely on a handful of sites when conducting research online
(or citing research).
The 1999 Analysis of a Very Large Web Search Engine Query
Log by Craig Silverstein, Hannes Marais, Monika Henzinger
and Michael Moricz, for example, identified around one billion
queries on Altavista, with users sticking with the first
screen of results in 85% of searches and 77% of sessions
containing only one query.
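Figures of that kind are simple proportions over a query log. The sketch below shows how they might be computed, assuming a hypothetical log already grouped into sessions, where each query records how many result screens were viewed. The data, field layout and function name query_log_summary are invented for illustration and are not drawn from the Altavista study.

```python
# Hypothetical sessions: each is a list of (query text, result screens viewed).
SAMPLE_SESSIONS = [
    [("mahler manuscripts", 1)],
    [("octopus", 1), ("blue-ringed octopus", 2)],
    [("flowers", 1)],
    [("cheap flights sydney", 3)],
]


def query_log_summary(sessions):
    """Return three proportions commonly quoted in query-log studies."""
    queries = [query for session in sessions for query in session]
    total = len(queries)
    return {
        # Share of sessions containing only one query.
        "single_query_sessions": sum(len(s) == 1 for s in sessions) / len(sessions),
        # Share of queries where the user stayed on the first screen of results.
        "first_screen_only": sum(screens == 1 for _, screens in queries) / total,
        # Average number of whitespace-separated terms per query.
        "mean_terms_per_query": sum(len(text.split()) for text, _ in queries) / total,
    }


summary = query_log_summary(SAMPLE_SESSIONS)
```

How sessions are delimited (for example by a period of inactivity) is a choice made by each study, so proportions from different logs are not directly comparable.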
Research such as Andy Cockburn & Bruce McKenzie's
paper on What Do Web Users Do? An Empirical Analysis of Web
Use, published in the International Journal of
Human-Computer Studies, and Google's PageRank
and Beyond: The Science of Search Engine Rankings
(Princeton: Princeton Uni Press 2006) by Amy Langville
& Carl Meyer indicates that -
revisitation is common, with up to 81% of pages being
revisited by a particular user
visits are often of only a few seconds' duration
some users manage revisitation through large lists of
bookmarks
those lists are rarely culled and are thus
often out of date
the death of the specialist?
A recurrent meme since the 1960s has been the 'death of
the library' (along with other myths
such as the death of the book and death of the author).
We have also seen hype about the net as a universal library,
a digital repository accessible by all and containing
all the fruits of creativity (along with episodes of Neighbours).
Can we then talk of the death of the librarian? The answer
is clearly no - librarians and other information specialists
are not going to disappear. They will not be replaced
by contemporary search engines or new search technology
based on artificial intelligence.
That is because many specialists have -
expertise in searching (particularly non-public legal, scientific,
financial or other technical databases), underpinned
by a professional ethos that emphasises appropriateness,
comprehensiveness and accuracy
access to firewalled content (including large-scale bibliographic,
textual and image collections and databases that involve
subscription, sessional or item-based payment)
It is also because much content is likely to remain offline,
with, for example, the cost of digitisation
(and disagreement about rights)
inhibiting retrospective capture of many 'historic' texts,
still/moving images and sound recordings.
Notions of 'power searching' and 'digital literacy' are
explored in the following page of this profile.
As points of entry for questions about information seeking
we recommend Elaine Svenonius' The Intellectual Foundation
of Information Organisation (Cambridge: MIT Press
2000), Christine Borgman's From Gutenberg to the Global
Information Infrastructure: Access To Information in the
Networked World (Cambridge: MIT Press 2000), Human
Interaction with Complex Systems: Conceptual Principles
& Design Practice (Hague: Kluwer 1996) by Celestine
Ntuen & Eui Park, Web Search: Multidisciplinary
Perspectives (Berlin: Springer 2008) edited by Amanda
Spink & Michael Zimmer and Preferred Placement:
Knowledge Politics on the Web (Maastricht: Jan van
Eyck Akademie Editions 2000) edited by Richard Rogers.
They are complemented by the Berkshire Encyclopedia
of Human-Computer Interaction (Great Barrington:
Berkshire 2004) edited by William Bainbridge, Donald Case's
Looking for Information: A Survey of Research on Information
Seeking, Needs, and Behavior (New York: Academic
Press 2002) and the exhaustive ACM Human-Computer Interaction
Lara Catledge & James Pitkow's 1995
paper Characterizing browsing strategies in the
World Wide Web, Richard Belew's Finding Out About:
Search Engine Technology From A Cognitive Perspective
(Cambridge: Cambridge Uni Press 2001), Linda Tauscher
& Saul Greenberg's 1997 paper
on Revisitation Patterns in World Wide Web Navigation,
Andrew Treloar's June 2000 paper
on Spinning the Right Path: Investigating the Effectiveness
& Impact of Web Navigation Systems, Andy Cockburn
& Bruce McKenzie's What Do Web Users Do? An Empirical
Analysis of Web Use,
Lucas Introna & Helen Nissenbaum's 2000
Shaping the Web: Why the Politics of Search Engines
Matters and Erik Selberg's 1999 dissertation
Towards Comprehensive Web Search
explore particular issues.
Research by Chun Wei Choo, Brian Detlor & Don Turnbull
may also be of interest. Apart from their Web Work:
Information Seeking & Knowledge Work on the World
Wide Web (New York: Kluwer 2000) we commend the paper
on Information Seeking on the Web, the paper
on Information Seeking on the Web - An Integrated Model
of Browsing & Searching and their First Monday
article on Information Seeking on the Web - An Integrated Model
of Browsing & Searching.
Two starting points for understanding issues and processes
are the 1999 paper
on Results & Challenges in Web Search Evaluation
by Hawking, Craswell, Thistlewaite & Harman, and the
paper by Lawrence & Giles on Accessibility of Information
on the Web.
Annabel Pollock & Andrew Hockley's 1997 What's Wrong with
Internet Searching paper
and Modern Information Retrieval (London:
Longman 1999) by Ricardo Baeza-Yates & Berthier Ribeiro-Neto
are valuable in understanding retrieval principles and
effectiveness studies. Bernard Jansen's 2000 paper
A Review of Web Searching Studies is a useful literature
review.