Search profile: Search Behaviour and Challenges

search behaviour

This page considers search strategies and online search behaviour, including questions about how users navigate and their assessment of 'good enough' information retrieval.

It covers -

introduction - key questions about searching
how we know - what is the basis for understanding of online searching and who is telling us
finding resources online - basic patterns in searching by non-specialists
efficiency in searching - questions about 'good enough' versus 'best fit'
the death of the specialist? - naive versus expert searching
studies - major industry, government and academic writing about online search behaviour

Questions about search behaviour metrics are explored in the separate guide on Metrics & Statistics. Some myths about what is online (and whether that content is accessible) are explored in a supplementary profile.

introduction

Online search behaviour is of interest for a range of reasons.

They include what might be characterised as the politics of information, encompassing questions about -

money - being found on the net has a commercial value (whether direct or indirect) and much electronic commerce is built on visibility, competing for audiences in the 'attention economy' or providing tools (such as advertising) that channel traffic to particular locations
cultural values - domain spaces (eg dot au versus dot com) and restrictions on domain names (eg trademarks, profanity) embody particular expectations; the categorisation of directories and blind spots in search engines are weighted to commerce or against endorsement of value by a grand cataloguer
what is available online and how easily it can be found, with for example claims that many search tools are biased towards the 'anglosphere' and thus discriminate against non-English speakers or against 'alternative' lifestyles
accessibility - with perceived biases against searching by users with visual, motor or cognitive impediments
failure - making sure that users do not find what they are seeking (eg adult content filters or government blocking of dissident and news sites) or muddying the search (eg action by some record/film companies to seed P2P exchanges with mislabelled and corrupted files).

More broadly, how people conceptualise cyberspace and navigate it offers insights of value for the cognitive sciences, whether you are an adherent of Chomsky's views on language or Shneiderman's on computer-human interfaces.

Online resource identification is also of interest because of search patterns that some specialists describe as "inept" and others as "good enough". It is clear that many users - including those who have been online for several years - misread navigational clues and conduct rather shallow searches. Persistence and fine-tuning of queries in search engines would often produce results that better meet their stated needs.

A range of academic and industry studies have thus shown that some people still expect to intuit a search by deconstructing domain names and that when using search engines the majority of people/searches (eg a claimed 85% of around a billion AltaVista queries in 1998) do not progress beyond the first screen of search results.

Web Search: Public Searching of the Web (London: Springer 2004) by Amanda Spink & Bernard Jansen similarly reports little evolution over time in search behaviour. Users typically conduct a handful of simple short searches with one to two words per search (two searches per session) and examine only the first page of results.

Much searching through hyperlinks - pointers from one site to another (or merely from one page to another) - is serendipitous ... going for a random walk. That is not necessarily a bad thing, as anyone who has contrasted reliance on a catalogue with grazing the stacks in a library can attest.

how we know

Knowledge about search objectives, search strategies and impediments to successful online navigation comes from a range of sources.

One source - still of major value - is observation of how people interact with information devices and questioning about what they were trying to do, what they achieved and how they felt. That observation might simply involve a human observer watching a user or employment of technology that monitors keystrokes or maps eye movement (one example is research under Poynter auspices, criticised by Jakob Nielsen).

Another source is site-specific examination of server logs, identifying the points of entry (home page or subsidiary pages?), how users moved through the site (what path did they follow, how long did they stay) and where/why they departed (eg because a link was broken or page loading was too slow). A corollary is examination of logs provided by site-specific search engines, illustrating what users were seeking (or merely appeared to be seeking) and what response they received.
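
The log examination described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming simplified records of (visitor, timestamp in seconds, path) rather than any real server log format; the function name summarise_visits and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def summarise_visits(records):
    """Group page views by visitor and report entry pages, exit pages and paths.

    `records` is an iterable of (visitor_id, timestamp_seconds, path) tuples,
    a simplified stand-in for parsed server log lines (an assumption).
    """
    by_visitor = defaultdict(list)
    for visitor, ts, path in records:
        by_visitor[visitor].append((ts, path))

    entries, exits, paths, durations = Counter(), Counter(), {}, {}
    for visitor, views in by_visitor.items():
        views.sort()  # order each visitor's page views by time
        pages = [path for _, path in views]
        entries[pages[0]] += 1    # point of entry
        exits[pages[-1]] += 1     # last page seen before departure
        paths[visitor] = pages    # route followed through the site
        durations[visitor] = views[-1][0] - views[0][0]  # seconds between first and last view
    return {"entries": entries, "exits": exits, "paths": paths, "durations": durations}

# Invented sample data for illustration.
sample = [
    ("a", 0, "/"), ("a", 20, "/guide"), ("a", 95, "/guide/metrics"),
    ("b", 5, "/guide/metrics"),
    ("c", 7, "/"), ("c", 12, "/broken-link"),
]
summary = summarise_visits(sample)
```

Here summary["entries"] counts how often each page served as the point of entry and summary["exits"] shows where visits ended. A log alone cannot say why a user departed; that still requires the observation and questioning noted earlier.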

At a broader level insights are provided by logs maintained by 'whole of web' directory and search engines. Much of that information is closely guarded as a commercial asset but some data is commoditised by the operators or third parties or released as promo, for example the 'hottest search terms' of the year/quarter.

It complements information collected by metrics companies through manual questionnaires or logging traffic going through major gateways (eg selected ISPs) or through selected personal computers (the user agrees to install software that reports to the metrics aggregator about navigation by those lab rats). As we have noted in the more detailed discussion of the metrics (and online polling) industries, extrapolation from those figures is contentious because of disagreements about the accuracy of data collection and whether the sample is truly representative of national/global online populations.

finding resources online

Users find online resources - including web sites, music files, embedded graphics, PDF or Excel documents - in a range of ways that include -

offline pointers
previous exposure to the resource
following hyperlinks from another resource
reference from an email or chat message
targeted or unstructured use of a search engine
grazing a large scale or specialist directory
following a link from an online advertisement
intuiting or deconstructing an address

Offline pointers

Normalisation of the internet - and opportunities for exposure offline - has seen increasing understanding of domain names in the general community (most Australians, for example, have been exposed to and appear to have some understanding of a URL) and widespread adoption of coffee cups, caps, billboards, posters, restaurant menus, invoices, business cards, newspaper/magazine advertisements, movie trailers and vehicle signage for pointing people to particular locations in cyberspace.

There has been surprisingly little research on the ubiquity and effectiveness of such offline signalling. However for particular demographics it appears to be more effective (in terms of basic recognition and cost) than much online advertising.

Previous exposure to the resource

Paradoxically, the best way for many users to find a resource is to have been there previously.

Some users enter URLs into a bookmarks tool on a systematic or random basis (we confess that much of our bookmarking is unstructured, with this site being used as a surrogate for a truly coherent set of marks).

Other users do not bookmark, instead relying on their browser to provide a prompt when they start to enter a similar address into the location bar or look at the 'history' of past surfing.

Some browsers and search engines offer 'predictive searching', suggesting addresses on the basis of information gained from past searches or past navigation. Prediction is contentious, given the difficulty of matching past navigation with other resources (most prediction algorithms are simplistic and are inhibited by poor information) and claims that some services are biased towards particular addresses (eg a site owner has paid to optimise the likelihood that the user will find that site when conducting a search).
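
The browser prompt described above can be approximated as a prefix match over past navigation. The sketch below is an assumption about how such a prompt might work, not a description of any particular browser; the function name suggest, the ranking by visit count and the example addresses are all hypothetical.

```python
from collections import Counter

def suggest(history, typed, limit=3):
    """Return previously visited addresses that start with `typed`,
    most frequently visited first (ties broken alphabetically)."""
    counts = Counter(history)
    typed = typed.lower()
    matches = [url for url in counts if url.lower().startswith(typed)]
    matches.sort(key=lambda url: (-counts[url], url))
    return matches[:limit]

# Invented browsing history for illustration.
history = [
    "example.com/search",
    "example.com/metrics",
    "example.com/search",
    "example.org/news",
]
suggestions = suggest(history, "example.c")
```

Ranking purely on past visit counts illustrates why such prediction is described as simplistic: it can only surface addresses the user has already seen.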

Following hyperlinks from another resource

The web is built around hyperlinks. As this site demonstrates, one mechanism for effective identification of online resources is to follow menus and other links from one page to another within a specific site or to move from that site to external resources using such links.

Such linkage can be particularly valuable if the referring site is based on deep understanding of a subject, has a better awareness of what is available online or is updated more frequently than most search engines, which as pointed out earlier in this profile do not cover all of the web and may have latency periods of around six months.

Reference from an email or chat message

Many people use addresses to which they are pointed through email or chat messages, either copying the address and then pasting it into the address bar on their browser or clicking on a hyperlink within the message. That method embodies what is arguably the best and worst of searching.

At its best the user is relying on an endorsement of quality or interest by a colleague, friend or contact with some authority.

At its worst the link appears in spam ... sufficient people make the mistake of responding to unsolicited bulk messages (clicking on the link or naively confirming their address through an unsubscribe action) to make spamming commercially worthwhile and thereby pervasive.

Targeted or unstructured use of a search engine

There is disagreement about non-specialist user reliance on 'whole of web' and specialist search engines. Some authorities claim that over 70% of resources are found using search engines (and that questions of keywords, for example, are of commercial significance). Others claim that engines are far less important, with users identifying sites and individual files through a range of means.

The truth probably lies somewhere in between, given different experience, objectives and patience of users.

Some clearly start and stop with a single search. Others, as noted in preceding pages of this profile, may systematically work through selected entries on a succession of search screens, conduct multiple searches (sometimes using different search engines), or use initial results from a search engine as a point of departure for more extended navigation through hyperlinks from one site to another.

It is clear that many users - particularly those without a detailed search strategy or tight objectives - conduct shallow and unstructured searches, typically entering a single search term, avoiding 'advanced search' features (eg boolean text searching and date or other delimiters) and not progressing beyond the first two screens of search results. Accurately Interpreting Clickthrough Data as Implicit Feedback (PDF) by Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke & Geri Gay for example comments that 42% of users clicked the first item in the listing on a search engine results page (SERP), with 8% of users selecting the second item.

Grazing a large or specialist directory

As preceding pages have noted, prior to emergence of large-scale search engines most people relied on directories for identifying internet resources. Those directories provide categorised listings of selected web sites or other resources.

Their value for searches is affected by the latency of the information (most are manually compiled, with delays in the entry of new information and deletion of superseded information), the basis of selection and difficulties encountered by users in navigating through the categorisation.

As with search engines, research suggests that non-specialist users of the 'whole of web' directories such as Yahoo! tend not to systematically graze a hierarchy of categories. Much searching accordingly ceases on the first or second page. The big directories have responded by emphasising 'regionalisation', with discrete versions for different nations/regions.

In practice such directories are now significantly less important for many users, who instead rely on other search mechanisms identified on this page and on specialist directories that are often small scale and subject specific (eg a directory of Mahler manuscripts or papers on the blue-ringed octopus). Whole of web directories continue to garner much traffic because they have morphed into broader portals (eg including webmail access and news) and because they serve as the default entry pages from many private and commercial machines, eg in cybercafes.

Following a link from an online advertisement

The online advertising industry is based on the notion that users -

will gain some awareness of a product/service through a banner/pop-up advertisement encountered while surfing or
by clicking on such an ad will pass to a discrete site, an animation or other advertising content.

Sufficient people report encountering online advertisements or click through to make ads viable. There is disagreement about their value as a search mechanism, reflected in changing user responses to online ads (eg adoption by some demographics of 'ad washer' or anti-popup software) and the evolution of 'paid placement' - banners or other strategically positioned links that may not appear to be ads but if clicked take the user to a location chosen by the advertiser.

Intuiting or deconstructing an address

Many users attempt to intuit the address of an online resource by assuming that there is a close match between a brand/corporate name and the online address, for example simply adding 'www' and 'com' (or the appropriate ccTLD/gTLD) on either side of an offline name. The shape of the domain name system and factors such as trademarks mean that such a match is not always correct.
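
The guessing habit described above amounts to a simple string transformation. A minimal sketch follows, assuming a hypothetical function guess_addresses and an arbitrary choice of suffixes; it only builds candidate addresses and does not check whether any of them exists or belongs to the named organisation.

```python
import re

def guess_addresses(name, suffixes=("com", "com.au", "org")):
    """Build the addresses a user might try for an offline brand/corporate name."""
    label = re.sub(r"[^a-z0-9]", "", name.lower())  # drop spaces and punctuation
    if not label:
        return []
    return [f"www.{label}.{suffix}" for suffix in suffixes]

# "Acme Widgets" is an invented name used only for illustration.
guesses = guess_addresses("Acme Widgets")
```

Each guess may resolve to the expected organisation, to an unrelated holder of the same name or to nothing at all.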

Others assume that there will be an appropriate match between the domain name and the type of service/commodity of interest (or even the person of interest). Users thus sometimes resort to a generic domain name (such as books.com, cars.com or flowers.com), which may lead them to an appropriate site or merely to a site that features advertising for an unrelated service/product. Such behaviour was common during the early years of the web, with some users assuming that domain names embodied a subject directory. More recently it has been the basis for development of commercial 'keyword' portfolios, large collections of sites with subject names and misspelled brand/subject names that feature advertising.

Eszter Hargittai sagely remarks that

the most straightforward way of getting to a page is by having it as the default page on one's browser. Although the user may change the original default page, it is often a page specified by the browser's manufacturer, the user's internet service provider, or the institution where the machine is operated

efficiency in searching

Online resource identification is a competition for the user's attention, complicated by -

the large number (and volatility) of sites, documents and other resources
the impatience or lack of expertise of many users
the willingness of users to accept search results that are 'good enough' rather than true 'best fit'

It is clear that many users -

use simple searches (eg one or two search terms) and avoid 'advanced' search features such as date delimiters
do not systematically work through a large number of search results (eg move beyond the first two screens of results from a search engine)
have difficulty distinguishing between sponsored and unsponsored links
rely on a handful of sites when conducting research online (or citing research).

The 1999 study Analysis of a very large web search engine query log by Craig Silverstein, Hannes Marais, Monika Henzinger and Michael Moricz, for example, examined around one billion queries on AltaVista, with users sticking with the first screen of results in 85% of searches and 77% of sessions containing only one query.
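
Figures of that kind are simple proportions over a query log. The sketch below shows the arithmetic on an invented toy log; the data structure, the function name query_log_summary and the numbers are assumptions for illustration and are not drawn from the study.

```python
def query_log_summary(sessions):
    """Summarise a non-empty query log.

    `sessions` maps a session id to a list of queries, where each query is
    recorded as the number of result screens viewed (1 = first screen only).
    """
    all_queries = [screens for queries in sessions.values() for screens in queries]
    single_query = sum(1 for queries in sessions.values() if len(queries) == 1)
    first_screen_only = sum(1 for screens in all_queries if screens == 1)
    return {
        "single_query_session_share": single_query / len(sessions),
        "first_screen_only_share": first_screen_only / len(all_queries),
    }

# Invented toy log: four sessions, five queries in total.
toy = {"s1": [1], "s2": [1, 2], "s3": [1], "s4": [3]}
shares = query_log_summary(toy)
```

In the toy log three of four sessions contain a single query and three of five queries stop at the first screen. The first share is taken over sessions and the second over queries, a distinction that matters when comparing figures from different studies.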

Research such as Andy Cockburn & Bruce McKenzie's paper (PDF) on What Do Web Users Do? An Empirical Analysis of Web Use, published in the International Journal of Human-Computer Studies, and Google's PageRank and Beyond: The Science of Search Engine Rankings (Princeton: Princeton Uni Press 2006) by Amy Langville & Carl Meyer indicates that -

site revisitation is common, with up to 81% of pages being revisited by a particular user
most visits are of only a few seconds' duration
although some users manage revisitation through large lists of bookmarks those lists are rarely culled and are thus often out of date

the death of the specialist?

A recurrent meme since the 1960s has been the 'death of the library' (along with other myths such as the death of the book and death of the author). We have also seen hype about the net as a universal library, a digital repository accessible by all and containing all the fruits of creativity (along with episodes of Neighbours).

Can we then talk of the death of the librarian? The answer is clearly no - librarians and other information specialists are not going to disappear. They will not be replaced by contemporary search engines or new search technology based on artificial intelligence.

That is because many specialists have -

expertise in searching (particularly non-public legal, scientific, financial or other technical databases), underpinned by a professional ethos that emphasises appropriateness, comprehensiveness and accuracy
institutional access to firewalled content (including large-scale bibliographic, textual and image databases that involve subscription, sessional or item-based payment)

It is also because much content is likely to remain offline, with for example the cost of digitisation (and disagreement about rights) inhibiting retrospective capture of many 'historic' texts, still/moving images and sound recordings.

Notions of 'power searching' and 'digital literacy' are explored in the following page of this profile.

studies

As points of entry for questions about information seeking we recommend Elaine Svenonius' The Intellectual Foundation of Information Organisation (Cambridge: MIT Press 2000), Christine Borgman's From Gutenberg to the Global Information Infrastructure: Access To Information in the Networked World (Cambridge: MIT Press 2000), Human Interaction with Complex Systems: Conceptual Principles & Design Practice (Hague: Kluwer 1996) by Celestine Ntuen & Eui Park, Web Search: Multidisciplinary Perspectives (Berlin: Springer 2008) edited by Amanda Spink & Michael Zimmer and Preferred Placement: Knowledge Politics on the Web (Maastricht: Jan van Eyck Akademie Editions 2000) edited by Richard Rogers.

They are complemented by the Berkshire Encyclopedia of Human-Computer Interaction (Great Barrington: Berkshire 2004) edited by William Bainbridge, Donald Case's Looking for Information: A Survey of Research on Information Seeking, Needs, and Behavior (New York: Academic Press 2002) and the exhaustive ACM Human-Computer Interaction Bibliography (HCIB).

Lara Catledge & James Pitkow's 1995 paper Characterizing browsing strategies in the World Wide Web, Richard Belew's Finding Out About: Search Engine Technology From A Cognitive Perspective (Cambridge: Cambridge Uni Press 2001), Linda Tauscher & Saul Greenberg's 1997 paper on Revisitation Patterns in World Wide Web Navigation, Andrew Treloar's June 2000 paper on Spinning the Right Path: Investigating the Effectiveness & Impact of Web Navigation Systems, Andy Cockburn & Bruce McKenzie's What Do Web Users Do? An Empirical Analysis of Web Use (PDF), Lucas Introna & Helen Nissenbaum's 2000 (PDF) Shaping the Web: Why the Politics of Search Engines Matters and Erik Selberg's 1999 dissertation Towards Comprehensive Web Search (PDF) explore particular issues.

Research by Chun Wei Choo, Brian Detlor & Don Turnbull may also be of interest. Apart from their Web Work: Information Seeking & Knowledge Work on the World Wide Web (New York: Kluwer 2000) we commend the paper on Information Seeking on the Web, the paper on Information Seeking on the Web - An Integrated Model of Browsing & Searching and their First Monday article on Information Seeking on the Web - An Integrated Model of Browsing & Searching.

Two starting points for understanding issues and processes are the 1999 paper on Results & Challenges in Web Search Evaluation by Hawking, Craswell, Thistlewaite & Harman, and the 1999 study by Lawrence & Giles on Accessibility of Information on the Web.

Annabel Pollock & Andrew Hockley's 1997 What's Wrong with Internet Searching paper and Modern Information Retrieval (London: Longman 1999) by Ricardo Baeza-Yates & Berthier Ribeiro-Neto are valuable in understanding retrieval principles and effectiveness studies. Bernard Jansen's 2000 paper A Review of Web Searching Studies is a useful literature review.
