it's all there?
This page explores myths about online access to what some writers
have characterised as the "information cornucopia"
or global digital library: claims that everything you
want to know is online, that you can easily find it and
that you will be able to do so in future.
Brewster Kahle's 2001 Public Access to Digital Material
identified universal online access to content as the
"epic opportunity of our digital age", claiming
technology has reached the point where scanning all
books, digitizing all audio recordings, downloading
all websites, and recording the output of all TV and
radio stations is not only feasible but less costly
than buying and storing the physical versions.
A year later Business Week ran
with the spin, claiming that Kahle's Internet Archive
collection of 10 billion pages, including Internet sites,
movies, and Usenet postings, is
five times larger than the amount of information at
the Library of Congress and that
a single copy of everything that's on the Net -- equal
to 15,000 copies of Encyclopedia Britannica
-- is added to the archive every two months.
If only that were true: the Archive is in fact far more limited.
It does not archive all sites. It does
not archive all pages of all sites and
it archives erratically.
At a less rarified level there are four basic myths about
information in cyberspace -
everything is online
all online content can be found
all online content can be accessed (and will be accessible
in future)
access is resulting in information overload.
The explosion of web sites,
high numbers of results displayed by search
engines, ready access to some contemporary music through
filesharing and casual references to global digital libraries
have encouraged a belief that "everything" is
online ... or could be available via the internet through
the efforts of volunteers or through the removal of impediments
such as intellectual property.
That belief is at best naive. Although there are several
million sites on the web (with an unknown number of pages),
much of the content is corporate or personal and ephemeral.
As of mid-2003 a majority of pages are probably in English:
many languages are barely represented in terms of readership or authorship
(eg there is little Lao, Inuit, Bantu or Amharic content).
What text is online is patchy in the extreme. Standard
works from the Latin and Greek classics are online (albeit
often in superseded editions and translations) but there
is little Provencal, Persian, Chinese, Khmer or Aramaic.
More recent literature is sparse: umpteen copies of pronouncements
by Gilmore and Barlow but no Patrick White, Christina
Stead, David Malouf, Heimito von Doderer or Robert Musil.
There is little Proust, less Mann (Heinrich, Klaus or Thomas).
Publishers such as Gale are undertaking large-scale digitisation
programs (eg Gale's The 18th Century literature project,
covering 150,000 titles and 20 million pages).
That activity is, of course, on a commercial basis and
- as in past microfilm or CD-ROM projects - access to
the text is generally restricted to academic ghettoes.
Digitisation of archival content from newspapers
and journals is underway but again, much of that content
won't be freely available. As noted in discussing electronic
publishing, many current serials are online ... but protected
by firewalls, with access by subscription, sessional or
per-item payment. Few online newspapers contain all the
content that is featured in their print versions.
Biography and critical literature prior to the 1980s is
equally sparse and, given the priorities of initiatives
such as the Gutenberg and Bartleby projects, is likely
to remain so. Do not expect to find a standard edition
of Lukacs, Adorno, Kojeve or Bakhtin. Only scraps by historians
such as Namier, Bloch, Kehr, Febvre, Michelet or Matthiesen
are on the web. In summary, most of the holdings of the Library of Congress,
the National Library of Australia or even a mid-range university
or community library are not on the web.
What of music? A rarely-remarked feature is the almost
total absence of scores: the net provides access to recordings
rather than notation. With some exceptions the classical
repertoire is largely absent: little Machaut, Ives, Zemlinsky,
As uptake of broadband increases, access to video content
is growing. At the moment most video on the web (and downloaded
through filesharing) emanates from the adult
content sector. Neither Hollywood nor national film
industries plan to release their libraries (including
early b&w silent films) onto the web. The BBC's proposal
to place much of its audiovisual library online is an
exception with little enthusiasm from commercial and public
is it readily identified?
Much content is freely available on the net. However,
ready identification of that content often poses particular
challenges. In discussing internet metrics we've noted
reports that suggest most traffic goes to a small proportion
of sites (the 'winner takes all' model): many potential
users simply don't find the content that is available.
For practical purposes that content does not exist.
Developments in enhanced search engines, metadata and
other identification mechanisms are arguably not keeping
pace with the growth of the net, the volatility of much
online content and the resistance of many users to unstructured
or 'naive' retrieval. It is clear that many users are
content to settle for second or third best and that
many are overwhelmed by the task of sifting through exhaustive
Even the major engines don't cover all of the public web;
few cover much of the 'deep' web, ie content that's behind
firewalls or is generated dynamically from databases rather
than static web pages that are readily spidered.
is it accessible?
A pernicious myth is the notion that most content
can be readily accessed. In considering digital divides,
usability and other
questions we have suggested that access to the net is
quite uneven. Much of the web is 'dark', either because
content is held behind firewalls (no password or no credit
card number = no access) or because site operators have
disregarded usability principles.
Within advanced economies substantial parts of the population
do not have ready access. That is because they face physical
challenges (eg poor sight and motor difficulties), because
the infrastructure is not available or because they simply
can't afford the ongoing investment in a recent computer
and broadband charges.
Such impediments, evident in Australia and New Zealand, are more
critical in other parts of the world where, as we have
noted, over a billion people do not have ready access to
electricity (and several million depend on dried cow dung
and straw for warmth and cooking). Hype from Microsoft,
Cisco and MIT about breaking down digital divides through
wireless networks seems somewhat misplaced when the cost
of a personal computer is several times the annual income
of the average family in central Africa or Bangladesh.
We have argued that notions of the digital divide encompass
deficits in skills, expectations and the broader economic
Charles Kenny of the World Bank, for example, comments that
lack of education is a major barrier to productive Internet
use .... In Ethiopia, 98 percent of Internet users in
1998 had a university degree, yet 64.5 percent of the
overall population is illiterate. Worldwide, most people
living on $1 a day are illiterate. Further, they usually
speak a minority language in their own country - few
speak a major global language. For example, about 17
million people in Nigeria speak Igbo. My search for
Web pages in Igbo turned up only five sites: a translation
of the Universal Declaration of Human Rights, a translation
of a document called 'The Four Spiritual Laws' (theological
provenance undetermined), a translation of the food
pyramid, a two-page Igbo phrase book, and a prayer manual.
There isn't an Igbo translation service on the Web,
so an Igbo speaker would be limited to these five. None
involved sound or video, so the illiterate Igbo speaker
would gain nothing. Bridging the gaps in language and
technical skills as well as basic literacy will be difficult,
considering the small per-student spending available
in the poorest countries' primary schools, where the
discretionary budget per student is as little as $5
Kenny rightly dismisses hype about pervasive benefits from e-commerce
by noting that
if poor people are lucky enough to be literate and conversant
in a major world language, their use of the Web for
activities such as e-commerce is likely to be limited
by their lack of credit cards, not to mention the challenge
of persuading FedEx and UPS to start delivery services
in their neighborhoods. Limitations in relevant content
and ability to use that content perhaps best explain
why only 2.2 percent of India's Internet users have
ever engaged in buying or selling over the Web.
Lack of the fantastic green plastic also precludes use,
outside libraries, of paid access sites.
is it accurate?
Notions of the web as a well-ordered and comprehensive
free library (whose librarians provide quality control
in the acquisition of content and the systematic weeding
out of superseded content) are misplaced. Online publication
is not a guarantee of accuracy.
A more effective metaphor instead is the net as the 'marketplace
of ideas', in which everyone is free to offer content
and in which truth eventually triumphs over ignorance
or deception. Regrettably, in that marketplace lies are
often more seductive or simply easier to find. Much of
the factual information on the net is false or has become
so through the passage of time.
The self-referential nature of much online content creation
- authors appropriating online content without referring
to offline sources and the echo-chamber effect of much
blogging and wiki,
exacerbated by the 'winner takes all' phenomenon - means
that inaccuracies can gain wide circulation. That is of
particular concern for medical
sites. It is also of concern regarding sites with a historical
or technical reference function (one reason why this site
features a range of online sources and references to offline
writing). It is a basis for skepticism about arguments
that defamation online
is not a major problem, as the defamed can supposedly
'out-publish' the falsehoods in a triumph of free
One response is the development of a digital information
literacy, with readers having appropriate expectations
about what is found online, skills in assessing accuracy
and a capability for (and commitment to) checking information
found on the net.