Myths of Cyberspace: It's All There?

Ketupa


This page explores myths about online access to what some writers have characterised as the "information cornucopia" or global digital library: claims that everything you want to know is online, that you can easily find it and that you will be able to do so in future.

It covers -

introduction
what's online
identification
accessibility
accuracy

introduction

Brewster Kahle's 2001 article "Public Access to Digital Material" identified universal online access to content as the "epic opportunity of our digital age", claiming that

technology has reached the point where scanning all books, digitizing all audio recordings, downloading all websites, and recording the output of all TV and radio stations is not only feasible but less costly than buying and storing the physical versions.

A year later Business Week ran with the spin, claiming that Kahle's Internet Archive is -

a collection of 10 billion pages, including Internet sites, movies, and Usenet postings five times larger than the amount of information at the Library of Congress

and that

Today, a single copy of everything that's on the Net -- equal to 15,000 copies of Encyclopedia Britannica -- is added to the archive every two months.

Would that were true. The Archive is in fact far more selective.

It does not archive all sites, it does not archive all pages of the sites it does capture, and it archives erratically.

At a less rarefied level there are four basic myths about information in cyberspace -

everything is online
all online content can be found
all online content can be accessed (and will be accessible in future)
access is resulting in information overload.

what's online?

The explosion of web sites, high numbers of results displayed by search engines, ready access to some contemporary music through filesharing and casual references to global digital libraries have encouraged a belief that "everything" is online ... or could be available via the internet through the efforts of volunteers or through the removal of impediments such as intellectual property.

That belief is at best naive. Although there are several million sites on the web (with an unknown number of pages), much of the content is corporate or personal and ephemeral. As of mid-2003 a majority of pages are probably in English: some languages are barely represented in terms of readership or authorship (eg there is little Lao, Inuit, Bantu or Amharic content).

What text is online is patchy in the extreme. Standard works from the Latin and Greek classics are online (albeit often in superseded editions and translations) but there is little Provencal, Persian, Chinese, Khmer or Aramaic. More recent literature is sparse: umpteen copies of pronouncements by Gilmore and Barlow but no Patrick White, Christina Stead, David Malouf, Heimito von Doderer or Robert Musil. There is little Proust, less Mann (Heinrich, Klaus or Thomas).

Publishers such as Gale are undertaking large-scale digitisation programs (eg Gale's 18th Century literature project, covering some 150,000 titles and 20 million pages). That activity is, of course, on a commercial basis and, as with past microfilm or CD-ROM projects, access to the text is generally restricted to academic ghettoes.

Digitisation of archival content from newspapers and journals is underway but, again, much of that content won't be freely available. As noted in discussing electronic publishing, many current serials are online ... but sit behind firewalls, accessible only on a subscription, sessional or per-item payment basis. Few online newspapers contain all the content featured in their print versions.

Biography and critical literature from before the 1980s are equally sparse and, given the priorities of initiatives such as the Gutenberg and Bartleby projects, are likely to remain so. Do not expect to find a standard edition of Lukács, Adorno, Kojève or Bakhtin. Only scraps by historians such as Namier, Bloch, Kehr, Febvre, Michelet or Matthiessen are on the web. In summary, most of the holdings of the Library of Congress, the National Library of Australia or even a mid-range university or community library are not on the web.

What of music? A rarely-remarked feature is the almost total absence of scores: the net provides access to recordings rather than notation. With some exceptions the classical repertoire is largely absent: little Machaut, Ives, Zemlinsky, Pergolesi, Gesualdo.

As uptake of broadband increases, access to video content is growing. At the moment most video on the web (and downloaded through filesharing) emanates from the adult content sector. Neither Hollywood nor national film industries plan to release their libraries (including early b&w silent films) onto the web. The BBC's proposal to place much of its audiovisual library online is an exception, and one that has attracted little enthusiasm from its commercial and public sector peers.

is it readily identified?

Much content is freely available on the net. However, ready identification of that content often poses particular challenges. In discussing internet metrics we've noted reports that suggest most traffic goes to a small proportion of sites (the 'winner takes all' model): many potential users simply don't find the content that is available. For practical purposes that content does not exist.

Developments in enhanced search engines, metadata and other identification mechanisms are arguably not keeping pace with the growth of the net, the volatility of much online content and the resistance of many users to unstructured or 'naive' retrieval. It is clear that many users are content to settle for second or third best and that many are overwhelmed by the task of sifting through exhaustive search results.

Even the major engines don't cover all of the public web; few cover much of the 'deep' web, ie content that's behind firewalls or is generated dynamically from databases rather than held as static web pages that are readily spidered.

is it accessible?

A pernicious myth is the notion that most content can be readily accessed. In considering digital divides, usability and other questions we have suggested that access to the net is quite uneven. Much of the web is 'dark', either because content is held behind firewalls (no password or no credit card number = no access) or because site operators have disregarded usability principles.

Within advanced economies substantial parts of the population do not have ready access. That is because they face physical challenges (eg poor sight and motor difficulties), because the infrastructure is not available or because they simply can't afford the ongoing investment in a recent computer and broadband charges.

Such impediments are even more critical in other parts of the world than in Australia and New Zealand. As we have noted, over a billion people do not have ready access to electricity (and several million depend on dried cow dung and straw for warmth and cooking). Hype from Microsoft, Cisco and MIT about breaking down digital divides through wireless networks seems misplaced when the cost of a personal computer is several times the annual income of the average family in central Africa or Bangladesh.

We have argued that notions of the digital divide encompass deficits in skills, expectations and the broader economic environment.

Charles Kenny of the World Bank for example comments that

Lack of education is a major barrier to productive Internet use .... In Ethiopia, 98 percent of Internet users in 1998 had a university degree, yet 64.5 percent of the overall population is illiterate. Worldwide, most people living on $1 a day are illiterate. Further, they usually speak a minority language in their own country - few speak a major global language. For example, about 17 million people in Nigeria speak Igbo. My search for Web pages in Igbo turned up only five sites: a translation of the Universal Declaration of Human Rights, a translation of a document called 'The Four Spiritual Laws' (theological provenance undetermined), a translation of the food pyramid, a two-page Igbo phrase book, and a prayer manual. There isn't an Igbo translation service on the Web, so an Igbo speaker would be limited to these five. None involved sound or video, so the illiterate Igbo speaker would gain nothing. Bridging the gaps in language and technical skills as well as basic literacy will be difficult, considering the small per-student spending available in the poorest countries' primary schools, where the discretionary budget per student is as little as $5 a year.

Kenny rightly dismisses hype about pervasive benefits from e-commerce by noting that

even if poor people are lucky enough to be literate and conversant in a major world language, their use of the Web for activities such as e-commerce is likely to be limited by their lack of credit cards, not to mention the challenge of persuading FedEx and UPS to start delivery services in their neighborhoods. Limitations in relevant content and ability to use that content perhaps best explain why only 2.2 percent of India's Internet users have ever engaged in buying or selling over the Web.

That lack of the fantastic green plastic also precludes use of paid-access sites outside libraries.

is it accurate?

Notions of the web as a well-ordered and comprehensive free library (whose librarians provide quality control in the acquisition of content and the systematic weeding out of superseded content) are misplaced. Online publication is not a guarantee of accuracy.

A more apt metaphor is the net as the 'marketplace of ideas', in which everyone is free to offer content and in which truth eventually triumphs over ignorance or deception. Regrettably, in that marketplace lies are often more seductive or simply easier to find. Much of the factual information on the net is false or has become so through the passage of time.

The self-referential nature of much online content creation (authors appropriating online content without referring to offline sources, and the echo-chamber effect of much blogging and many wikis, exacerbated by the 'winner takes all' phenomenon) means that inaccuracies can gain wide circulation. That is of particular concern for medical sites. It is also of concern for sites with a historical or technical reference function (one reason why this site features a range of online sources and references to offline writing). It is a basis for scepticism about arguments that defamation online is not a major problem because the defamed can supposedly 'out-publish' the falsehoods in a triumph of free speech.

One response is the development of a digital information literacy, with readers having appropriate expectations about what is found online, skills in assessing accuracy and a capability (and commitment) to checking information found on the net.
