Internet Metrics & Statistics Guide: Visualisation

Ketupa

visualisation

This page considers visualisation of the net - techniques for mapping the web (eg by highlighting links between different sites or the geography of servers and traffic flows) and other parts of the net.

It covers -

introduction
infrastructure
traffic flows
link analysis and webometrics
search
mapping wired social networks and participation
studies and resources

introduction

Although cyberspace is famously "everywhere and nowhere" it is feasible to identify -

the physical infrastructure that underpins the net, for example, servers, cables and wireless access points
the volume and direction of email and other traffic, as some parts of cyberspace are more visited than others and online populations are clustered rather than evenly spread across the globe
the relationship between sites (and between documents)
the relationship between online individuals, organisations and other entities

and thence construct tables, maps or graphs. Visualisation of the four layers of the net can be characterised as follows -

layer	volatility	identification	scale
1. physical infrastructure	low	easy	massive
2. logical
3. applications
4. content	high	hard	granular

infrastructure

As we have noted in discussing Networks & the Global Information Infrastructure, the topography of much of the internet's infrastructure reflects the construction of communication networks over the past two hundred years (with for example some major fibre optic cable laid along rights of way provided by railway and canal operators).

It also reflects the broader pattern of human settlement, with internet hosts

generally clustered in/around major cities (often in large-scale 'server farms' or 'web hotels' located in redeveloped areas on the fringe of central business districts)
predominantly located in a handful of nations (the majority of sites in some countries are hosted in New York) and a few regions within those nations

Matthew Zook's 1998 paper The Web of Consumption: The Spatial Organization of the Internet Industry in the US - consistent with indications from Australia and other nations - provides a striking demonstration of how the supposedly 'spaceless' internet industry is clustering in specific geographical locations, in particular New York, LA and San Francisco.

Detailed information about the configuration of the GII and each country's national information infrastructure is difficult to obtain. That is partly inertia - telecommunication companies and other service providers have few commercial incentives to publish detailed maps of their networks (with Australia's discussed here) - and partly the notion of 'critical information infrastructure', with its sense that denying information denies opportunities for terrorism. Most publicly available maps are thus decidedly schematic.

The Cybergeography.org site features a selection of maps of infrastructure: cable and satellite links, wireless networks and other facilities. The Wireless note elsewhere on this site identifies maps of wireless access points in Australia and New Zealand.

traffic flows

Cyberspace can also be characterised as information flows, with tables and maps for example identifying

where information is going online (eg the clustering of domain name registrations and hosts)
the number of emails sent via particular ISPs
overall traffic across the GII (eg bytes from the US to Australia over specific links versus bytes from Australia to the US)
the quantity of spam received by particular ISPs or organisations
the location of participants in local and international newsgroups
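The directional comparison in the list above (bytes from the US to Australia versus bytes from Australia to the US) can be illustrated with a minimal Python sketch. The record layout (origin, destination, byte count), the function names and the figures are all hypothetical, chosen only to show how flow records might be aggregated before being tabulated or mapped.

```python
from collections import defaultdict

def directional_totals(flow_records):
    """Sum byte counts for each (origin, destination) pair."""
    totals = defaultdict(int)
    for origin, destination, byte_count in flow_records:
        totals[(origin, destination)] += byte_count
    return dict(totals)

def net_flow(totals, a, b):
    """Bytes sent from a to b minus bytes sent from b to a."""
    return totals.get((a, b), 0) - totals.get((b, a), 0)

# Hypothetical records: (origin, destination, bytes). Illustrative only.
records = [
    ("US", "AU", 900),
    ("AU", "US", 300),
    ("US", "AU", 100),
    ("AU", "NZ", 50),
]

totals = directional_totals(records)
net_us_to_au = net_flow(totals, "US", "AU")  # 1000 - 300
print(totals)
print("net US -> AU:", net_us_to_au)
```

A positive net figure indicates that more bytes flow from the first location to the second than in the reverse direction.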

link analysis and webometrics

Link analysis tools include

TouchGraph's GoogleBrowser, Amazon.com browser and LiveJournal browser
Pajek
LexiURL
SocSciBot
UCINET

Salient studies include the 2003 Hyperlink Analyses of the World Wide Web: A review paper by Han Woo Park & Mike Thelwall and the latter's Link Analysis: An Information Science Approach site.

IssueCrawler is a server-side crawler, analysis and visualisation package that provides co-link analysis of hyperlinks, locates densely interlinked networks of organisations (pages) concerned with a particular issue and visualises 'issue networks' in circles and clusters.
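As a rough illustration of the co-link idea, the Python sketch below keeps only those targets that receive links from at least two distinct starting sites. It is not IssueCrawler's implementation: the function name, the threshold parameter and the example domains are assumptions made for illustration, and a real tool would first need to crawl pages to collect the outlinks.

```python
from collections import defaultdict

def co_linked_targets(outlinks, min_sources=2):
    """Return targets linked to by at least `min_sources` distinct source sites.

    `outlinks` maps each source site to the set of sites it links to.
    """
    sources_by_target = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            if target != source:  # ignore self-links
                sources_by_target[target].add(source)
    return {
        target: sorted(sources)
        for target, sources in sources_by_target.items()
        if len(sources) >= min_sources
    }

# Hypothetical starting sites and their outlinks. Illustrative data only.
example_outlinks = {
    "ngo-a.example": {"agency.example", "ngo-b.example", "news.example"},
    "ngo-b.example": {"agency.example", "blog.example"},
    "ngo-c.example": {"agency.example", "news.example"},
}

network = co_linked_targets(example_outlinks)
for target, sources in sorted(network.items()):
    print(target, "<-", ", ".join(sources))
```

Raising `min_sources` tightens the network to the most densely co-linked sites; lowering it to 1 simply returns every linked target.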

search

Studies indicate that some users envisage the web as a library, with sites, documents and other resources being identified and accessed through a directory (the basis for the early success of portals such as Yahoo!) or a text-based search engine. There is increasing interest, especially among experienced specialist users, in using graphic representations for the display of search results and interactive maps in searching. One example is the KartOO engine.

mapping wired social networks and participation

Cyberspace is also a manifestation of social relationships and activity, which can be represented in different ways.

It is clear, for example, that most posts on many newsgroups come from a handful of subscribers - papers by Arnold, Williams & Slater for example demonstrate that over 80% of posts on the Australian DNS List and LINK List come from under 5% of subscribers.
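The kind of concentration figure quoted above can be calculated along the following lines. This is a minimal Python sketch, not the method used in the papers cited: the function name, the subscriber count and the post counts are hypothetical, and it assumes a list of post authors plus the total number of subscribers (including those who never post).

```python
from collections import Counter

def top_poster_share(post_authors, subscriber_count, top_percent=5):
    """Fraction of all posts written by the most active `top_percent` of subscribers."""
    if not post_authors:
        raise ValueError("no posts supplied")
    counts = Counter(post_authors)
    # Number of subscribers in the top group, rounded down but never below one
    n_top = max(1, subscriber_count * top_percent // 100)
    top_posts = sum(count for _, count in counts.most_common(n_top))
    return top_posts / len(post_authors)

# Hypothetical list with 40 subscribers, only four of whom post. Illustrative only.
posts = ["ann"] * 50 + ["bob"] * 35 + ["cy"] * 10 + ["di"] * 5

share = top_poster_share(posts, subscriber_count=40)
print(f"top 5% of subscribers wrote {share:.0%} of posts")
```

Counting against all subscribers rather than only those who posted matters: silent subscribers enlarge the denominator and so make the concentration appear sharper.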

It is also clear, as might be expected from studies of patterns in offline citations, that links between individuals and documents are not evenly distributed. Some documents are more cited (or linked to) than others; some bloggers are more linked to (and thus more likely to be read and thus linked to) than others.

A range of social network analysis tools has been developed since the 1980s for use on a stand-alone basis or in mapping online relationships by, for example, crawling the web. They include NetDraw, Pajek and UCINET.

Background is provided in Social Network Analysis (Cambridge: Cambridge Uni Press 1994) by Stanley Wasserman & Katherine Faust and in Models & Methods in Social Network Analysis (Cambridge Uni Press, forthcoming) edited by Wasserman & Peter Carrington. It is also provided in discussion elsewhere on this site regarding social networks and in works such as Han Woo Park's 2003 'What is hyperlink network analysis?: New method for the study of social structure on the Web' in 25 Connections 1 (PDF).

studies and resources

Starting points for thinking about visualisation challenges and mechanisms are the work of Edward Tufte, discussed in the Design Guide elsewhere on this site, and How Maps Work: Representation, Visualization and Design (New York: Guilford 1995) by Alan MacEachren.

Cybergeographer Martin Dodge offers an excellent introduction to mapping traffic and co-authored the outstanding Mapping Cyberspace (London: Routledge 2000), which has a companion site.

The Electronic Space Project (Espace) at Michigan State University complements the Geography project. We recommend Information Tectonics: Space, Place & Technology In An Electronic Age (New York: Wiley 2000), a collection of papers edited by Mark Wilson & Kenneth Corey and the associated maps of hosts and access to telecommunications.

The Geography of Cyberspace project supplies extensive maps and diagrams that represent internet traffic, the geographical distribution of hosts and other features of cyberspace. It also offers a useful bibliography.

A starting point in considering traffic analysis - of significance for network design and peering negotiations - is the 2004 Visualization Challenges in Internet Traffic Research (PDF) by Barbara Gonzalez-Carvalo, Felix Hernandez-Campos, J. S. Marron & Cheolwoo Park. For other pointers to the direction of traffic and growth patterns why not explore the Hoffman & Novak research from Vanderbilt University about the web in 1995 and the links on Hal Varian's site.

Netgraphs provides pointers for statistics buffs. The Cooperative Association for Internet Data Analysis (CAIDA) has a large range of papers and reports on bandwidth, transfer pricing and the nitty gritty of traffic between telcos and ISPs.

Zook's The Web of Consumption paper is complemented by NY University's project on information technology and the future of the urban environment, in particular the mapping.
