images and sounds
This page considers searching of audio, video and still images: how the major engines identify and sort non-text content, and likely directions for future development.
introduction
Searching of online content, for the moment, remains resolutely
text-based despite the flood of personal snaps, sound
recordings, archival film, contemporary video and maps
pouring onto the web through institutional/corporate sites
and services such as Flickr or YouTube.
It is likely that the amount of such data,
in terms of bytes, already accounts for over half of the
bright and dark web.
It will continue to increase as individuals and organisations
unlock their archives
or merely assume that peers are interested in video
blogs. One indication of growth is Google's announcement
in February 2005 (predating global uptake of YouTube)
that its cache of the web had reached over a billion images,
up from some 880 million in February 2004. Another has been the announcement that the BBC and other major moving image owners will place much of their collections online.
Searching is text-based because existing web search engines are not good at -
- comparing non-text content, in particular comparing items that are not exact copies
- making sense of non-text content, eg determining that a picture is upside down
That
means the 'universal library' or 'global jukebox' remains
a myth: there is a lot
of content online but if you cannot find it, for practical
purposes it does not exist.
Searching is text-based because the major search engines
rely on text associated with an image or an audio/video
recording, rather than independent interpretation of that
content. That text enables identification of the images
and sounds. It also underpins much of the sorting by the
engines of that content.
identification
Basic identification of content by whole-of-web search
engines such as Google involves determination of the file
type: the .gif, .jpg, .html, .wmv or other suffix that
forms part of an individual file's name. Without that
suffix the file will not be recognised.
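A minimal sketch of that suffix-based identification, using Python's standard mimetypes module (real crawlers' type detection is proprietary and more involved; the file names here are hypothetical):

```python
import mimetypes

def classify(filename: str) -> str:
    """Guess a file's broad media type from its name suffix alone."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return "unrecognised"      # no known suffix: effectively invisible to the engine
    return mime.split("/")[0]      # 'image', 'audio', 'video', 'text', ...

for name in ["catpicture.jpg", "3257.mp3", "clip.avi", "mystery"]:
    print(name, "->", classify(name))
```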
In spidering the web (or a particular set of files, which
might be on an intranet) search engines typically parse
file directories and individual files, such as web pages
that comprise a mix of html code and 'embedded' graphics.
The content or other attributes (eg date of creation)
of digital audio, video or still image files can sometimes
be inferred from the title of the individual file, from
any 'alt' tag intended to enhance the file's accessibility
or more broadly from the domain name and from the type
of links pointing to that domain (eg to an adult
content site). Much of the time the file title is
not useful, either being generic (eg catpicture.jpg or
header.gif) or meaningless from the engine's perspective
(eg 01457.gif or 3257.mp3). Many images lack 'alt' tags, and those tags that do exist are often non-descriptive or generic.
Some files do feature metadata, which might include detailed DRM information. Only a small proportion of all image/audio files on the net carry useful metadata, in particular DRM tags, which are thus not a fundamental resource for the major engines (though they may be used by specialist art history or other cultural database engines).
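Where embedded metadata does exist it is easy to read. A sketch using the Pillow imaging library to pull EXIF tags from a still image (the file name is hypothetical, and most web images will yield little or nothing):

```python
from PIL import Image
from PIL.ExifTags import TAGS

def read_metadata(path: str) -> dict:
    """Return whatever EXIF metadata an image carries, keyed by tag name."""
    with Image.open(path) as img:
        exif = img.getexif()
        return {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}

# Typical keys, when present: DateTime, Make, Model, Artist, Copyright.
print(read_metadata("photo.jpg"))
```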
Most engines accordingly identify still images through
associated text. That text might be the name of a web
page and the words within a web page (engines typically
assume that the words nearest to an image, particularly
what appears to be a caption, relate to that image). The
text might instead be the wording in a link to the still/moving
image or audio file.
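A sketch of how such associated text might be gathered during a crawl, using the BeautifulSoup parser. The heuristics - treating the nearest following text as a caption, for instance - are illustrative assumptions, not any engine's actual rules:

```python
from bs4 import BeautifulSoup

def image_cues(html: str) -> list[dict]:
    """Collect the text an indexer might associate with each image on a page."""
    soup = BeautifulSoup(html, "html.parser")
    cues = []
    for img in soup.find_all("img"):
        # Naive caption guess: the nearest run of text after the <img> tag.
        caption = (img.find_next(string=True) or "").strip()
        cues.append({
            "file": img.get("src", ""),   # file name, eg catpicture.jpg or 01457.gif
            "alt": img.get("alt", ""),    # accessibility text, frequently absent
            "caption": caption,
            "page_title": soup.title.string if soup.title else "",
        })
    return cues
```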
Contributors of files to services such as YouTube are
encouraged to include concise keywords during the submission
process. Those keywords - metadata - may not adequately
describe the particular file, for example because they
are misspelt, confusing or do not provide enough information
for an appropriately granular search where there are a
large number of files with similar content or similar
keywords.
Outside of those cues the major search engines do not
have the capacity to consistently differentiate between
images (and between audio files), for example determining purely from the pixels that one image represents a pear,
another represents a penguin and a third represents the
more intimate parts of a porn star.
sorting
Having identified audiovisual files, how do the major search engines sort them (eg rank those files so that the results most closely reflect the user's search terms or other criteria such as date)?
As with the discussion earlier in this profile, most search
engine algorithms are proprietary and are refined to reflect
research and feedback from users. As a result there is
some agreement about basic principles but specifics of
current and past sorting mechanisms for the major engines
are unavailable. Some sense of those principles is provided
by the industry and academic studies cited elsewhere in
this profile.
Much image searching is implicitly a subset of standard
searches of web pages, with the engine for example identifying
all image-bearing pages that match the user's search criteria.
Image-specific searching in major engines such as Google
trawls the particular engine's cache of web pages and
then presents thumbnail images from that cache in an order that usually reflects factors such as -
- the ranking of the page with which the image is associated
- whether users have clicked on the thumbnail during identical past image searches
- any weighting given to text that is likely to be directly associated with the image, eg whether it has a relevant caption, or whether the image was 'embedded' close to multiple instances of the search term in the page (a sketch of such weighting follows below)
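A toy illustration of how those factors might be combined into a single score. The weights and signal names are invented for the sketch; the engines' actual weightings are proprietary and far more elaborate:

```python
def image_score(page_rank: float, past_clicks: int,
                caption_match: bool, nearby_matches: int) -> float:
    """Blend hypothetical ranking signals for one thumbnail.

    The weights below are illustrative assumptions, not any engine's values.
    """
    return (0.5 * page_rank                           # rank of the hosting page
            + 0.3 * min(past_clicks, 100) / 100       # clicks in identical past searches
            + 0.15 * (1.0 if caption_match else 0.0)  # relevant caption present
            + 0.05 * min(nearby_matches, 5) / 5)      # search terms near the image

candidates = [
    {"page_rank": 0.8, "past_clicks": 40, "caption_match": True, "nearby_matches": 3},
    {"page_rank": 0.6, "past_clicks": 90, "caption_match": False, "nearby_matches": 5},
]
# Thumbnails are then presented in descending score order.
ranked = sorted(candidates, key=lambda c: image_score(**c), reverse=True)
```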
Searches
of moving image collections, such as YouTube, typically
rank the files by the closeness of the match between the
user's search request and the terms supplied by the person
who uploaded each file onto the collection. Other rankings
(for example by date of upload, place of upload, most
viewed, most commented or most linked-to) are possible;
that ranking is mechanistic and does not involve the engine
interpreting the content of that video.
The same mechanisms are used to rank audio file collections,
with engines leveraging external cues rather than sorting
on the basis that a particular performance 'sounds like'
David Bowie rather than Iggy Pop or Enrico Caruso.
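A minimal sketch of that mechanistic matching, for video and audio alike: each file is scored by the overlap between the query and the uploader-supplied keywords, and the file itself is never opened. The data is invented for the example:

```python
def keyword_score(query: str, uploader_keywords: list[str]) -> int:
    """Count query terms matching the uploader's keywords; a misspelt or
    misleading keyword defeats the search because the content is never read."""
    terms = set(query.lower().split())
    return len(terms & {k.lower() for k in uploader_keywords})

files = {
    "clip_001": ["bowie", "live", "1973"],
    "clip_002": ["puppy", "beach", "funny"],
}
query = "bowie live"
ranked = sorted(files, key=lambda f: keyword_score(query, files[f]), reverse=True)
print(ranked)   # clip_001 first: two keyword matches, none for clip_002
```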
futures
A less mechanistic identification and ranking of non-text
files is often characterised as the "next frontier"
in search or even in artificial intelligence, with ambitious
claims being made for imminent breakthroughs in content
analysis or for application of neural networks and other
technologies that at the moment are just buzzwords in
search of a research grant.
What is often characterised as 'pattern recognition' has
attracted major commercial, military and academic attention.
That is unsurprising given -
- awareness that interpretation of images is a key for robotics
- arguments that the algorithms and hardware used in image and audio recognition will be associated with, if not drive, breakthroughs in artificial intelligence
- perceptions that problems associated with recognition are conceptually demanding (and thus attract the best researchers) and if solved are likely to be extremely lucrative (thus attracting venture capital or other investment)
- awareness that matching and interpreting images is central to some forms of biometrics and satellite-based or other geospatial intelligence systems
Some
researchers have placed their faith in a combination of
falling hardware costs and technologies such as SMIL,
with software for example automatically 'listening' to
the soundtrack of a video, converting that sound into
text (ie a script synchronised to the particular image)
and thereby being able to index a film without substantial
human intervention.
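A sketch of the indexing half of that vision: given a time-coded transcript (hard-wired below; the speech recognition step is assumed rather than shown), an inverted index maps each spoken word to the points in the film where it occurs:

```python
from collections import defaultdict

# A time-coded transcript, as speech-to-text software might emit it.
transcript = [
    (12.5, "welcome to the programme"),
    (47.0, "tonight we visit the archive"),
    (63.2, "the archive holds a million films"),
]

def build_index(segments: list[tuple[float, str]]) -> dict[str, list[float]]:
    """Invert the transcript: each word maps to the seconds at which it is spoken."""
    index = defaultdict(list)
    for seconds, text in segments:
        for word in text.lower().split():
            index[word].append(seconds)
    return index

index = build_index(transcript)
print(index["archive"])   # [47.0, 63.2] - jump straight to those points in the film
```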
In practice current applications centre on 'brute force'
comparisons between a known audio or video recording and
one that is suspected to be a copy. That is attractive
for intellectual property rights owners concerned to place
the copyright or trademark genie back into the digital
bottle. Some enthusiasts have envisaged enforcement agents
able to automatically and comprehensively -
- trawl the web in search of copies of musical performances, movies, television programs or even trademarks
- determine whether those copies or uses were authorised
- instruct associates (eg social software services) and third parties (eg ISPs, ICHs, search engines) to delete or merely block unauthorised copies/uses
Apart
from fundamental technical challenges, that vision conflicts
with a range of commercial and legal realities.
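A crude illustration of the 'brute force' comparison involved: a perceptual 'average hash' reduces an image to a 64-bit signature, so near-copies can be spotted by counting differing bits. This is a textbook technique sketched with the Pillow library (the file names are hypothetical), not any vendor's actual fingerprinting:

```python
from PIL import Image

def average_hash(path: str) -> int:
    """Shrink to 8x8 greyscale and set one bit per pixel brighter than the mean."""
    img = Image.open(path).convert("L").resize((8, 8))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for i, p in enumerate(pixels):
        if p > mean:
            bits |= 1 << i
    return bits

def distance(h1: int, h2: int) -> int:
    """Hamming distance between two hashes: a small value suggests a near-copy."""
    return bin(h1 ^ h2).count("1")

# A resized or recompressed copy typically differs by only a few of the 64 bits;
# an unrelated image differs by roughly half.
print(distance(average_hash("original.jpg"), average_hash("suspect.jpg")))
```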
Other enthusiasts have suggested that software can or
shortly will be able to readily interpret the content
of still/moving images without external cues and without
the preliminary selection (and analysis of standard data
in a highly circumscribed field) used by ANPR
systems. Competing promoters of content
filters have for example recurrently referred to artificial
intelligence in claiming that their products can very
accurately discriminate between adult content and youth-friendly
pictures on the basis of shape, colour (in particular
"skin tones") or luminosity. One vendor accordingly
claims to distinguish with 98% accuracy between backgrounds
and foreground, identify faces, differentiate between
a beach scene and pornography, and between a puppydog's
pink tummy, a peach and a naughty Miss America.
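A crude sketch of the kind of skin-tone heuristic such claims rest on: classify each pixel as 'skin' with a simple RGB rule and flag an image once the ratio passes a threshold. The rule, the threshold and the file name are illustrative assumptions, and the failure modes mocked above (peaches, puppies, beaches) follow directly from the approach:

```python
from PIL import Image

def is_skin(r: int, g: int, b: int) -> bool:
    """A widely cited rule-of-thumb RGB test for skin tones - crude by design."""
    return r > 95 and g > 40 and b > 20 and r > g and r > b and abs(r - g) > 15

def skin_ratio(path: str) -> float:
    """Fraction of pixels the rule classifies as skin."""
    img = Image.open(path).convert("RGB")
    pixels = list(img.getdata())
    return sum(is_skin(r, g, b) for r, g, b in pixels) / len(pixels)

# A naive filter: flag any image dominated by 'skin'. A peach, a puppy's
# pink tummy or a beach scene can trip it just as readily as pornography.
if skin_ratio("picture.jpg") > 0.4:
    print("flagged as possible adult content")
```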
As Rea, Lacey, Lambe & Dayhot in 'Multimodal Periodicity
Analysis for Illicit Content Detection in Videos' (PDF)
and James Wang in Integrated Region-Based Image Retrieval (New York: Kluwer 2001) note, that task is challenging,
particularly for video rather than still images and on
a real-time basis. As a search mechanism such technologies
accordingly offer exclusion - accurate or otherwise -
rather than a useful way of finding all still images of
peaches (by Cézanne or otherwise) and any video
which features plums on the kitchen table.