images and sounds
This page considers searching of audio, video and still images: how the major engines identify and sort non-text content, and likely directions for future development.
introduction
Searching of online content, for the moment, remains resolutely
text-based despite the flood of personal snaps, sound
recordings, archival film, contemporary video and maps
pouring onto the web through institutional/corporate sites
and services such as Flickr or YouTube.
It is likely that the amount of such data,
in terms of bytes, already accounts for over half of the
bright and dark web.
It will continue to increase as individuals and organisations
unlock their archives
or merely assume that peers are interested in video
blogs. One indication of growth is Google's announcement
in February 2005 (predating global uptake of YouTube)
that its cache of the web had reached over a billion images,
up from some 880 million in February 2004. Another has been the announcement that the BBC and other major moving image owners will place much of their collections online.
Searching is text-based because existing web search engines are not good at -
- comparing non-text content, in particular comparing items that are not exact copies
- making sense of non-text content, eg determining that a picture is upside down
That
means the 'universal library' or 'global jukebox' remains
a myth: there is a lot
of content online but if you cannot find it, for practical
purposes it does not exist.
Searching is text-based because the major search engines
rely on text associated with an image or an audio/video
recording, rather than independent interpretation of that
content. That text enables identification of the images
and sounds. It also underpins much of the sorting by the
engines of that content.
identification
Basic identification of content by whole-of-web search
engines such as Google involves determination of the file
type: the .gif, .jpg, .html, .wmv or other suffix that
forms part of an individual file's name. Without that
suffix the file will not be recognised.
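A minimal sketch of that suffix-based identification, using Python's standard mimetypes module (real crawlers' type detection is proprietary and more involved; the file names here are hypothetical):

```python
import mimetypes

def classify(filename: str) -> str:
    """Guess a file's broad media type from its name suffix alone."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return "unrecognised"      # no known suffix: effectively invisible to the engine
    return mime.split("/")[0]      # 'image', 'audio', 'video', 'text', ...

for name in ["catpicture.jpg", "3257.mp3", "clip.avi", "mystery"]:
    print(name, "->", classify(name))
```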
In spidering the web (or a particular set of files, which
might be on an intranet) search engines typically parse
file directories and individual files, such as web pages
that comprise a mix of html code and 'embedded' graphics.
The content or other attributes (eg date of creation)
of digital audio, video or still image files can sometimes
be inferred from the title of the individual file, from
any 'alt' tag intended to enhance the file's accessibility
or more broadly from the domain name and from the type
of links pointing to that domain (eg to an adult
content site). Much of the time the file title is
not useful, either being generic (eg catpicture.jpg or
header.gif) or meaningless from the engine's perspective
(eg 01457.gif or 3257.mp3). Many images lack 'alt' tags, and those tags that do exist are often non-descriptive or generic.
Some files do feature metadata, which might include detailed DRM information. Only a small proportion of all image/audio files on the net carry useful metadata, in particular DRM tags, which are thus not a fundamental resource for the major engines (though they may be used by specialist art history or other cultural database engines).
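Where embedded metadata does exist it is easy to read. A sketch using the Pillow imaging library to pull EXIF tags from a still image (the file name is hypothetical, and most web images will yield little or nothing):

```python
from PIL import Image
from PIL.ExifTags import TAGS

def read_metadata(path: str) -> dict:
    """Return whatever EXIF metadata an image carries, keyed by tag name."""
    with Image.open(path) as img:
        exif = img.getexif()
        return {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}

# Typical keys, when present: DateTime, Make, Model, Artist, Copyright.
print(read_metadata("photo.jpg"))
```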
Most engines accordingly identify still images through
associated text. That text might be the name of a web
page and the words within a web page (engines typically
assume that the words nearest to an image, particularly
what appears to be a caption, relate to that image). The
text might instead be the wording in a link to the still/moving
image or audio file.
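A sketch of how such associated text might be gathered during a crawl, using the BeautifulSoup parser. The heuristics - treating the nearest following text as a caption, for instance - are illustrative assumptions, not any engine's actual rules:

```python
from bs4 import BeautifulSoup

def image_cues(html: str) -> list[dict]:
    """Collect the text an indexer might associate with each image on a page."""
    soup = BeautifulSoup(html, "html.parser")
    cues = []
    for img in soup.find_all("img"):
        # Naive caption guess: the nearest run of text after the <img> tag.
        caption = (img.find_next(string=True) or "").strip()
        cues.append({
            "file": img.get("src", ""),   # file name, eg catpicture.jpg or 01457.gif
            "alt": img.get("alt", ""),    # accessibility text, frequently absent
            "caption": caption,
            "page_title": soup.title.string if soup.title else "",
        })
    return cues
```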
Contributors of files to services such as YouTube are
encouraged to include concise keywords during the submission
process. Those keywords - metadata - may not adequately
describe the particular file, for example because they
are misspelt, confusing or do not provide enough information
for an appropriately granular search where there are a
large number of files with similar content or similar
keywords.
Outside of those cues the major search engines do not
have the capacity to consistently differentiate between
images (and between audio files), for example determining purely from the pixels that one image represents a pear,
another represents a penguin and a third represents the
more intimate parts of a porn star.
sorting
Having identified audiovisual files, how do the major search engines sort them (eg rank those files so that the results most closely reflect the user's search terms or other criteria such as date)?
As with the discussion earlier in this profile, most search
engine algorithms are proprietary and are refined to reflect
research and feedback from users. As a result there is
some agreement about basic principles but specifics of
current and past sorting mechanisms for the major engines
are unavailable. Some sense of those principles is provided
by the industry and academic studies cited elsewhere in
this profile.
Much image searching is implicitly a subset of standard
searches of web pages, with the engine for example identifying
all image-bearing pages that match the user's search criteria.
Image-specific searching in major engines such as Google
trawls the particular engine's cache of web pages and
then presents thumbnail images from that cache in an order that usually reflects factors such as -
- the ranking of the page with which the image is associated
- whether users have clicked on the thumbnail during identical past image searches
- any weighting given to text that is likely to be directly associated with the image, eg whether it has a relevant caption, or whether the image was 'embedded' close to multiple instances of the search term in the page (a sketch of such weighting follows below)
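A toy illustration of how those factors might be combined into a single score. The weights and signal names are invented for the sketch; the engines' actual weightings are proprietary and far more elaborate:

```python
def image_score(page_rank: float, past_clicks: int,
                caption_match: bool, nearby_matches: int) -> float:
    """Blend hypothetical ranking signals for one thumbnail.

    The weights below are illustrative assumptions, not any engine's values.
    """
    return (0.5 * page_rank                           # rank of the hosting page
            + 0.3 * min(past_clicks, 100) / 100       # clicks in identical past searches
            + 0.15 * (1.0 if caption_match else 0.0)  # relevant caption present
            + 0.05 * min(nearby_matches, 5) / 5)      # search terms near the image

candidates = [
    {"page_rank": 0.8, "past_clicks": 40, "caption_match": True, "nearby_matches": 3},
    {"page_rank": 0.6, "past_clicks": 90, "caption_match": False, "nearby_matches": 5},
]
# Thumbnails are then presented in descending score order.
ranked = sorted(candidates, key=lambda c: image_score(**c), reverse=True)
```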
Searches
of moving image collections, such as YouTube, typically
rank the files by the closeness of the match between the
user's search request and the terms supplied by the person
who uploaded each file onto the collection. Other rankings
(for example by date of upload, place of upload, most
viewed, most commented or most linked-to) are possible;
that ranking is mechanistic and does not involve the engine
interpreting the content of that video.
The same mechanisms are used to rank audio file collections,
with engines leveraging external cues rather than sorting
on the basis that a particular performance 'sounds like'
David Bowie rather than Iggy Pop or Enrico Caruso.
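A minimal sketch of that mechanistic matching, for video and audio alike: each file is scored by the overlap between the query and the uploader-supplied keywords, and the file itself is never opened. The data is invented for the example:

```python
def keyword_score(query: str, uploader_keywords: list[str]) -> int:
    """Count query terms matching the uploader's keywords; a misspelt or
    misleading keyword defeats the search because the content is never read."""
    terms = set(query.lower().split())
    return len(terms & {k.lower() for k in uploader_keywords})

files = {
    "clip_001": ["bowie", "live", "1973"],
    "clip_002": ["puppy", "beach", "funny"],
}
query = "bowie live"
ranked = sorted(files, key=lambda f: keyword_score(query, files[f]), reverse=True)
print(ranked)   # clip_001 first: two keyword matches, none for clip_002
```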
futures
A less mechanistic identification and ranking of non-text
files is often characterised as the "next frontier"
in search or even in artificial intelligence, with ambitious
claims being made for imminent breakthroughs in content
analysis or for application of neural networks and other
technologies that at the moment are just buzzwords in
search of a research grant.
What is often characterised as 'pattern recognition' has
attracted major commercial, military and academic attention.
That is unsurprising given -
- awareness that interpretation of images is a key for robotics
- arguments that the algorithms and hardware used in image and audio recognition will be associated with, if not drive, breakthroughs in artificial intelligence
- perceptions that problems associated with recognition are conceptually demanding (and thus attract the best researchers) and if solved are likely to be extremely lucrative (thus attracting venture capital or other investment)
- awareness that matching and interpreting images is central to some forms of biometrics and satellite-based or other geospatial intelligence systems
Some
researchers have placed their faith in a combination of
falling hardware costs and technologies such as SMIL,
with software for example automatically 'listening' to
the soundtrack of a video, converting that sound into
text (ie a script synchronised to the particular image)
and thereby being able to index a film without substantial
human intervention.
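A sketch of the indexing half of that vision: given a time-coded transcript (hard-wired below; the speech recognition step is assumed rather than shown), an inverted index maps each spoken word to the points in the film where it occurs:

```python
from collections import defaultdict

# A time-coded transcript, as speech-to-text software might emit it.
transcript = [
    (12.5, "welcome to the programme"),
    (47.0, "tonight we visit the archive"),
    (63.2, "the archive holds a million films"),
]

def build_index(segments: list[tuple[float, str]]) -> dict[str, list[float]]:
    """Invert the transcript: each word maps to the seconds at which it is spoken."""
    index = defaultdict(list)
    for seconds, text in segments:
        for word in text.lower().split():
            index[word].append(seconds)
    return index

index = build_index(transcript)
print(index["archive"])   # [47.0, 63.2] - jump straight to those points in the film
```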
In practice current applications centre on 'brute force'
comparisons between a known audio or video recording and
one that is suspected to be a copy. That is attractive
for intellectual property rights owners concerned to place
the copyright or trademark genie back into the digital
bottle. Some enthusiasts have envisaged enforcement agents
able to automatically and comprehensively -
- trawl the web in search of copies of musical performances, movies, television programs or even trademarks
- determine whether those copies or uses were authorised
- instruct associates (eg social software services) and third parties (eg ISPs, ICHs, search engines) to delete or merely block unauthorised copies/uses
Apart
from fundamental technical challenges, that vision conflicts
with a range of commercial and legal realities.
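A crude illustration of the 'brute force' comparison involved: a perceptual 'average hash' reduces an image to a 64-bit signature, so near-copies can be spotted by counting differing bits. This is a textbook technique sketched with the Pillow library (the file names are hypothetical), not any vendor's actual fingerprinting:

```python
from PIL import Image

def average_hash(path: str) -> int:
    """Shrink to 8x8 greyscale and set one bit per pixel brighter than the mean."""
    img = Image.open(path).convert("L").resize((8, 8))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for i, p in enumerate(pixels):
        if p > mean:
            bits |= 1 << i
    return bits

def distance(h1: int, h2: int) -> int:
    """Hamming distance between two hashes: a small value suggests a near-copy."""
    return bin(h1 ^ h2).count("1")

# A resized or recompressed copy typically differs by only a few of the 64 bits;
# an unrelated image differs by roughly half.
print(distance(average_hash("original.jpg"), average_hash("suspect.jpg")))
```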
Other enthusiasts have suggested that software can or
shortly will be able to readily interpret the content
of still/moving images without external cues and without
the preliminary selection (and analysis of standard data
in a highly circumscribed field) used by ANPR
systems. Competing promoters of content
filters have for example recurrently referred to artificial
intelligence in claiming that their products can very
accurately discriminate between adult content and youth-friendly
pictures on the basis of shape, colour (in particular
"skin tones") or luminosity. One vendor accordingly
claims to distinguish with 98% accuracy between backgrounds
and foreground, identify faces, differentiate between
a beach scene and pornography, and between a puppydog's
pink tummy, a peach and a naughty Miss America.
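A crude sketch of the kind of skin-tone heuristic such claims rest on: classify each pixel as 'skin' with a simple RGB rule and flag an image once the ratio passes a threshold. The rule, the threshold and the file name are illustrative assumptions, and the failure modes mocked above (peaches, puppies, beaches) follow directly from the approach:

```python
from PIL import Image

def is_skin(r: int, g: int, b: int) -> bool:
    """A widely cited rule-of-thumb RGB test for skin tones - crude by design."""
    return r > 95 and g > 40 and b > 20 and r > g and r > b and abs(r - g) > 15

def skin_ratio(path: str) -> float:
    """Fraction of pixels the rule classifies as skin."""
    img = Image.open(path).convert("RGB")
    pixels = list(img.getdata())
    return sum(is_skin(r, g, b) for r, g, b in pixels) / len(pixels)

# A naive filter: flag any image dominated by 'skin'. A peach, a puppy's
# pink tummy or a beach scene can trip it just as readily as pornography.
if skin_ratio("picture.jpg") > 0.4:
    print("flagged as possible adult content")
```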
As Rea, Lacey, Lambe & Dayhot in 'Multimodal Periodicity
Analysis for Illicit Content Detection in Videos' (PDF)
and James Wang in Integrated Region-Based Image Retrieval (New York: Kluwer 2001) note, that task is challenging,
particularly for video rather than still images and on
a real-time basis. As a search mechanism such technologies
accordingly offer exclusion - accurate or otherwise -
rather than a useful way of finding all still images of
peaches (by Cézanne or otherwise) and any video
which features plums on the kitchen table.