
A Musical Web Observatory:

Connecting People with Data, Documents, and Algorithms

An experiment with Firefox's Advanced Audio Processing Extension ...

... featuring a mash-up of Greenstone, AudioDB, and Meandre.

This project is currently a work in progress. The sequence of development so far has been:

  1. Starting point: Manual (command-line) aggregation of disparate resources prior to building the DL. A set of bespoke Greenstone document processing plugins corrals the gathered heterogeneous data into a unified (homogeneous) canonical form that the DL can access and display. Everything presented in the DL is either pre-computed (such as the self-similarity heat-maps) or else computed at build-time.
  2. Audio-fingerprinting: as before, but metadata about the songs is now enriched through a set of audio-fingerprinting web services. Everything presented in the DL is still pre-computed or else computed at build-time; however, the inclusion in the document view of a "Discovery" block allows a user to begin to access and explore, through linked data, information related to the song.
  3. Client-side audio processing (and visualization): the pre-computed self-similarity heat-maps are dropped from the collection-building process in favour of computing the same information through Javascript running in the user's web browser.
  4. Embedded Meandre workflows: the Meandre Workbench is integrated into Greenstone. Audio documents in the Greenstone digital library can now be dispatched to and processed by the selected Meandre workflow, with output from the workflow returned to the Greenstone document view, for example, playing audio that has been processed and output by the workflow.
  5. Client-side workflows: the client-side Javascript processing code is refactored to follow the same methodology as Meandre components.
  6. Client/Server hybrid workflows: the Greenstone/Meandre integration is extended to support the dynamic transmission of executeCallBack() methods, written in Javascript in the user's browser, to be run on the Meandre server as part of the active workflow (a sketch of this idea follows the list).
  7. Forging ahead: the next area to be worked on is upgrading the level of Greenstone/Meandre integration so that data produced by the workflows can be incorporated back into the underlying digital library itself, rather than remaining (as it currently does) transitory data that lives only for the duration of the web page being viewed. This will form an implementation stepping-stone to a more general ability to have data retrieved from other external resources (located through the linked-data Discovery block of the document view) ingested into the DL collection.
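
To make step 6 above more concrete, the sketch below illustrates one way a callback authored in the user's browser might be serialized and shipped to the workflow engine. It is illustrative only: the endpoint URL, parameter name, and response handling are hypothetical placeholders, not the actual Greenstone/Meandre interface.

    // Hypothetical sketch only: the endpoint and parameter name below are
    // illustrative placeholders, not the real Greenstone/Meandre interface.

    // A callback authored in the user's browser; a workflow component on the
    // server would invoke it on each data item it produces.
    function executeCallBack(audioSegment) {
      // e.g. keep only segments longer than 5 seconds
      return audioSegment.duration > 5.0;
    }

    // Serialize the function's source text and post it with the workflow request.
    var request = new XMLHttpRequest();
    request.open("POST", "/greenstone3/meandre-proxy?a=run-workflow");  // placeholder URL
    request.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    request.onload = function () {
      // Output from the workflow (e.g. URLs of processed audio) would be
      // inserted back into the document view here.
      console.log("Workflow finished: " + request.responseText);
    };
    request.send("callback=" + encodeURIComponent(executeCallBack.toString()));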

A Walkthrough

The following walkthrough is for the initial incarnation of the DL collection, where all the information presented is either precomputed or computed at build time.

Taking as a starting point a set of music files identified as worthy of study, the figure below shows the result of browsing the formed digital library collection by title from a web browser. The figure is a useful snapshot in which to orientate ourselves with the main structure and features of the digital library. Functionality that recurs throughout the site is accessible through the page header.

This includes:

  • help and preferences (top-right);
  • a quick-search option (located just below) with links to more sophisticated searching options; and
  • pin-pointing where within the site a user is currently located (top-left).
Browsing in the digital library by titles.
The content specific to this location within the site (in this case, browsing by title) is shown beneath the main banner. Various groupings of titles can be accessed by clicking on the bookshelf icons vertically aligned in the main part of the page: currently C–D is open, with the remaining letters of the alphabet below this, accessed through scrolling.

Interested in the song Candela, our curious musicologist clicks on its link. This brings up the document view for the song:

The musicologically enriched document view for Candela.

Normally in a digital library the document view brings up a page that is strongly derived from textual metadata. If the document viewed is a text document, summary information such as title and author is typically shown, say in tabular form, before the main text. Even in the case of multimedia digital libraries, the view presented is still strongly derived from textual metadata: this time including details such as the length of the video, the TV company that produced it, whether captions are available, and so forth, accompanied by an embedded video player for viewing the content. The player itself is driven by yet more textual metadata (in this case the URL to the video content) and, in terms of the user interface, is largely divorced from the other elements displayed on the page.

This contrasts sharply with the document view developed in this digital library. Naturally it allows the song to be played (akin to the embedded video player), but this is largely of secondary importance to the other functionality available, which is much more closely integrated.

The most striking visual component of the document view is a self-similarity "heat map" in which the duration of the song forms both the x- and y-axes. A red pixel at a given (x,y) co-ordinate in the map marks a location where two parts of the song are strongly similar, shifting proportionally towards blue where they are dissimilar. Given this configuration, the leading diagonal of the matrix (x=y) is always coloured red, as it represents the comparison of each part of the song with itself.
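
As a rough illustration of how such a map can be produced in the browser (step 3 of the development sequence above), the sketch below computes a self-similarity matrix from per-frame feature vectors using a normalized Euclidean distance. The feature extraction itself, and the red-to-blue colour mapping used in the collection, are assumed rather than shown.

    // Sketch: build a self-similarity matrix from per-frame feature vectors.
    // "frames" is assumed to be an array of equal-length numeric arrays
    // (e.g. spectral or chroma features), one per time slice of the song.
    function selfSimilarityMatrix(frames) {
      var n = frames.length;
      var matrix = [];
      var maxDist = 0;

      // Pairwise Euclidean distances between frames.
      for (var y = 0; y < n; y++) {
        matrix[y] = [];
        for (var x = 0; x < n; x++) {
          var sum = 0;
          for (var k = 0; k < frames[x].length; k++) {
            var d = frames[x][k] - frames[y][k];
            sum += d * d;
          }
          var dist = Math.sqrt(sum);
          matrix[y][x] = dist;
          if (dist > maxDist) maxDist = dist;
        }
      }

      // Convert distances to similarities in [0,1]: 1 on the leading diagonal
      // (identical frames, drawn red), falling towards 0 for dissimilar
      // frames (drawn blue).
      for (var j = 0; j < n; j++) {
        for (var i = 0; i < n; i++) {
          matrix[j][i] = maxDist > 0 ? 1 - matrix[j][i] / maxDist : 1;
        }
      }
      return matrix;
    }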

When the user moves the mouse cursor around the self-similarity map, a highlighting circle emphasizes the area the cursor is over, with a black dot at its centre (visible in the above figure); annotated vertically and horizontally are the two time-offsets, in seconds, to which that point in the map corresponds. Clicking the cursor at this point results in the audio being played simultaneously from these two offsets. To help the musicologist listen to the two parts of the song, one part is panned to the left speaker and the other to the right (this was implemented using the extended audio API provided by Firefox, so this particular feature only works when viewing the collection with that browser; see the implementation details below). In our figure, the musicologist has zeroed in on the location x=33, y=97, which corresponds to the start of a strong red diagonal that occurs some distance off the leading diagonal. Listening to the two parts played (most reliably done with headphones on), they hear that these two sections are indeed repeating sections of the guitar piece Candela, with a minor variation in the latter section where a recorder also plays in the arrangement.
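
A minimal sketch of the dual playback and panning follows, written against the (since removed) Firefox Audio Data API referred to above (mozSetup, mozWriteAudio and the MozAudioAvailable event). The element ids, the assumption of mono sources, and the 44.1 kHz stereo output are illustrative rather than taken from the actual collection.

    // Sketch: play two points of the song at once, part A panned hard left and
    // part B hard right, assuming the Firefox Audio Data API and two <audio>
    // elements with illustrative ids "partA" and "partB" (assumed mono sources).
    var partA = document.getElementById("partA");
    var partB = document.getElementById("partB");

    // Mute the source elements; only the re-panned output stream should be
    // heard (it is assumed MozAudioAvailable still delivers samples while muted).
    partA.muted = true;
    partB.muted = true;

    var output = new Audio();
    output.mozSetup(2, 44100);  // 2 output channels at 44.1 kHz

    var pending = { left: null, right: null };

    function mixAndWrite() {
      if (!pending.left || !pending.right) return;
      var n = Math.min(pending.left.length, pending.right.length);
      var interleaved = new Float32Array(n * 2);
      for (var i = 0; i < n; i++) {
        interleaved[2 * i]     = pending.left[i];   // left channel  <- part A
        interleaved[2 * i + 1] = pending.right[i];  // right channel <- part B
      }
      output.mozWriteAudio(interleaved);
      pending.left = pending.right = null;
    }

    partA.addEventListener("MozAudioAvailable", function (e) {
      pending.left = e.frameBuffer;   // raw samples from part A
      mixAndWrite();
    }, false);

    partB.addEventListener("MozAudioAvailable", function (e) {
      pending.right = e.frameBuffer;  // raw samples from part B
      mixAndWrite();
    }, false);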

The structured audio time-lines (labelled A, B, ..., and 6, 5, 2, ... in the figure) located above the self-similarity map are another area of enriched musical content in the digital library. The upper line shows the ground-truth data for this song produced by the Salami project; the lower line is generated by an automatic content-analysis algorithm.

While there is some agreement between these two lines, there are also significant differences. The play and search buttons within the structured time-lines (the latter represented by a magnifying glass) allow the user to investigate these structures further. We shall return to the search functionality shortly (which is content-based, using AudioDB), but in the meantime note that with the time-lines positioned above the self-similarity map, there is further opportunity to study the differences between the two structured time-lines. There are certainly strong visual cues in the map that line up with the algorithmic time-line even though they do not align with a boundary in the ground-truth data, and the user can click on these parts of the similarity map to hear what is occurring at those points.

Audio content-based search results: left) the result list from an audio content-based query taken from an extract of Michelle; and right) the result of a structured music search for content containing "b b c b c" as a sequence.

Returning to the search capability provided by the structured time-lines, the above figure (left) shows the result of using this feature while studying the song Michelle by The Beatles. In this case the user selected the section of the ground-truth time-line corresponding to the section starting "I want you ...", but could equally have used the algorithmically calculated time-line, or in fact paused the song at any point and started a match from there.

Not surprisingly, Michelle is returned as the top hit (at 92.1%); we shall see shortly that this is because the system found several sections of the song that matched the query. The next hit is Bigger Than JC at 74.4%, and so on down the list, with only one hit per song. Clicking on the top hit, the figure below shows the document view that is displayed, focusing on the key area of the screen. This time the time-line area has an additional bar: the points within the song that AudioDB found to be similar. Like the other time-lines, a play button is present on these segments so the user can play the matching points directly. In this case, clicking on them reveals that the matching sections correspond to melodically repeating sections of the song, only with different lyrics ("I love you ..." and "I need you ...").

A further form of musical content-based searching is available through the main header of the digital library. Instead of searching by title or artist (which are also available), the user can click on the "search by" menu next to the quick-search text box and select "ground-truth structure" instead. The figure above (right) shows the result of using this option, where the user has entered "b b c b c" as the query: in other words, searching for songs that have two sections in a row the same, then a new section, then a return to the original section, before progressing to a recurrence of the second distinct section. In pop and rock music this is a popular sequence, corresponding to verse, verse, chorus, verse, chorus. A more intriguing framing of a query along these lines, equally possible in the digital library, would be to go to the fielded search page (accessible through the advanced search link below the quick-search box), enter the same music structure query, but this time combine it with other fielded query terms (Genre=Blues OR Jazz, Year<1950). Results from this query would let the musicologist explore potential evidence of this pattern of sections being used in two of the key musical genres that influenced the development of Rock and Roll music.
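
To illustrate what "containing ... as a sequence" means here, the short sketch below matches a structure query against a song's sequence of section labels; it conveys the idea of the search rather than the actual indexing used by the digital library.

    // Sketch: match a structure query such as "b b c b c" against a song's
    // sequence of section labels (illustrative only).
    function structureMatches(query, sectionLabels) {
      var q = query.trim().toLowerCase().split(/\s+/);
      var s = sectionLabels.map(function (label) { return label.toLowerCase(); });
      // Look for q occurring as a contiguous sub-sequence of s.
      for (var start = 0; start + q.length <= s.length; start++) {
        var ok = true;
        for (var i = 0; i < q.length; i++) {
          if (s[start + i] !== q[i]) { ok = false; break; }
        }
        if (ok) return true;
      }
      return false;
    }

    // Example: a verse/verse/chorus/verse/chorus song structure.
    structureMatches("b b c b c", ["a", "b", "b", "c", "b", "c", "d"]);  // true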

The document view for Michelle augmented with the locations of the matches of the audio query.

Implementation details

The core interactive elements in the document view were implemented using SVG combined with Javascript. The left- and right-panning available interactively from the self-similarity map was implemented by processing the raw audio stream, made accessible by the Firefox Audio extension API.
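
As a flavour of how the SVG and Javascript fit together, the fragment below sketches the highlighting circle and time-offset read-out drawn over the heat map. The element id, map scale, and label styling are invented for the example rather than taken from the collection's source.

    // Sketch: interactive highlight over the self-similarity map, assuming an
    // SVG element with id "heatmap" and an invented seconds-per-pixel scale.
    var SVG_NS = "http://www.w3.org/2000/svg";
    var svg = document.getElementById("heatmap");
    var secondsPerPixel = 0.5;  // illustrative scale: map resolution vs song length

    var circle = document.createElementNS(SVG_NS, "circle");
    circle.setAttribute("r", 12);
    circle.setAttribute("fill", "none");
    circle.setAttribute("stroke", "black");
    svg.appendChild(circle);

    var label = document.createElementNS(SVG_NS, "text");
    label.setAttribute("font-size", "10");
    svg.appendChild(label);

    svg.addEventListener("mousemove", function (e) {
      var rect = svg.getBoundingClientRect();
      var x = e.clientX - rect.left;
      var y = e.clientY - rect.top;
      circle.setAttribute("cx", x);
      circle.setAttribute("cy", y);
      // The two time offsets (in seconds) this point of the map corresponds to.
      label.setAttribute("x", x + 15);
      label.setAttribute("y", y - 5);
      label.textContent = (x * secondsPerPixel).toFixed(1) + "s / " +
                          (y * secondsPerPixel).toFixed(1) + "s";
    }, false);

    svg.addEventListener("click", function (e) {
      // On click, the two offsets would be handed to the dual playback code
      // (one part panned left, the other right), as described in the walkthrough.
    }, false);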

AudioDB content-based searching was integrated into Greenstone through two components of the digital library software architecture: its build-time document-processing plugin system, and its runtime message-passing service-based framework. The developed plugin accepts a wide range of audio formats (including OGG and MP3), and converts them to WAV, the format needed by AudioDB for processing. The new search service took the form of a proxy, accepting messages in the XML syntax used by Greenstone, turning them into the necessary calls to the AudioDB command-line interface, and then converting the output from AudioDB back into the XML syntax expected by the digital library architecture. Finally, the two parts were packaged to operate as a Greenstone extension; the software is available at: http://trac.greenstone.org/gs3-extensions/audioDB/trunk/src.