hyperlinking timecoded audio data in dictionaries

Pascale Jacq pascale.jacq at anu.edu.au
Fri Aug 13 03:35:54 UTC 2004


The use of data streaming and time-coded audio data for hyperlinking audio
in interactive dictionary making

Hello RNLD list subscribers. My name is Pascale Jacq. I'm working on a Hans
Rausing Endangered Language Funded project to document the moribund Jawoyn
language of Arnhem Land, with the Chief Investigator Prof. Francesca
Merlan, at the Australian National University.
I joined RNLD yesterday after asking Nick Thieberger for advice on how to
create hyperlinks to our audio data within the Jawoyn interactive
dictionary we're creating in Shoebox (MDF) for use by the language
community in language teaching, maintenance and so on. He suggested that I
send my query (which I've reworded somewhat) to the RNLD list so that others
can benefit from our experience.

Background:
The analogue and DAT tape audio data we have has now been digitised (by
AIATSIS) onto more than 100 CDs (each CD holds a single unsegmented WAV file
of an hour or 90 minutes in length, i.e. in "real time"). We hold Master
copies of each, as does AIATSIS in its archive, where the original tapes are
also archived. Any future copy made from the Master CD is, I believe, called
a 'red book' copy and is in read-only format.

Problem:
The problem is that when I make a hyperlink to a sentence exemplifying a
dictionary entry (easily done once the Shoebox lexicon is exported to Word),
I can only link to the whole WAV file, not to the relevant time-coded
segment where the sentence occurs.
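
To make concrete what "linking to a time-coded segment" would involve, here
is a minimal Python sketch that copies the audio between two time codes out
of a long WAV file into a small file of its own, which an ordinary hyperlink
could then point to. It is only an illustration under assumptions: the
function name extract_segment is hypothetical, it uses Python's standard
wave module, and it assumes the file is uncompressed PCM.

```python
import wave

def extract_segment(src_path, dst_path, start_s, end_s):
    """Copy the audio between start_s and end_s (in seconds) into a new WAV file.

    Hypothetical helper for illustration; returns the number of frames copied.
    """
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        # Convert the time codes to frame indices, clamped to the file length.
        first = max(0, min(total, int(round(start_s * rate))))
        last = max(first, min(total, int(round(end_s * rate))))
        src.setpos(first)
        frames = src.readframes(last - first)
        with wave.open(dst_path, "wb") as dst:
            # Keep the source format so the segment is a bit-exact excerpt.
            dst.setnchannels(src.getnchannels())
            dst.setsampwidth(src.getsampwidth())
            dst.setframerate(rate)
            dst.writeframes(frames)
    return last - first
```

A call such as extract_segment("cd001.wav", "example.wav", 754.2, 758.9)
(file names and times invented for the example) would write a segment of about
4.7 seconds. The drawback of this approach is that every example sentence
becomes a separate file that has to be catalogued against its source.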

Solution:
The solution Nick suggested is the following (extracted from his email
reply to me dated 12/08/2004):
>"It sounds like you could use a streaming server for which you would have
>all of your CDs loaded onto a hard disk and be able to access timecoded
>segments anywhere within that data. You could also convert the wav files
>to MP3 for this purpose and it would take up a tenth of the disk space.
>The LDC/TalkBank use a streaming server to deliver their data. There is
>also the work being done on Annodex by CSIRO (also known as CMWeb)
>http://www.cmis.csiro.au/maaate/, and http://www.annodex.net/. They may be
>able to provide a solution and I would be interested to hear about
>anything you come up with with them".
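
As a rough check on the "tenth of the disk space" estimate, the arithmetic
can be sketched as follows. The audio parameters are assumptions, not figures
from Nick's email: CD-quality PCM (44.1 kHz, 16-bit, stereo) for the WAV
files and a constant 128 kbit/s for the MP3 files.

```python
# Assumed parameters (not stated in the email): CD-quality PCM, 128 kbit/s MP3.
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 2
CHANNELS = 2
MP3_BITRATE_BPS = 128_000

def wav_bytes(seconds):
    """Size of the raw PCM audio data for a recording of the given length."""
    return seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS

def mp3_bytes(seconds):
    """Size of a constant-bitrate MP3 of the given length."""
    return seconds * MP3_BITRATE_BPS // 8

hour = 3600
ratio = wav_bytes(hour) / mp3_bytes(hour)
print(f"WAV: {wav_bytes(hour) / 1e6:.0f} MB per hour")
print(f"MP3: {mp3_bytes(hour) / 1e6:.0f} MB per hour")
print(f"ratio: {ratio:.1f}")
```

Under these assumptions an hour of audio is about 635 MB as WAV and about
58 MB as MP3, a ratio of roughly 11 to 1. A different MP3 bitrate would change
the ratio in proportion.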

[I'm currently investigating the streaming server solution]

Final Questions:
A further concern emerged when I thought about converting to MP3: would the
time coding change from that of the original WAV file? The aim of
archiving linguistic data is to make it consistent, durable, catalogued and
thus easily accessible (always back to the original source) in the future
by those to whom the speakers allow data access.
I've already had the experience of a DAT tape 'drop out' of 23 seconds in
the digitisation process. Luckily the Master copy kept at AIATSIS had these
23 seconds of material and they could make a 'red book' copy for our use.
However, I noticed that the time coding of the first Master CD we had was
now one second out from that of the 'red book' copy (in addition to the 23
seconds), and so I wonder whether any copy made from the original Master
might fail to share the same time coding.
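
If the differences between two copies are known, existing time codes could
in principle be converted rather than re-measured. The sketch below is only
an illustration: the function name remap_timecode is hypothetical, the
position of the drop-out (600 seconds in the example) is invented because I
have not stated it here, and the sign of the one-second shift is an
assumption.

```python
def remap_timecode(t_old, gap_start, gap_length, shift=0.0):
    """Map a time (seconds) in the old copy to the matching time in the new copy.

    gap_start  : position in the old copy where material is missing
    gap_length : seconds of material restored in the new copy
    shift      : constant offset between the two copies (sign is an assumption)
    """
    t_new = t_old + shift
    if t_old >= gap_start:
        # Everything after the drop-out moves later by the restored duration.
        t_new += gap_length
    return t_new

# Hypothetical drop-out at 600 s; 23 s restored; copies 1 s apart.
before_gap = remap_timecode(300.0, gap_start=600.0, gap_length=23.0, shift=1.0)
after_gap = remap_timecode(900.0, gap_start=600.0, gap_length=23.0, shift=1.0)
```

With these invented values, a sentence at 300 seconds in the old copy would
be found at 301 seconds in the new one, and a sentence at 900 seconds would
be found at 924 seconds. This only works if the offsets are constant and
documented for each copy, which is exactly what is in question.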

This is a serious issue to consider if we are to use hyperlinks to audio
recordings, and I'd appreciate any advice, comments or similar experiences
you may have.


