[RNLD] Links between publication and sound corpus

Thu Mar 7 14:12:52 UTC 2013

My experience is that if you know the sound intervals in advance, it's
easiest in the long run to use sox or mp3splt [sic] to chop up the files
from the get-go.  You can:

  - create them all at once from a shell script (batch file),
  - use the start-finish (or start-duration) details as file names,
  - serve them from anywhere on the Web as standalone audio files,
  - link to them (as audio files) from anywhere (e.g. PDFs or Web pages).
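
A rough sketch of the batch step, in Python rather than a raw shell script. The helper names, the "snippets" output directory, the file-name pattern and the sample intervals are all my own assumptions for illustration; the only real tool invoked is sox with its "trim START DURATION" effect. It only generates the script text, so nothing is cut until you run the result yourself:

```python
import shlex
from pathlib import PurePosixPath


def snippet_name(source, start, duration, ext=".wav"):
    """Encode the interval in the file name: <stem>_<start>_<duration><ext>.

    The naming pattern is an illustrative assumption, not a standard.
    """
    stem = PurePosixPath(source).stem
    return f"{stem}_{start:.3f}_{duration:.3f}{ext}"


def sox_trim_command(source, start, duration, out_dir="snippets"):
    """Return the argument list for: sox INPUT OUTPUT trim START DURATION."""
    out = PurePosixPath(out_dir) / snippet_name(source, start, duration)
    return ["sox", str(source), str(out), "trim",
            f"{start:.3f}", f"{duration:.3f}"]


def batch_script(source, intervals, out_dir="snippets"):
    """Build the text of a shell script that cuts every interval in one pass.

    `intervals` is a list of (start_seconds, duration_seconds) pairs.
    """
    lines = ["#!/bin/sh", f"mkdir -p {shlex.quote(out_dir)}"]
    for start, duration in intervals:
        cmd = sox_trim_command(source, start, duration, out_dir)
        lines.append(shlex.join(cmd))  # quotes arguments safely for the shell
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical recording and intervals; replace with your own.
    print(batch_script("story.wav", [(0, 1.5), (12.5, 3.25)]), end="")
```

Redirect the output to a file, run it with sh, and each snippet's name then carries its own start and duration, so a link in a PDF or Web page only needs to point at wherever that file ends up being served.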

   As far as I know, the alternative is to write a CGI script that does the
exact same thing on demand, but makes linking and serving more painful.
It also creates a long-term dependency issue for secondary applications
like dictionaries, which would have to rely on your server forever, rather
than copying the snippets they need.  (I'm not aware of any archives that
provide an API for random access to archived audio.)

   None of this prevents you from metatagging the original, complete file
with all the gory time interval details, of course.

   Good luck,
   Doug

On 3/7/2013 7:52 PM, Steffen Haurholm-Larsen wrote:
> I am a Danish PhD student writing my dissertation at the University of Bern,
> Switzerland, in the form of a grammar of Garifuna, an Arawakan language spoken
> in most Central American countries. I am posting here because I have so far
> been unable to accomplish the linkage of specific parts of recordings to texts
> of language description such as a dissertation. I am thinking that someone in
> the linguistic community might have done this or have some suggestions.
>
> I intend to follow the best practices in language documentation with all that
> this entails in terms of data portability, metadata, archiving etc. and I
> would also like to incorporate the underlying data in the writing of my
> dissertation, and I figure it will be much easier to start linking examples to
> media files from the beginning rather than having to go through the whole
> dissertation at the end and put those links in.
>
> However, to date I know of no program or tool that will allow me to link
> directly from a document such as a dissertation to a specific time interval
> in a media file and play it. That is, I would like to avoid cutting my audio
> files up into little pieces and instead just link to the specific time code
> where the relevant example is located. I am transcribing in ELAN, which
> does allow the user to search and go directly to a specific annotation and it
> is possible to do quite specific searches, but I have yet to figure out how
> one might access a specific annotation / sound interval directly by a link
> inside the dissertation text itself.
>
> Does anybody know of such a program or perhaps who might have done something
> similar?
>
> The reason I would like to do this is the accessibility that this would add to
> the data - ultimately it should be possible to link both ways, from the
> descriptive work to the corpus and back, and preferably also with a link to a
> dictionary.
>
> Best wishes,
>
> Steffen Haurholm-Larsen
> Universität Bern