[RNLD] Links between publication and sound corpus

A. D. Nakhimovsky adnakhimovsky at COLGATE.EDU
Fri Mar 8 19:09:27 UTC 2013

Two comments on John's problem description. First, we can probably make the
simplifying assumption that the media files have been time-aligned with
some text or texts (annotation tiers), and the user has access only to the
predefined start-end times of time-aligned segments. (These can be
sentences or words in a word-list-with-pauses recording.) If the segments
are numbered then the query string can provide just the number of the
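
A rough sketch of the segment-number idea, in Python. The parameter name `segment`, the example URL, and the timing table are all illustrative assumptions, not part of any existing server:

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical time-alignment table: segment number -> (start, end) in seconds.
# In practice this would come from the annotation tier for the recording.
SEGMENTS = {1: (0.0, 4.2), 2: (4.2, 9.8), 3: (9.8, 15.1)}

def segment_times(url, segments=SEGMENTS):
    """Return (start, end) for the segment number named in the query string."""
    query = parse_qs(urlsplit(url).query)
    number = int(query["segment"][0])  # KeyError if the parameter is missing
    return segments[number]            # KeyError if the segment is unknown
```

With this table, a request for `http://example.org/clip?segment=3` would resolve to the third predefined segment, so the user never has to supply raw times.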

Secondly, this item seems unnecessarily restrictive: "3) We want to point
to an archived version, not some special version hosted for the purpose of
these embedded links." What if the archive does not see "providing a
service like this to be part of its long-term role" and does not even
expose its holdings as URLs? The general solution might be to think of the
snippetServer as specifically a Web server that holds: compressed versions
of the media (WAV files are huge), Web pages with time-alignment
information, and a browser-based program to play individual segments or
sequences of segments. A Web server like this could also be useful for
hosting community materials. Some of the programming (not all) has been
done: the link below will play the third segment of a Flash clip that shows
a selection from an MIT lecture, used as English language material:



On Thu, Mar 7, 2013 at 9:25 PM, Doug Cooper
<doug.cooper.thailand at gmail.com> wrote:

> Yes, this states the server solution exactly.  This does not pose any
> technical barrier (it's just a matter of providing a wrapper for
> something like sox or mp3splt).  It just needs an archive that sees
> providing a service like this to be part of its long-term role.
>   Although html5 does allow a start/finish time to be specified, the last
> time I checked the complete underlying file is downloaded.  I'd be happy
> to be wrong about this, or to hear that there's an (Apache) server add-on
> that handles this in a more intelligent way.
>   Doug
> On 3/8/2013 2:20 AM, John Hatton wrote:
>> > That was some time ago and now I think I would use an archival
>> > version of the media as the streaming source and have HTML5 calls
>> > to the timecodes.
>> Am I understanding the problem correctly?
>> 1) We want URLs which act just like a pointer to a static wav
>> somewhere on the internet. These can then be embedded in anything.
>> 2) But because we don't want to actually carve up each file into
>> little files, we need the URL to specify a time range rather than
>> just a filename.
>> 3) We want to point to an archived version, not some special version
>> hosted for the purpose of these embedded links.
>> If I understand the problem, then the solution is a URL like
>> http://<some snippet service>.org/<address of the archived version>?start=<starttime>&end=<endtime>
>> (That last bit after the '?' is called a URL query string.)
>> E.g.
>> http://snippetServer.org/?url=paradisec.org.au/someinternalpathatparadisec/KovaiCanoeStory.wav?start=02:20:10&end=02:22:10
>> When it receives this query, the server would get ahold of the full
>> audio file declared in the query string, and then stream out just the
>> section that was called for. The experience to the user would be the
>> same as if they had clicked on a url of a pre-prepared, stand-alone
>> file containing just that snippet.
>> Now, because the audio itself is served by an archive, it will have a
>> long lifetime. The snippet server itself need not be related to the
>> archive; a single instance could serve everyone. But if the snippet
>> server itself goes away in the future, the URL is still human
>> readable, and can be changed via search/replace to some new snippet
>> server.
>> To avoid the links going bad, it seems the snippet server should be
>> run by something prepared to be around for a long time, like an MPI
>> or an archive itself:
>> http://paradisec.org.au/snippetserver/...
>> Practically, such a server could limit its services to files in its
>> own repository or some set of other domains, if it didn’t want to end
>> up providing this snippet service for just any content on the web.
>> I googled a bit, didn't come up with anything, but I wouldn't be
>> surprised if such a service already existed. If not, well clearly
>> this would be cheap to do.
>> John Hatton
>> SIL International Language Software Development
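
To make the "wrapper for something like sox" idea concrete, here is a minimal Python sketch of what the snippet server might do with John's URL. It is only an illustration under stated assumptions: the function names, the example host, and the local file paths are hypothetical, and the sketch assumes the time range is appended with '&' (John's example as written repeats the '?', which a standard query-string parser would not split). It builds the command line without running it, using the sox `trim` effect with a start offset and a duration in seconds.

```python
from urllib.parse import urlsplit, parse_qs

def to_seconds(timecode):
    """Convert 'HH:MM:SS', 'MM:SS', or 'SS' to seconds."""
    seconds = 0.0
    for part in timecode.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def sox_command(snippet_url, local_path, out_path):
    """Build (but do not run) a sox call that cuts out the requested range.

    local_path is assumed to be the archived file already fetched by the
    server; fetching it is outside the scope of this sketch.
    """
    query = parse_qs(urlsplit(snippet_url).query)
    start = to_seconds(query["start"][0])
    end = to_seconds(query["end"][0])
    if end <= start:
        raise ValueError("end must be after start")
    # sox <input> <output> trim <start> <duration>
    return ["sox", local_path, out_path, "trim", str(start), str(end - start)]
```

For a request such as `http://snippetserver.example.org/?url=archive.example.org/KovaiCanoeStory.wav&start=02:20:10&end=02:22:10`, this yields a `trim` starting at 8410 seconds with a duration of 120 seconds. The resulting list could be passed to `subprocess.run`, and a real service would also want to check the `url` parameter against a list of permitted archive domains, as John suggests.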

Alexander Nakhimovsky, Computer Science Department
Colgate University Hamilton NY 13346
Director, Linguistics Program
Director, Project Afghanistan
t. +1 315 228 7586 f. +1 315 228 7009