[RNLD] Links between publication and sound corpus

Sat Mar 9 06:28:17 UTC 2013

Hi everyone,

Just to update my earlier post: apparently Mouton de Gruyter has now
agreed to host the audio files for the examples in a forthcoming
grammar of an endangered language. They have also removed the
requirement that they be given copyright over the audio files, which
will remain the property of the community of speakers. A great
development!

Cheers,

Ruth

On Sat, Mar 9, 2013 at 5:09 AM, A. D. Nakhimovsky
<adnakhimovsky at colgate.edu> wrote:
> Two comments on John's problem description. First, we can probably make the
> simplifying assumption that the media files have been time-aligned with some
> text or texts (annotation tiers), and the user has access only to the
> predefined start-end times of time-aligned segments. (These can be sentences
> or words in a word-list-with-pauses recording.) If the segments are numbered
> then the query string can provide just the number of the segment.
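
A minimal sketch of the numbered-segment idea, assuming the alignment
is available as a table of start/end times. The table values and the
`segment` parameter name are made up for illustration; they are not
taken from any existing tool.

```python
from urllib.parse import parse_qs

# Hypothetical time-alignment table: segment number -> (start, end) in seconds.
# A real table would be exported from the annotation tier.
SEGMENTS = {
    1: (0.00, 4.25),
    2: (4.25, 9.80),
    3: (9.80, 15.10),
}


def segment_range(number, table=SEGMENTS):
    """Return the (start, end) times of a numbered segment."""
    if number not in table:
        raise ValueError(f"no such segment: {number}")
    return table[number]


def range_from_query(query, table=SEGMENTS):
    """Resolve a query string such as 'segment=3' to (start, end)."""
    values = parse_qs(query).get("segment")
    if not values:
        raise ValueError("query string has no 'segment' parameter")
    return segment_range(int(values[0]), table)
```

With this table, `range_from_query("segment=3")` gives the times of the
third segment, so the link itself never has to carry timecodes.
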
>
> Secondly, this item seems unnecessarily restrictive: "3) We want to point to
> an archived version, not some special version hosted for the purpose of
> these embedded links." What if the archive does not see "providing a service
> like this to be part of its long-term role" and does not even expose its
> holdings as URLs? The general solution might be to think of the
> snippetServer as specifically a Web server that holds: compressed versions
> of the media (WAV files are huge), Web pages with time-alignment
> information, and a browser-based program to play individual segments or
> sequences of segments. A Web server like this could also be useful for
> hosting community materials. Some of the programming (not all) has been
> done: the link below will play the third segment of a Flash clip that shows
> a selection from an MIT lecture, used as English language material:
>
> http://n-topus.com/KUEnglish/algo01/algo0101.xhtml?movie_range=3&playOnSelection=true
>
> adn
>
>
>
> On Thu, Mar 7, 2013 at 9:25 PM, Doug Cooper <doug.cooper.thailand at gmail.com>
> wrote:
>>
>> Yes, this states the server solution exactly.  This does not pose any
>> technical barrier (it's just a matter of providing a wrapper for
>> something like sox or mp3splt).  It just needs an archive that sees
>> providing a service like this to be part of its long-term role.
>>
>>   Although html5 does allow a start/finish time to be specified, the last
>> time I checked the complete underlying file is downloaded.  I'd be happy
>> to be wrong about this, or to hear that there's an (Apache) server add-on
>> that handles this in a more intelligent way.
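
A rough sketch of what such a wrapper could look like, assuming sox is
installed and using its `trim` effect with a start position and a
length in seconds. The function name and file names are hypothetical,
and the command is only built here, not run.

```python
def build_sox_trim(source, target, start, end):
    """Build a sox command that copies start..end (seconds) into target."""
    if start < 0 or end <= start:
        raise ValueError("need 0 <= start < end")
    duration = end - start
    # sox INPUT OUTPUT trim START LENGTH
    return ["sox", source, target, "trim", f"{start:.3f}", f"{duration:.3f}"]
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`.
Passing a list rather than a shell string keeps file names from being
interpreted by the shell.
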
>>
>>   Doug
>>
>>
>>
>> On 3/8/2013 2:20 AM, John Hatton wrote:
>>>
>>>  > That was some time ago and now I think I would use an archival version of
>>> the media as the streaming source and have HTML5 calls to the timecodes.
>>>
>>> Am I understanding the problem correctly?
>>>
>>> 1) We want URLs which act just like a pointer to a static wav somewhere on the
>>> internet. These can then be embedded in anything.
>>>
>>> 2) But because we don't want to actually carve up each file into little files,
>>> we need the URL to specify a time range rather than just a filename.
>>>
>>> 3) We want to point to an archived version, not some special version
>>> hosted for the purpose of these embedded links.
>>>
>>>
>>> If I understand the problem, then the solution is a URL like
>>>
>>> http://<some snippet service>.org/<address of the archived version>?start=<starttime>&end=<endtime>
>>>
>>> (That last bit after the '?' is called a URL Query string.)
>>>
>>> E.g.
>>>
>>> http://snippetServer.org/?url=paradisec.org.au/someinternalpathatparadisec/KovaiCanoeStory.wav?start=02:20:10&end=02:22:10
>>>
>>>
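
A small sketch of how a server might read such a request. It assumes a
well-formed variant of the URL in which `url`, `start` and `end` are
three separate parameters joined by `&`; the function names are
illustrative only.

```python
from urllib.parse import parse_qs, urlsplit


def timecode_to_seconds(timecode):
    """Convert 'HH:MM:SS', 'MM:SS' or plain seconds to a number of seconds."""
    seconds = 0.0
    for part in timecode.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_snippet_request(request_url):
    """Return (source address, start seconds, end seconds) from a snippet URL."""
    query = parse_qs(urlsplit(request_url).query)
    source = query["url"][0]
    start = timecode_to_seconds(query["start"][0])
    end = timecode_to_seconds(query["end"][0])
    if end <= start:
        raise ValueError("end must be later than start")
    return source, start, end
```

For the timecodes in the example, 02:20:10 works out to 8410 seconds
and 02:22:10 to 8530 seconds, a two-minute snippet.
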
>>> When it receives this query, the server would get ahold of the full audio file
>>> declared in the query string, and then stream out just the section that was
>>> called for. The experience to the user would be the same as if they had
>>> clicked on a url of a pre-prepared, stand-alone file containing just that
>>> snippet.
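
One way the cutting step could be sketched, using only Python's
standard `wave` module. It assumes an uncompressed PCM WAV that is
already in memory; a real service would presumably fetch and stream
the data rather than hold the whole file. The function name is made up
for this example.

```python
import io
import wave


def extract_snippet(wav_bytes, start, end):
    """Return a new WAV (as bytes) holding only start..end seconds of the input."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Clamp both positions to the length of the recording.
        first = min(max(int(start * rate), 0), total)
        last = min(max(int(end * rate), first), total)
        src.setpos(first)
        frames = src.readframes(last - first)

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)     # same channels, sample width and rate
        dst.writeframes(frames)   # header frame count is corrected on close
    return out.getvalue()
```

The returned bytes form a complete stand-alone WAV file, which is what
the user would receive in place of a pre-prepared snippet.
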
>>>
>>> Now, because the audio itself is served by an archive, it will have a long
>>> lifetime. The snippet server itself need not be related to the archive; a
>>> single instance could serve everyone. But if the snippet server itself goes
>>> away in the future, the URL is still human readable, and can be changed via
>>> search/replace to some new snippet server.
>>>
>>> To avoid the links going bad, it seems the snippet server should be run by
>>> something prepared to be around for a long time, like an MPI or an
>>> archive itself:
>>>
>>> http://paradisec.org.au/snippetserver/...
>>>
>>> Practically, such a server could limit its services to files in its own
>>> repository or some set of other domains, if it didn’t want to end up providing
>>> this snippet service for just any content on the web.
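
The domain restriction could be sketched as a simple host check. The
allowed set below is a placeholder, and the function name is
hypothetical; a real service would choose its own policy.

```python
from urllib.parse import urlsplit

# Placeholder list of archives the snippet server agrees to serve from.
ALLOWED_HOSTS = {"paradisec.org.au"}


def is_allowed_source(source_url, allowed=ALLOWED_HOSTS):
    """Return True if the source file lives on an allowed host or a subdomain."""
    if "://" not in source_url:
        # Addresses given without a scheme, as in the example URL above.
        source_url = "http://" + source_url
    host = (urlsplit(source_url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in allowed)
```

Comparing the parsed host name, rather than searching the whole URL
for the domain string, avoids accepting addresses that merely contain
the archive's name somewhere in their path.
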
>>>
>>> I googled a bit, didn't come up with anything, but I wouldn't be surprised if
>>> such a service already existed. If not, well clearly this would be cheap to do.
>>>
>>> John Hatton
>>>
>>> SIL International Language Software Development
>>>
>
>
>
> --
> Alexander Nakhimovsky, Computer Science Department
> Colgate University Hamilton NY 13346
> Director, Linguistics Program
> Director, Project Afghanistan
> t. +1 315 228 7586 f. +1 315 228 7009