[Lexicog] audio component

Jan Ullrich jfu at LAKHOTA.ORG
Mon Apr 29 13:30:39 UTC 2013

Nick and Bill,


Thank you for your kind replies and useful hints.

We have been aware of the option with Elan (combined with Audacity), but the information on SndBite and Transcriber is new to us, so we will look into those options as well.


Thanks again





From: lexicographylist at yahoogroups.com [mailto:lexicographylist at yahoogroups.com] On Behalf Of Nick Thieberger
Sent: Saturday, April 27, 2013 3:46 PM
To: lexicographylist at yahoogroups.com
Subject: Re: [Lexicog] audio component



Dear Jan,


I have used the following method. Extract a script from your lexicon of the headwords you want to have spoken. Record a speaker saying them in order. Enter the script into time-aligning software like Transcriber or Elan and align each word to the relevant segment (actually quite quick if you use the visual image of the wave form to help you decide on the chunking of the audio file). There is software that will align a script and audio, but I think that is only available for large languages. 
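
As a rough illustration of the first step, here is a minimal Python sketch for pulling a recording script out of a Toolbox lexicon. It assumes the usual Toolbox layout, with one field per line and `\lx` marking the headword. The `\id` marker for the unique ID field and the function name `extract_script` are hypothetical, so substitute whatever your database actually uses.

```python
def extract_script(toolbox_text, headword_marker="\\lx", id_marker="\\id"):
    """Return (entry_id, headword) pairs in lexicon order.

    Assumes one field per line, each starting with a backslash marker.
    The id_marker default is a placeholder, not a Toolbox standard.
    """
    entries = []
    headword = None
    entry_id = None
    for line in toolbox_text.splitlines():
        # Split "\lx word" into the marker and the field value.
        marker, _, value = line.strip().partition(" ")
        if marker == headword_marker:
            # A new headword closes the previous entry, if any.
            if headword is not None:
                entries.append((entry_id, headword))
            headword, entry_id = value.strip(), None
        elif marker == id_marker and headword is not None:
            entry_id = value.strip()
    # Flush the final entry.
    if headword is not None:
        entries.append((entry_id, headword))
    return entries
```

The resulting list can be printed as the script for the speaker to read, and the IDs kept alongside so that each recorded word can later be matched to its entry.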


Once you have finished, you need to export the information from Transcriber so that it is in the form 'timecode, tab, text'. This can then be imported into Audacity as 'labels' for the imported audio file. Once you see all the words as labels in the Audacity file, you can select 'Export multiple' and Audacity will proceed to chop up the file into small files (you select whether you want them as .wav or .mp3), each named as per the label.
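
To make the label step concrete, here is a small Python sketch that writes aligned segments out as text for Audacity's label import. It assumes the label format Audacity itself exports, one line per label with start time, end time, and text separated by tabs, with times in seconds. The function name `to_audacity_labels` and the example values are made up for illustration. If you put the entry ID in the text column instead of the word, the exported files should end up named by ID.

```python
def to_audacity_labels(segments):
    """Format (start_seconds, end_seconds, text) tuples as label lines.

    Each output line is: start<TAB>end<TAB>text
    """
    lines = []
    for start, end, text in segments:
        lines.append(f"{start:.6f}\t{end:.6f}\t{text}")
    return "\n".join(lines) + "\n"


# Illustrative values only: two segments labelled with entry IDs.
example = to_audacity_labels([(0.5, 1.25, "101"), (1.8, 2.6, "102")])
```

Saving `example` to a .txt file gives something that can be loaded into Audacity as a label track before running 'Export multiple'.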


It is magic to watch Audacity plough through the file creating new mp3 files!


I hope this helps,




On 27 April 2013 10:48, Jan Ullrich <jfu at lakhota.org> wrote:


Dear Colleagues,


I would like to ask your advice regarding an audio component of dictionary entries. 


We are hoping to eventually record around 30,000 word entries of our Lakota dictionary. The dictionary is currently in the Toolbox database although we also have a MySQL online version. Also, we have an additional Toolbox field in each entry where we store a unique ID number.


We do have a plan for a semi-automated procedure, but I am wondering if there is a software utility or a recommended procedure for cutting the long audio file(s) containing a chain of words into individual files for each word and naming them according to the respective entry word or, preferably, the entry ID.

In the second phase of the project we would also like to create the audio component for the 40,000 example sentences and collocations. These currently do not have ID numbers, so I think we will have to add those.


Best regards



