Question on Elan, Toolbox & Clan interoperability

Thu Jun 25 23:56:43 UTC 2009

Dear RNLDers

I am in the field at the moment working with speakers on a  
conversational text.
Because Elan allows an unlimited number of tiers and seems to be able to
import from and export to a range of different applications, I thought it
might make a good place to house all of the linguistic data relating
to a particular transcript.
Please advise whether the following workflow is doable or reasonable.  
It's been a while since I set Toolbox up for the language I'm working
on and I feel a bit rusty.

I'm using Elan for the first time. I have transcribed a conversation  
(in Elan) and am now ready to interlinearize it.

I want to parse it in Toolbox and then reimport the parsed text
into ELAN, with newly acquired morphological and part-of-speech tiers.

Previously I have had my (Toolbox) transcription files as follows:

\ref (time alignment)
\per (speaker)
\trs (transcription)
\m (morphological tier)
\g (gloss)
\p (part of speech)
\t (free translation)

I'm not committed to this arrangement if others will allow better  
integration with Elan.

When I export from Elan I wind up with a Toolbox file that looks like
this:

\block
\LT  (speaker1)
\EC (speaker2)
\GN (speaker3)   [vernacular text]
\PB (speaker4)
\ELANBegin
\ELANEnd

Here the text is attributed to the relevant speaker, and the lines for
the other speakers remain blank.

Would I be best to rejig the file that Elan outputs so that it is more
like my prior Toolbox files, or should I get Toolbox to parse the \LT,
\EC, \GN and \PB lines directly? The latter might make for an easier
import back into ELAN.
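
To make the first option concrete, here is a rough Python sketch of the kind of rejigging I have in mind. It is only an illustration under several assumptions: that each record starts with \block, that the speaker markers are exactly the four shown above, that every field fits on one line, and that \ELANBegin and \ELANEnd should be carried over unchanged for the trip back into ELAN. The function names, the block number and the time values in the sample are made up for the example and are not taken from a real export.

```python
import re

# Speaker markers as they appear in the export above (assumed to be the full set).
SPEAKER_MARKERS = ("LT", "EC", "GN", "PB")


def parse_records(text):
    """Split Toolbox text into records, each a list of (marker, value) pairs.

    Assumption: a new record starts at every \\block line and each field
    occupies a single line. Anything before the first \\block is ignored.
    """
    records = []
    for line in text.splitlines():
        match = re.match(r"\\(\S+)\s*(.*)", line)
        if not match:
            continue  # blank line or unmarked continuation line
        marker, value = match.group(1), match.group(2).strip()
        if marker == "block":
            records.append([])
        if records:
            records[-1].append((marker, value))
    return records


def convert(text):
    """Rewrite per-speaker lines as \\ref / \\per / \\trs records."""
    out = []
    for record in parse_records(text):
        fields = dict(record)
        for speaker in SPEAKER_MARKERS:
            if not fields.get(speaker):
                continue  # this speaker said nothing in this block
            lines = [
                f"\\ref {fields.get('block', '')}",
                f"\\per {speaker}",
                f"\\trs {fields[speaker]}",
                f"\\ELANBegin {fields.get('ELANBegin', '')}",
                f"\\ELANEnd {fields.get('ELANEnd', '')}",
            ]
            out.append("\n".join(item.rstrip() for item in lines))
    return "\n\n".join(out) + "\n"


# Hypothetical sample record; the number and times are invented.
sample = r"""\block 001
\LT
\EC
\GN vernacular text here
\PB
\ELANBegin 00:00:01.200
\ELANEnd 00:00:03.450
"""

print(convert(sample))
```

Whether \ref should hold the block number or the time alignment, and whether ELAN would accept a file in this shape on reimport, are exactly the things I am unsure about.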

Later I will want to do more detailed transcriptions, probably with
Clan. However, I'll cross that bridge when I come to it.
Any assistance would be greatly appreciated.
Joe

Joe Blythe
Postdoctoral Fellow
Austkin Project
School of Language Studies
Australian National University
+61 409 881 153
joe.blythe at anu.edu.au