Precodes and continuation markers

Kevin Donnelly kevin at dotmon.com
Wed Dec 7 20:46:05 UTC 2011


Hi Erika

Thanks for this.

::::On Wednesday 07 December 2011 Erika Hoff said::::
> We use a 3-letter precode on the speaker tier. The codes we use are spa
> (Spanish), eng (English), mix (Mixed), and una (Unassignable). This allows
> us to easily generate counts of the number of utterances in each category.

Cool.  We are tagging at the word level, so we can get word counts by
language, but we also use precodes where all the words in an utterance belong
to one language.  We don't use mix, but it's an interesting idea: although we
can derive it from the word-level tags, precoding it might be a useful shorthand.
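
To make that concrete, here is a rough sketch of deriving an utterance-level
code from word-level language tags and then counting utterances per category.
It is not our actual code: the function names, the list-of-tags input shape,
and the rule that "una" words are ignored unless nothing else is present are
all assumptions made for illustration.

```python
from collections import Counter

def utterance_code(word_langs):
    """Return a 3-letter code for one utterance from its per-word language tags.

    Assumption: words tagged "una" do not count towards mixing; an utterance
    is "una" only if no word could be assigned to a language.
    """
    langs = {lang for lang in word_langs if lang != "una"}
    if not langs:
        return "una"
    if len(langs) == 1:
        return next(iter(langs))
    return "mix"

def count_codes(utterances):
    """Count utterances per derived code."""
    return Counter(utterance_code(u) for u in utterances)

# Hypothetical data: each utterance is a list of per-word language tags.
utterances = [
    ["spa", "spa", "spa"],
    ["eng", "eng"],
    ["spa", "eng", "spa"],
    ["una"],
    ["eng", "una", "eng"],
]
counts = count_codes(utterances)
for code in ("spa", "eng", "mix", "una"):
    print(code, counts[code])
```

Whether an utterance with one unassignable word should count as monolingual
is a coding decision; a stricter rule could be swapped in at that point.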

> We have no problem running MLT, CHECK, or GEM. We do not use MOR.

We're reading the CHAT files into a database and using that to do things like
sequence analysis based on POS, export to a gloss-aligned PDF, and export back
to a CHAT file.  So we want to make sure that the exported CHAT file is valid,
and it turned out that an assumption we had made about the position of the
precodes was incorrect!  MOR isn't available for Welsh, so we developed our own
tagger, which also handles English and Spanish.

-- 
Pob hwyl / Best wishes

Kevin Donnelly
kevindonnelly.org.uk
