Precodes and continuation markers
Kevin Donnelly
kevin at dotmon.com
Wed Dec 7 20:46:05 UTC 2011
Hi Erika
Thanks for this.
On Wednesday 07 December 2011 Erika Hoff said:
> We use a 3-letter precode on the speaker tier. The codes we use are spa
> (Spanish), eng (English), mix (Mixed), and una (Unassignable). This allows
> us to easily generate counts of the number of utterances in each category.
Cool. We tag at the word level, so we can get word counts by language, and
we also use precodes where all the words in an utterance belong to one
language. We don't use mix, but it's an interesting idea: although we can
derive it from the word-level tags, precoding it might be a useful shorthand.
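To make that concrete, here is a rough Python sketch of deriving an utterance-level label from word-level tags and then counting both ways. It is only an illustration: the (word, language) pair layout, the function name utterance_label, and the sample utterances are all made up for the example, and treating an utterance with no tagged words as una is my assumption, not necessarily how Erika's group defines it.

```python
from collections import Counter


def utterance_label(tagged_words):
    """Return a single language label for one utterance.

    tagged_words is a list of (word, language_code) pairs. This layout is
    an assumption for the example, not our actual database schema.
    """
    languages = {lang for _, lang in tagged_words}
    if not languages:
        return "una"  # assumption: nothing tagged means unassignable
    if len(languages) == 1:
        return next(iter(languages))  # all words share one language
    return "mix"  # more than one language in the utterance


# Invented sample data, one inner list per utterance.
utterances = [
    [("quiero", "spa"), ("agua", "spa")],
    [("I", "eng"), ("want", "eng"), ("agua", "spa")],
    [("more", "eng"), ("water", "eng")],
    [],
]

# Utterance counts per category, as a precode would give directly.
utterance_counts = Counter(utterance_label(u) for u in utterances)

# Word counts per language, from the word-level tags.
word_counts = Counter(lang for u in utterances for _, lang in u)

print(utterance_counts)
print(word_counts)
```

The point is just that mix falls out of the word-level tags for free, so a precode for it would be a convenience rather than new information.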
> We have no problem running MLT, CHECK, or GEM. We do not use MOR.
We're reading the chat files into a database and using that for things like
POS-based sequence analysis, export to a gloss-aligned PDF, and export back
to a chat file. So we want to make sure that the chat export is valid, and it
turned out that an assumption we had made about the position of the precodes
was incorrect! MOR isn't available for Welsh, so we developed our own tagger,
which also handles English and Spanish.
--
Pob hwyl / Best wishes
Kevin Donnelly
kevindonnelly.org.uk