Precodes and continuation markers
    Kevin Donnelly
    kevin at dotmon.com
    Wed Dec  7 23:11:00 UTC 2011

Hi Brian
On Wednesday 07 December 2011, Brian MacWhinney said:
> If one defines mixed as meaning that one or more words in an utterance come
> from the other language, then just looking for utterances with @s would
> tell you whether something is mixed.  Isn't that true?
Suppose you have a predominantly English text, where English is the null
(unmarked) language, and then you get a line of Spanish, tagged @s:spa, in
which one or more words are indeterminate (i.e. they occur in both the English
and Spanish dictionaries; this may match Erika's unassigned) and are tagged
@s:eng&spa.  That line would contain only @s tags, and it would count as
mixed.  But another line of Spanish with no indeterminates would also contain
only @s tags, and it would count as Spanish.  So looking only for the presence
of @s may not be enough to tell the two cases apart.
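
To make the point concrete, here is a rough Python sketch (not our actual
import code).  It assumes bare word tokens of the form word@s:spa or
word@s:eng&spa, with untagged words taken as English, and it ignores speaker
codes, terminators and the rest of the chat syntax.  The function names and the
rule used in classify() are illustrative assumptions only: an utterance is
called mixed when no single language fits every word, which is just one
possible definition.

```python
import re

# Assumed token shape: a word, optionally followed by "@s:" and one or more
# language codes joined by "&" (e.g. casa@s:spa, no@s:eng&spa).
TOKEN = re.compile(r"^[^@\s]+(?:@s:(?P<langs>[a-z]+(?:&[a-z]+)*))?$")


def word_languages(token, default="eng"):
    """Return the set of languages a single token could belong to."""
    match = TOKEN.match(token)
    if match is None:
        raise ValueError(f"unrecognised token: {token!r}")
    langs = match.group("langs")
    # Untagged words are taken to be in the default (null-marked) language.
    return frozenset(langs.split("&")) if langs else frozenset({default})


def has_s_tag(utterance):
    """The bare test: does any word in the utterance carry an @s tag?"""
    return any("@s:" in token for token in utterance.split())


def classify(utterance, default="eng"):
    """Label an utterance by the languages that fit every one of its words."""
    sets = [word_languages(t, default) for t in utterance.split()]
    if not sets:
        return "empty"
    common = frozenset.intersection(*sets)
    if not common:
        return "mixed"          # no single language covers all the words
    if len(common) > 1:
        return "indeterminate"  # every word fits more than one language
    return next(iter(common))


# Made-up example lines, not taken from any corpus.
spanish_with_indeterminate = "la@s:spa casa@s:spa no@s:eng&spa"
spanish_only = "la@s:spa casa@s:spa grande@s:spa"
switched = "the casa@s:spa"

for line in (spanish_with_indeterminate, spanish_only, switched):
    print(has_s_tag(line), classify(line), line)
```

All three lines pass the bare has_s_tag() test, so that test on its own cannot
separate a wholly Spanish line from a switched one; the tag values have to be
inspected as well.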
> I'm not sure how one defines unassignable, but I certainly agree that, on
> the lexical level, many forms are ambiguously LX and LY, particularly in
> the Welsh-English pairing.
> > We're reading the chat files into a database and using that to do things
> > like sequence analysis based on POS, export to a gloss-aligned pdf, and
> > export to chat file.  So we want to make sure that the chat export is a
> > valid one, and it turned out that an assumption we had made about the
> > position of the precodes was incorrect!
> 
> Right, but you can run CHECK and CHATTER before reading to the database,
> right?
Once the initial import is done, we generate edited/amended files from the
database, so yes, in an ideal world each iteration of each file would be
checked, but because of time constraints we usually only do a periodic check.
And sometimes we generate files with features (e.g. empty %eng tiers to
pinpoint where a translation is required - see bangortalk.org.uk/progress)
which (I'm told?) wouldn't pass CHECK anyway.
You're probably right, though, that we should try adding some hoop that runs a 
periodic CHECK on all the files.  I'll look at doing that.
-- 
Pob hwyl / Best wishes
Kevin Donnelly
kevindonnelly.org.uk