Spanish MOR

Brian MacWhinney macw at mac.com
Thu Jun 24 02:31:57 UTC 2004


Dear Info-CHILDES,
    I have just completed training a POST database for disambiguation
of the Spanish morphological tags.  The current version of the MOR
tagger and the span.db POST database are available from the CHILDES
server.  I have run MOR and POST on the files in the Ornat, Marrero,
and ColMex directories, and those are now fully tagged and
disambiguated.
A brief glance over the results suggests that the disambiguator is not
making any mistakes.  However, a variety of transcription errors remain
in these files.  For example, a common problem is omission of the
accent on éste and ésta, which leads to them being treated as
demonstrative determiners rather than as pronouns.
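
For anyone who wants to look for this particular problem, a small
script can list candidate lines for manual review.  The sketch below is
only an illustration and is not part of MOR or POST.  It assumes
CHAT-style files in which main speaker tiers begin with "*" and
dependent tiers begin with "%"; the function name find_candidates and
the sample lines are made up for the example.

```python
import re

# Unaccented forms that may be mistranscribed pronouns (éste, ésta).
CANDIDATES = re.compile(r"\b(este|esta)\b", re.IGNORECASE)


def find_candidates(lines):
    """Return (line_number, word, line) for each 'este'/'esta' on a main tier.

    Assumption: main speaker tiers start with '*'. Headers ('@'),
    dependent tiers ('%'), and tab-indented continuation lines are skipped.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if not line.startswith("*"):
            continue  # not the first line of a main tier
        for match in CANDIDATES.finditer(line):
            hits.append((number, match.group(1).lower(), line.rstrip("\n")))
    return hits


# Hypothetical sample lines, invented for this illustration.
sample = [
    "@Begin\n",
    "*CHI:\tquiero este .\n",
    "%mor:\t(hypothetical tier content)\n",
    "*MOT:\testa casa es grande .\n",
]

for number, word, line in find_candidates(sample):
    print(number, word, line)
```

The script flags every unaccented este or esta, including correct
determiner uses, so each hit still needs a human decision about whether
an accent is actually missing.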
   You can either view these files one by one through the browsable XML 
facility or else download the whole directories as zip files.
    If anyone is interested in going through these files or other
Spanish files to either clean up problems or note consistent gaps, that
would be quite helpful.
   We intend to gradually process the remaining Spanish corpora through 
MOR over the next year or two.

--Brian MacWhinney

