Spanish MOR
Brian MacWhinney
macw at mac.com
Thu Jun 24 02:31:57 UTC 2004
Dear Info-CHILDES,
I have just finished training a POST database for disambiguating
the Spanish morphological tags.
The current version of the MOR tagger and the span.db POST database are
available from the CHILDES server. I have run MOR and POST on the files
in the Ornat, Marrero, and ColMex directories, and those files are now
fully tagged and disambiguated.
A brief look over the results suggests that the disambiguator is not
making any mistakes. However, a variety of transcription errors remain
in these files. For example, a common problem is omission of the accent
on éste and ésta, which leads to them being tagged as demonstrative
determiners rather than demonstrative pronouns.
You can either view these files one by one through the browsable XML
facility or else download the whole directories as zip files.
If anyone is interested in going through these files, or other
Spanish files, to clean up problems or note consistent gaps, that
would be quite helpful.
We intend to gradually process the remaining Spanish corpora through
MOR over the next year or two.
--Brian MacWhinney