old style morphemes
Brian MacWhinney
macw at cmu.edu
Tue Feb 13 02:30:08 UTC 2007
Dear Info-Chibolts,
I think it would be helpful to review the current situation in
CLAN with regard to the old-style main-line morphemicization. This
is the practice of transcribing "jumped" as "jump-ed" on the main
line. While creating a complete %mor line for the database, I soon
found that use of this form was extremely inconsistent. For example,
for the form "jumped", we would get "jump-ed", "jump-d", and
"jumped". For irregulars, the problem was worse, with bend-ed,
bent-ed, bend&ed, bent&ed, bent, bend-d, and so on. To solve this,
we replaced all forms with the conventional transcription (jumped)
and relied on MOR to do the morphemic analysis. In the case of
errors, the form is: wented [: went] [* +ed-sup].
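To make the replacement step concrete, here is a minimal Python sketch of how old-style variants on a main line could be mapped back to the conventional form. It is only an illustration, not the procedure actually used on the database: the table CONVENTIONAL and the function normalize_main_line are hypothetical names, and the table holds only the "jumped" variants quoted above. The irregular variants are left out on purpose, since deciding whether a form like "bend-ed" is a transcription habit or a real child error needs a human decision.

```python
import re

# Hypothetical table mapping old-style variants to the conventional form.
# Only the "jumped" variants quoted in the message are included here.
CONVENTIONAL = {
    "jump-ed": "jumped",
    "jump-d": "jumped",
}

# A token is a run of letters, optionally joined by "-" or "&".
TOKEN = re.compile(r"[A-Za-z]+(?:[-&][A-Za-z]+)*")


def normalize_main_line(line: str) -> str:
    """Replace known old-style variants; leave every other token alone."""
    return TOKEN.sub(
        lambda m: CONVENTIONAL.get(m.group(0), m.group(0)), line
    )


if __name__ == "__main__":
    print(normalize_main_line("*CHI:\tI jump-ed and he jump-d too ."))
```

Tokens that are not in the table, including anything containing "&", pass through unchanged, so they can be reviewed by hand.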
Between 1999 and 2002, I spent many months implementing these
changes and improving MOR and POST. None of the English files
rely on the old style anymore, and neither does MOR. We have also
eliminated it from other languages, with a few exceptions for Hebrew
and some other languages that do not yet have a MOR grammar. We also
changed MLU to run by default from the %mor line (although this can
be changed by using -t%mor).
Some people still rely on class lessons that ask students to compute
MLU from the main line. For these situations, let me suggest three
options.
1. Instead of having students run MLU on their own data, have
them use data from the database, where there is a full %mor line for
accurate MLU counting.
2. Teach students to run these two commands on their files, so they
can run MLU on the %mor line:
mor *.cha +1
post +tposttags.cut *.cha +1
3. Stick with the old system. In this case, if you want to run
CHECK on the file, add the term "Legacy" to the @Languages line, as in
@Languages: en, Legacy
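For anyone who needs to mark many files this way under option 3, the header change could be scripted. The following is a minimal Python sketch under stated assumptions: add_legacy is a hypothetical helper, not part of CLAN, and it writes the header with the same spacing as the example above, which may need adjusting to match your files.

```python
def add_legacy(chat_text: str) -> str:
    """Append 'Legacy' to the @Languages header if it is not already there."""
    out = []
    for line in chat_text.splitlines():
        if line.startswith("@Languages:"):
            # Split the part after the colon into individual language codes.
            codes = [c.strip() for c in line.split(":", 1)[1].split(",")]
            codes = [c for c in codes if c]
            if "Legacy" not in codes:
                codes.append("Legacy")
            # Assumption: same separator style as the example in the message.
            line = "@Languages: " + ", ".join(codes)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(add_legacy("@Begin\n@Languages: en\n@End"))
```

Running the helper a second time on its own output leaves the header unchanged, so it is safe to apply to files that are already marked.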
Best wishes,
--Brian MacWhinney