old style morphemes

Brian MacWhinney macw at cmu.edu
Tue Feb 13 02:30:08 UTC 2007


Dear Info-Chibolts,
      I think it would be helpful to review the current situation in  
CLAN in regard to the use of the old style main line  
morphemicization.  This is the form that involves transcribing jumped  
as jump-ed on the main line.  During the process of creating a  
complete %mor line for the database, I soon found that use of this  
form was extremely inconsistent.  For example, for the form "jumped",  
we would get "jump-ed",  "jump-d", and "jumped."  For irregulars, the  
problem was worse, with bend-ed, bent-ed, bend&ed, bent&ed, bent,
bend-d, and so on.  To solve this, we replaced all forms with the  
conventional transcription (jumped) and relied on MOR to do the  
morphemic analysis. In the case of errors, the form is:
wented [: went] [* +ed-sup].
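
To illustrate why the inconsistency matters for morpheme counts, here is
a toy Python sketch. It is not CLAN's actual MLU algorithm; the function
name count_main_line_morphemes and the rule "one morpheme per piece
separated by - or &" are assumptions made only for this illustration.

```python
import re


def count_main_line_morphemes(word: str) -> int:
    """Toy count (assumed rule): one morpheme per piece split on '-' or '&'."""
    return len([piece for piece in re.split(r"[-&]", word) if piece])


# Forms quoted in this message, all meant to represent the same word "jumped".
variants = ["jump-ed", "jump-d", "jumped"]
counts = {form: count_main_line_morphemes(form) for form in variants}

for form, n in counts.items():
    print(f"{form}: {n}")
```

Under this assumed rule the same word is counted as two morphemes in
some transcripts and one in others, which is the kind of inconsistency
that moving the analysis to the %mor line avoids.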

Between 1999 and 2002, I worked many months to implement these  
changes and to improve MOR and POST.  None of the English files rely
on the old-style main line forms anymore, and neither does MOR.  We
have also eliminated them from other languages, with a few exceptions
for Hebrew and some other languages that do not yet have a MOR
grammar.  We also changed MLU to run by default from the %mor line
(although this can be changed by using -t%mor).

Some people still rely on class lessons that ask students to compute  
MLU from the main line.  For these situations, let me suggest three  
options.
1.  Instead of having students run MLU on their own data, have
them use data from the database, where there is a full %mor line for
accurate MLU counting.
2.  Teach students to run these two commands on their files, so they
can run MLU from the %mor line:
mor *.cha +1
post +tposttags.cut *.cha +1
3.  Stick with the old system.  In this case, if you want to run
CHECK on the file, add the term "Legacy" to the @Languages header, as in
@Languages:	en, Legacy

Best wishes,

--Brian MacWhinney


