freq-ing for lemmas

Brian MacWhinney macwhinn at hku.hk
Sat May 5 02:42:47 UTC 2001


Dear Info-ChiBolts,
  David Barner from McGill just asked me how to get type-token ratios
(TTR) based on lemmas instead of whole words.  In response to his
question, I added this material to the FREQ section of the CLAN manual:

If you run FREQ on the data on the main speaker tier, you will get a
type-token ratio that is grounded on whole word forms, rather than lemmas.
For example, "run," "runs," and "running" will all be treated as separate
types.  If you want to treat all forms of the lemma "run" as a single type,
you should run the file through MOR and POST to get a disambiguated %mor
line.  Then you can run FREQ in a form such as this to get a lemma-based
TTR.

freq -t* +t%mor +s"*\|*-%" +s"*\|*" sample.mor.pst

Depending on the shape of your morphological forms, you may need to add some
additional +s switches to this sample command.
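
To make the difference between the two ratios concrete, here is a rough
Python sketch.  It is not part of CLAN and does not reproduce what FREQ
does internally.  It assumes %mor items of the simple shape
pos|stem-SUFFIX (for example v|run-PROG), with "-" or "&" marking the
start of the suffix material; the helper names lemma_of and ttr and the
five sample items are made up for illustration.  Prefixes, compounds, and
clitics are not handled.

```python
import re

def lemma_of(mor_item):
    """Return the stem of a %mor-style item such as 'v|run-PROG'.

    Assumes the simplified shape pos|stem[-SUFFIX] or pos|stem[&FUSION].
    """
    # Drop the part-of-speech label up to and including the '|'.
    stem = mor_item.split("|", 1)[-1]
    # Cut at the first suffix marker: '-' (affix) or '&' (fused form).
    return re.split(r"[-&]", stem, maxsplit=1)[0]

def ttr(items):
    """Type-token ratio: number of distinct items over total items."""
    items = list(items)
    return len(set(items)) / len(items) if items else 0.0

# Hypothetical main-tier words and their matching %mor items.
words = ["run", "runs", "running", "dog", "dogs"]
mor = ["v|run", "v|run-3S", "v|run-PROG", "n|dog", "n|dog-PL"]

word_ttr = ttr(words)                       # 5 types over 5 tokens
lemma_ttr = ttr(lemma_of(m) for m in mor)   # 2 types over 5 tokens
print(word_ttr, lemma_ttr)
```

On this toy sample every word form is its own type, so the word-based
ratio is 5/5, while the lemma-based ratio counts only "run" and "dog" and
comes out at 2/5.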

****

As a further aside, some people might want to do this using the VOCD
program, which is a more sophisticated way of getting TTR.  However, I
think that doing so would require a major reprogramming of VOCD.

--Brian


