freq-ing for lemmas
Brian MacWhinney
macwhinn at hku.hk
Sat May 5 02:42:47 UTC 2001
Dear Info-ChiBolts,
David Barner from McGill just asked me how to get type-token
ratios (TTR) based on lemmas instead of whole words. In response to his
question, I added this material to the FREQ section of the CLAN manual:
If you run FREQ on the data on the main speaker tier, you will get a
type-token ratio that is grounded on whole word forms, rather than lemmas.
For example, "run," "runs," and "running" will all be treated as separate
types. If you want to treat all forms of the lemma "run" as a single type,
you should run the file through MOR and POST to get a disambiguated %mor
line. Then you can run FREQ in a form such as this to get a lemma-based
TTR.
freq -t* +t%mor +s"*\|*-%" +s"*\|*" sample.mor.pst
Depending on the shape of your morphological forms, you may need to add some
additional +s switches to this sample command.
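The following minimal Python sketch shows what the difference between the two kinds of TTR amounts to. It is not CLAN code and not how FREQ works internally. It assumes a simplified %mor format in which each word is written as pos|stem, optionally followed by suffixes such as -PROG or -3S, fusional markers such as &PRES, and clitics joined with ~. Compounds and other CHAT conventions are ignored. The helper names (lemma_of, mor_lemmas, word_forms, ttr) and the sample utterances are hypothetical. Cutting everything after the stem mirrors the idea behind the +s switches above, not their actual matching rules.

```python
from typing import Iterable, List, Optional

PUNCTUATION = {".", "?", "!"}


def lemma_of(word: str) -> Optional[str]:
    """Return the stem of one simplified %mor word, e.g. 'v|run-PROG' -> 'run'."""
    if "|" not in word:
        return None  # punctuation such as '.' has no pos|stem pair
    stem = word.split("|", 1)[1]
    # Drop suffixes ('-PROG', '-3S') and fusional markers ('&PRES').
    for marker in ("-", "&"):
        stem = stem.split(marker, 1)[0]
    return stem or None


def mor_lemmas(mor_lines: Iterable[str]) -> List[str]:
    """Collect one lemma per word on each %mor line, splitting clitics on '~'."""
    lemmas: List[str] = []
    for line in mor_lines:
        for token in line.split():
            for word in token.split("~"):
                lemma = lemma_of(word)
                if lemma is not None:
                    lemmas.append(lemma)
    return lemmas


def word_forms(main_lines: Iterable[str]) -> List[str]:
    """Collect whole word forms from main-tier lines, skipping punctuation."""
    return [w for line in main_lines for w in line.split() if w not in PUNCTUATION]


def ttr(items: List[str]) -> float:
    """Type-token ratio: distinct items divided by total items."""
    return len(set(items)) / len(items) if items else 0.0


# Hypothetical sample: three utterances and matching simplified %mor lines.
main_tier = ["I run .", "he runs .", "we are running ."]
mor_tier = ["pro|I v|run .", "pro|he v|run-3S .", "pro|we aux|be&PRES v|run-PROG ."]

words = word_forms(main_tier)
lemmas = mor_lemmas(mor_tier)
print(f"word-form TTR: {ttr(words):.2f}")  # 7 types / 7 tokens
print(f"lemma TTR:     {ttr(lemmas):.2f}")  # 5 types / 7 tokens
```

On these three utterances the word-form TTR is 7/7 = 1.00, while the lemma-based TTR is 5/7, or about 0.71, because "run," "runs," and "running" collapse into the single type "run."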
****
As a further aside, some people might want to do this using the VOCD program,
which is a more sophisticated way of getting TTR. However, I think that
doing so would require a major reprogramming of VOCD.
--Brian