vocd-ing for lemmas

Brian Richards b.j.richards at reading.ac.uk
Tue May 8 16:26:55 UTC 2001

Dear Brian,

If the main speaker tier has been morphemicized, vocd will do an analysis
based on lemmas if the switch +s*-%% is included. This means that
"go", "go-es", and "go-ing" are treated as one type. If people also
want to treat forms with fusion markers as the same word type, e.g.
"sing" and "sing&ed", then add the switch +s*&%%.

The same switches seem to work OK on the %mor tier, although vocd was
not designed with this in mind. Items on the %mor tier that are marked
as errors with * can be filtered out by adding the switch -s** to the
command.
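
For anyone who wants to see the effect of collapsing forms to lemmas
without running CLAN, here is a rough Python sketch. It is not how vocd
implements these switches; the function names lemma() and ttr() are my own
for illustration, and it simply assumes that everything after the first
"-" or "&" in a word is a suffix or fusion marker.

```python
import re


def lemma(word: str) -> str:
    """Return the part of a morphemicized word before the first '-' or '&'.

    Assumption: '-' marks a suffix and '&' marks a fusion, as in the
    examples "go-es" and "sing&ed".
    """
    return re.split(r"[-&]", word, maxsplit=1)[0]


def ttr(tokens: list[str]) -> float:
    """Plain type-token ratio: number of distinct types / number of tokens."""
    return len(set(tokens)) / len(tokens)


tokens = ["go", "go-es", "go-ing", "sing", "sing&ed"]

# Every surface form counts as its own type.
whole_word_ttr = ttr(tokens)

# Forms are collapsed to "go" and "sing" before counting types.
lemma_ttr = ttr([lemma(t) for t in tokens])

print(whole_word_ttr, lemma_ttr)
```

With these five tokens the whole-word count gives five types, while the
lemma-based count gives only two, which is the difference the switches
are meant to produce.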


Brian MacWhinney wrote:
> Dear Info-ChiBolts,
>   David Barner from McGill just now asked me about how to get type-token
> ratios (TTR) based on lemmas instead of whole words.  In response to his
> question, I added this material to the CLAN manual in the FREQ section:
> If you run FREQ on the data on the main speaker tier, you will get a
> type-token ratio that is grounded on whole word forms, rather than lemmas.
> For example, "run," "runs," and "running" will all be treated as separate
> types.  If you want to treat all forms of the lemma "run" as a single type,
> you should run the file through MOR and POST to get a disambiguated %mor
> line.  Then you can run FREQ in a form such as this to get a lemma-based
> TTR.
> freq -t* +t%mor +s"*\|*-%" +s"*\|*" sample.mor.pst
> Depending on the shape of your morphological forms, you may need to add some
> additional +s switches to this sample command.
> ****
> As a further aside, some people might want to do this using the VOCD program
> which is a more sophisticated way of getting TTR.  However, I think that
> doing that would require a major reprogramming of VOCD.
> --Brian

Professor Brian Richards
School of Education
The University of Reading
Bulmershe Court
Reading, RG6 1HY, UK

