[Corpora-List] Significance test for TTR

Georgios Mikros gmikros at isll.uoa.gr
Sun Nov 20 18:00:00 UTC 2011


Dear Chris,

First things first: TTR depends heavily on text length, so you have to
make sure that the measurements were taken from text samples of equal size.
Otherwise you should use a more robust index such as Yule's K or Zipf's Z
(see [1] for a detailed description of this problem). Coming to your
original question: TTR is a continuous variable, so you can use the whole
range of parametric statistics. This means that you can use a t-test if you
want to check whether TTR differs significantly between two classes (e.g.
gender in essays), or ANOVA if your independent variable has more than two
classes (e.g. text genre, text topic, etc.). You can also fit a linear
regression model with TTR as the dependent variable and, as independent
variables, the ones that describe your research hypothesis. In all of these
cases you need multiple TTR measurements, because inferential statistics
are based on the distribution parameters of TTR (its mean and variance).
There is also the option of comparing a single TTR value to a distribution
of TTR values using a one-sample location test (also called a Z test),
which tells you how far that TTR value lies from the mean of the TTRs.
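
To make this concrete, here is a minimal Python sketch of the workflow
described above, using only the standard library. It is an illustration
under stated assumptions, not a definitive implementation: the function
names (ttr, chunk_ttrs, welch_t, z_test) are hypothetical, each text is
assumed to be already tokenized, and the chunk size is a free parameter you
would have to choose yourself. Welch's unequal-variance form of the t-test
is my choice here; the pooled-variance form would serve equally well as an
example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def ttr(tokens):
    """Type-token ratio: number of distinct tokens divided by token count."""
    return len(set(tokens)) / len(tokens)


def chunk_ttrs(tokens, size):
    """TTR of each consecutive, non-overlapping chunk of exactly `size` tokens.

    A trailing chunk shorter than `size` is dropped, so every measurement
    comes from a sample of equal length.
    """
    return [ttr(tokens[i:i + size])
            for i in range(0, len(tokens) - size + 1, size)]


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom.

    `a` and `b` are lists of TTR measurements (at least two values each).
    The p-value has to be read from a t distribution with `df` degrees of
    freedom; that step is not done here.
    """
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def z_test(value, sample):
    """Compare a single TTR value with a sample of TTR values.

    Returns the z score and a two-sided p-value from the standard normal
    distribution. Assumption: the sample is large enough for its mean and
    standard deviation to stand in for the population parameters.
    """
    z = (value - mean(sample)) / stdev(sample)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p
```

With two token lists, one per class, you would call chunk_ttrs() on each
with the same chunk size and pass the two resulting lists to welch_t(). For
more than two classes or for regression, a statistics package is the more
practical route.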

If all you have is two TTR values, I don't think you can compare them in
any meaningful way.

Best

George Mikros

 

[1] Tweedie, F. J., & Baayen, R. H. (1998). How variable may a constant
be? Measures of lexical richness in perspective. Computers and the
Humanities, 32(5), 323-352.

 

____________________________
George K. Mikros
Associate Professor of Computational and Quantitative Linguistics
Department of Italian Language and Literature
School of Philosophy
National and Kapodistrian University of Athens
Panepistimioupoli Zografou, GR-15784
Athens, Greece
Tel: +30 210 7277491, +30 6976111742
Email: gmikros at isll.uoa.gr
Web: http://users.uoa.gr/~gmikros/

        

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
CRuehlemann at aol.com
Sent: Sunday, November 20, 2011 7:21 PM
To: CORPORA at uib.no
Subject: [Corpora-List] Significance test for TTR

 

Hi all,

The type token ratio (TTR) is a measure of the lexical diversity of a
text/text type. If one finds in two texts/text types two widely differing
TTRs, one would like to assess the significance of this finding.

Which test is appropriate for differences between TTRs?

Best
Chris
