[Corpora-List] Comparing n-grams / authorship

Martin Mueller martinmueller at northwestern.edu
Wed Apr 18 02:39:54 UTC 2012


I'm not a statistician by any means, but something is going on here. Have
a look at
http://scalablereading.org/2011/04/08/did-nicolas-udall-write-the-history-of-jacob-and-esau/,
which discusses shared n-grams in texts of similar (if larger) scale in the
context of early modern drama.
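
For what it's worth, one way to put a number on "shared n-grams that
appear nowhere else" is a resampling baseline: count the exclusive shared
n-grams for the two chapters, then ask how often randomly chosen chunks of
the same sizes from the same book do at least as well. The Python below is
only a rough sketch under my own assumptions, not an established
authorship test: the book is assumed to be already tokenised into a list of
words, the chapters are given as (start, end) token offsets, the book is
assumed long enough to hold two non-overlapping chunks, and the function
names are made up for illustration.

```python
import random
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def exclusive_shared(book_counts, a, b, n):
    """Return n-grams that occur in both a and b and nowhere else in the book."""
    ca, cb = ngram_counts(a, n), ngram_counts(b, n)
    # An n-gram is exclusive when a and b account for all of its occurrences.
    # n-grams straddling a chunk boundary count as "elsewhere" (conservative).
    return {g for g in ca.keys() & cb.keys() if book_counts[g] == ca[g] + cb[g]}


def empirical_p(book, a_span, b_span, n=4, trials=200, seed=0):
    """Return (observed count, share of random chunk pairs doing as well)."""
    rng = random.Random(seed)
    book_counts = ngram_counts(book, n)
    (a0, a1), (b0, b1) = a_span, b_span
    observed = len(exclusive_shared(book_counts, book[a0:a1], book[b0:b1], n))
    len_a, len_b = a1 - a0, b1 - b0
    hits = done = 0
    while done < trials:
        x = rng.randrange(len(book) - len_a + 1)
        y = rng.randrange(len(book) - len_b + 1)
        if x < y + len_b and y < x + len_a:
            continue  # the two random chunks overlap, so draw again
        done += 1
        found = exclusive_shared(
            book_counts, book[x:x + len_a], book[y:y + len_b], n
        )
        hits += len(found) >= observed
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)
```

One would run this once per n-gram length of interest (4 through 9 in the
case below). A small share suggests the overlap is unusual for this book;
it says nothing by itself about why the chapters are linked.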



On 4/17/12 2:47 PM, "Mark Davies" <Mark_Davies at byu.edu> wrote:

>I am sending the following question on behalf of a colleague at BYU.
>Thanks in advance for any suggestions you have; I'll forward them to the
>researcher who is working on this problem.
>
>Mark Davies, BYU
>
>-------------------------------------------
>
>
>I am working with a 250,000 word text. Within this text there are two
>chapters, A and B (1,200 and 2,400 words respectively). The authorship of
>these two chapters is unknown, but we have reason to believe that the
>author(s) of A and B have a relationship that is different from the
>majority of the rest of the book. There are two 4-grams, three 6-grams,
>one 7-gram, one 8-gram, and one 9-gram shared by chapters A and B
>that appear nowhere else in the book. Intuitively it seems like
>there is a unique relationship between chapters A and B.
>
>The question is:
>
>Is there a statistical method of measuring whether the types of n-grams
>above establish a reasonable probability that the two texts are linked?
>_______________________________________________
>UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>Corpora mailing list
>Corpora at uib.no
>http://mailman.uib.no/listinfo/corpora

