[Corpora-List] The meta book

John F. Sowa sowa at bestweb.net
Mon Dec 14 13:06:14 UTC 2009


I came across the following article through a news item on the BBC,
but I couldn't find any mention of it on the Corpora List:

    http://arxiv.org/PS_cache/arxiv/pdf/0909/0909.4385v1.pdf
    The meta book and size-dependent properties of written language

Following is the summary from the BBC with some comments from
an interview with the first author:

    http://news.bbc.co.uk/2/hi/science/nature/8404025.stm
    Rare words 'author's fingerprint'

The authors are physicists, and they published the article in
a physics journal.  I wondered how it compares to other studies
by people on this list.

Following is the abstract.

John Sowa
__________________________________________________________________

The meta book and size-dependent properties of written language

Authors: Sebastian Bernhardsson, Luis Enrique Correa da Rocha,
Petter Minnhagen

New J. Phys. 11 (2009) 123015

Abstract: Evidence is given for a systematic text-length dependence of 
the power-law index gamma of a single book. The estimated gamma values 
are consistent with a monotonic decrease from 2 to 1 with increasing 
length of a text. A direct connection to an extended Heaps' law is 
explored. The infinite book limit is, as a consequence, proposed to be 
given by gamma = 1 instead of the value gamma = 2 expected if Zipf's 
law were ubiquitously applicable. In addition we explore the idea that 
the systematic text-length dependence can be described by a meta book 
concept, which is an abstract representation reflecting the 
word-frequency structure of a text. According to this concept the 
word-frequency distribution of a text, with a certain length written by 
a single author, has the same characteristics as a text of the same 
length pulled out from an imaginary complete infinite corpus written by 
the same author.
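For readers who want to try the text-length dependence on their own
data, here is a minimal Python sketch. It is not the authors' method. It
assumes that gamma is the exponent of the word-frequency distribution
P(k) ~ k^(-gamma), where k is the number of times a word occurs, so that
Zipf's law corresponds to gamma = 2. It estimates gamma with a standard
approximate maximum-likelihood formula for discrete power laws rather
than the paper's own fitting procedure, and it tracks Heaps'-style
vocabulary growth N(M), the number of distinct words among the first M
tokens. Treating prefixes of one book as texts of length M drawn from the
author's "meta book" is a simplifying assumption, and the function names
and the crude tokenizer are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude tokenizer (an assumption): lower-case runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(tokens):
    """Map each distinct word to k, its number of occurrences."""
    return Counter(tokens)


def estimate_gamma(counts, k_min=1):
    """Approximate MLE of gamma for P(k) ~ k**(-gamma), k >= k_min.

    Uses the continuous approximation with the usual discrete
    correction (k_min - 0.5); this is NOT the paper's fitting method.
    """
    ks = [k for k in counts.values() if k >= k_min]
    if not ks:
        raise ValueError("no words with k >= k_min")
    # Each term is positive because k >= k_min > k_min - 0.5.
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in ks)
    return 1.0 + len(ks) / log_sum


def vocabulary_growth(tokens, checkpoints):
    """Heaps'-style curve: distinct words N(M) after the first M tokens."""
    wanted = set(checkpoints)
    seen = set()
    growth = {}
    for m, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if m in wanted:
            growth[m] = len(seen)
    return growth


def gamma_by_length(tokens, lengths):
    """Estimate gamma on the first M tokens for each M that fits in the text.

    Treating prefixes as texts "pulled out" of the author's meta book is a
    simplifying assumption made only for this sketch.
    """
    return {m: estimate_gamma(word_counts(tokens[:m]))
            for m in lengths if m <= len(tokens)}


if __name__ == "__main__":
    # Replace this toy string with a long text by a single author.
    sample = "the cat sat on the mat and the dog sat on the log"
    toks = tokenize(sample)
    print(vocabulary_growth(toks, [4, 8, len(toks)]))
    print(gamma_by_length(toks, [4, 8, len(toks)]))
```

On a long single-author text, one would look for a systematic decrease
of the gamma_by_length estimates as M grows, which is what the abstract
reports. A single global power-law fit is sensitive to k_min and to the
tail of the distribution, so the numbers should be read as rough
indications only.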


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


