Corpora: Zipf law for different languages?

E S Atwell eric at comp.leeds.ac.uk
Tue Nov 14 13:50:09 UTC 2000


Alexander,
You ask an interesting question!  We have some empirical results showing
that a Zipfian distribution recurs across a range of natural languages, and
hence could be used as a characteristic feature to look for when seeking
"language" in unknown signals; see:

Elliott J, Atwell E and Whyte B. 2000. Language identification in unknown
signals. In Proceedings of COLING'2000, 18th International Conference on
Computational Linguistics, pages 1021-1026, Association for Computational
Linguistics (ACL) and Morgan Kaufmann Publishers, San Francisco.
ISBN: 1-55860-717-X (2 volumes).

Elliott J, Atwell E and Whyte B. 2000. Increasing our ignorance of
language: identifying language structure in an unknown signal. In
Daelemans W (ed) Proceedings of CoNLL-2000: International Conference on
Computational Natural Language Learning, Lisbon, Portugal.

Elliott J and Atwell E. 1999. Language in signals: the detection of
generic species-independent intelligent language features in symbolic and
oral communications. In Proceedings of the 50th International
Astronautical Congress, paper IAA-99-IAA.9.1.08, Amsterdam. International
Astronautical Federation, Paris.

Elliott J and Atwell E. 2000. Is anybody out there?: the detection of
intelligent and generic language-like features. In Journal of the British
Interplanetary Society, volume 53 no.1/2 pages 13-22, British
Interplanetary Society, London. ISSN: 0007-084X.

(see my www homepage for preprints of these papers)

However, we did not try to measure the variation WITHIN the set of natural
languages.  If you get any replies to your search, please copy these to us
as we would like to know too!
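
For anyone wanting to measure that variation, one common approach (an
assumption here, not the method used in the papers above) is to fit a
straight line to log frequency against log rank and read the exponent off
the slope.  A minimal Python sketch follows; the function name
zipf_exponent is hypothetical, tokenisation is a plain regular expression,
and an ordinary least-squares fit is only a rough estimator:

```python
import math
import re
from collections import Counter


def zipf_exponent(text):
    """Estimate s in freq ~ C / rank**s by least squares on log-log data."""
    # Crude tokenisation: lowercase runs of word characters.
    words = re.findall(r"\w+", text.lower())
    # Frequencies of the word types, most frequent first (rank 1).
    freqs = sorted(Counter(words).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct word types")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Slope of the least-squares line through (log rank, log freq).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Zipf's law predicts a negative slope; report the exponent as positive.
    return -sxy / sxx
```

Running this on comparable samples from each language or genre gives one
exponent per sample.  The estimate depends on corpus size and on how the
text is tokenised, so samples should be matched on both before comparing.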

Good luck,

Eric Atwell

--
Eric Atwell, Distributed Multimedia Systems MSc Tutor & SOCRATES Tutor
School of Computing, University of Leeds, LEEDS LS2 9JT
TEL: (44)113-2335430  FAX: (44)113-2335468
WWW: http://www.comp.leeds.ac.uk/eric  EMAIL: eric at comp.leeds.ac.uk

On Mon, 13 Nov 2000, Alexander Gelbukh wrote:

> Dear colleagues,
>
> Where can I find something about the differences in Zipf law for different
> languages or genres? Say, different exponent etc.
>
> Thank you!
> Alexander
>
> =====================================
> Prof. Dr. Alexander Gelbukh (Alexandre Guelboukh Kahn),
> Professor and researcher, head of NLP Lab.
> Lab. de Lenguaje Natural, Centro de Investigacion en Computacion,
> IPN, Av. Juan Dios Batiz s/n esq. Mendizabal, UP Adolfo L. Mateos,
> Col. Zacatenco CP 07738, Mexico DF., Mexico
> Office: (+52) 5729-6000 ext. 56544, 56518, 56602.
> Fax and Voice (answering machine): +1 (520) 441-1817 (personal).
> Shared fax: (+52) 5586-2936. Home: (+52) 5597-0709.
> gelbukh at earthling.net, gelbukh at cic.ipn.mx, www.cic.ipn.mx/~gelbukh
> =====================================
>
>
>
>


