The PDF file is attached. Happy reading. Look at the family tree.
Language Trees and Zipping
Dario Benedetto,1,* Emanuele Caglioti,1,† and Vittorio Loreto2,3,‡

1 “La Sapienza” University, Mathematics Department, Piazzale Aldo Moro 5, 00185 Rome, Italy
2 “La Sapienza” University, Physics Department, Piazzale Aldo Moro 5, 00185 Rome, Italy
3 INFM, Center for Statistical Mechanics and Complexity, Rome, Italy

(Received 29 August 2001; revised manuscript received 13 September 2001; published 8 January 2002)
In this Letter we present a very general method for extracting
information from a generic string of characters, e.g., a text, a DNA
sequence, or a time series. Based on data-compression techniques, its
key point is the computation of a suitable measure of the remoteness of
two bodies of knowledge. We present the implementation of the method
for linguistically motivated problems, featuring highly accurate
results for language recognition, authorship attribution, and language


