Corpora: Relative text length
spela vintar
vintar at dfki.de
Wed Apr 24 13:49:25 UTC 2002
Hi Andrew,
For Eastern European languages, you can compare the lengths of Orwell's 1984
and its translations, which were collected within the Multext-East project.
The original Multext project (http://www.lpl.univ-aix.fr/projects/multext/)
should provide the same for English, German, French, Spanish, etc.; however, I
wasn't able to find it on their homepage at first glance...
Best,
Spela
http://nl.ijs.si/ME/CD/docs/mte-d21f/node8.html
//////////////
...
Below we give an estimate for the number of words, by language. The
wordcounts were produced by removing the SGML tags from the texts and then
using a 'wc'-like procedure.
English      104.302
Romanian     101.460
Slovene       91.619
Bulgarian     87.235
Czech         80.366
Hungarian     81.147
Estonian      79.334
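
To turn the counts listed above into the kind of index asked for in the
original question (one language = 100), a rough sketch in Python might look
like the following. The names count_words and relative_lengths are made up
for illustration, the tag-stripping regex is only an approximation of the
"'wc'-like procedure" described above (not the Multext-East tooling itself),
and the period in the figures is read as a thousands separator. Note that
this compares word counts, not character counts.

```python
import re

# Approximation: treat anything between "<" and ">" as an SGML tag.
# Entities such as &amp; are left as-is and count as part of a token.
TAG_RE = re.compile(r"<[^>]*>")


def count_words(sgml_text: str) -> int:
    """Strip SGML tags, then count whitespace-separated tokens (like `wc -w`)."""
    plain = TAG_RE.sub(" ", sgml_text)
    return len(plain.split())


# Word counts quoted above, with "104.302" read as 104302 (assumption:
# the period is a thousands separator).
WORD_COUNTS = {
    "English": 104302,
    "Romanian": 101460,
    "Slovene": 91619,
    "Bulgarian": 87235,
    "Czech": 80366,
    "Hungarian": 81147,
    "Estonian": 79334,
}


def relative_lengths(counts: dict[str, int], base: str = "English") -> dict[str, float]:
    """Express each count as an index relative to `base` (base = 100)."""
    base_count = counts[base]
    return {lang: round(100 * n / base_count, 1) for lang, n in counts.items()}


if __name__ == "__main__":
    for lang, index in relative_lengths(WORD_COUNTS).items():
        print(f"{lang:<10} {index:5.1f}")
```

Choosing English as the base is arbitrary here; passing a different base
(e.g. base="Slovene") rescales the whole list.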
Andrew Bredenkamp wrote:
> Hello everyone,
>
> Does anyone know where I can find a list of relative text length?
>
> Taking one language as an index (100), I would like a list of the (other)
> main European languages - e.g. (made up):
>
> Spanish: 100
> English: 105
> French: 110
> German: 85
>
> ... etc.
>
> Thanks a lot in advance for any help you can give me.
>
> Cheers,
> Andrew
> =========================================
> Andrew Bredenkamp
> acrolinx GmbH
> URL: www.acrolinx.com
>
> =========================================