Corpora: non-English corpora

jre at comp.leeds.ac.uk
Thu Jun 7 15:03:51 UTC 2001


Dear list members

I wrote on June 1st:

>I am holding out my begging bowl again!  I am trying to find non-English
>PoS-TAGGED corpora, which can be as little as a few thousand words.  I am
>ideally looking for such languages as Arabic, Hindi, Russian, Basque,
>Spanish, Vietnamese, Latin and even Sanskrit.  Any of these or similar
>would be most welcome.


I have had some very good responses and will be posting my thanks etc. soon. In the interim, does
anyone know of, or have it in their power to grant me access to, corpora for any of the following
languages or closely related members of their language families:
Vietnamese, Tamil, Hausa, Malay, Gaelic, Greek, Japanese, Russian or any of the North American
Indian languages.

..still hopeful

John

********************************************************
John Elliott
Centre for Computer Analysis of Language and Speech
University of Leeds
email: jre at scs.leeds.ac.uk
phone: 0113 233 6827
Web-site: http://www.scs.leeds.ac.uk/jre
********************************************************


