[Corpora-List] Looking for a corpus of Tibetan

Mon Sep 27 09:53:32 UTC 2010

Dear corpora members,
We would like to test a new word segmentation tool (for Dzongkha, the national language of
Bhutan, which is written in a syllabic alphabet without word boundary markers) on similar languages for comparison.
We are therefore looking for a test corpus of Tibetan in which the words are already segmented. The domain
does not matter. The corpus should contain at least 50,000 tokens. We are also interested in word-segmented corpora
for other languages with similar characteristics (i.e., languages using an alphabetic script without
word boundaries).
Thanks a lot in advance for all suggestions,

Helmut Schmid & Gertrud Faaß
IMS, University of Stuttgart
