[Corpora-List] Looking for a corpus of Tibetan
Gertrud Faasz
faaszgd at ims.uni-stuttgart.de
Mon Sep 27 09:53:32 UTC 2010
Dear corpora members,
We would like to test a new word segmentation tool (for Dzongkha, the national language of
Bhutan, which is written in a syllabic alphabet with no word boundary markers) on similar languages for comparison.
We are therefore looking for a test corpus of Tibetan in which the words are already segmented. The domain
does not matter. The corpus should contain at least 50,000 tokens. We are also interested in word-segmented corpora
for other languages with similar characteristics (i.e. languages using an alphabetic script without
word boundaries).
Many thanks in advance for any suggestions,
Helmut Schmid & Gertrud Faaß
IMS, University of Stuttgart
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora