[Corpora-List] traditional Chinese
Daniel Zeman
zeman at ufal.mff.cuni.cz
Fri May 6 14:39:34 UTC 2011
Hi Stefan,
the Academia Sinica treebank (also used in the CoNLL-X and CoNLL 2007
shared task data sets) comes from Taiwan and therefore uses traditional
characters.
Hope this helps
Dan
On 6.5.2011 16:12, Stefan Bordag wrote:
> Dear all,
>
> I have been running the relevant Google searches, but nothing clear
> comes out of the murky waters of the internet... Is there a corpus
> of traditional Chinese available, under either a commercial or a free
> license?
> Or, failing that, at least a tool that can tokenize traditional
> Chinese into words? I am aware of the existing tools for simplified
> Chinese, such as IK Analyzer, and I know that they would likely work
> for traditional Chinese as well, given suitable word lists, which
> leads me back to the first question.
>
> Thank you in advance,
> Stefan Bordag
>
--
RNDr. Daniel Zeman, Ph.D.
ÚFAL MFF, Charles University (Univerzita Karlova), Prague
http://ufal.mff.cuni.cz/~zeman/
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora