[Corpora-List] traditional chinese

Daniel Zeman zeman at ufal.mff.cuni.cz
Fri May 6 14:39:34 UTC 2011


Hi Stefan,

The Academia Sinica treebank (also used in the CoNLL-X and CoNLL 2007 
shared task data sets) comes from Taiwan and thus contains traditional 
characters.

Hope this helps
Dan

On 6 May 2011 at 16:12, Stefan Bordag wrote:
> Dear all,
>
> I have run the relevant Google searches, but nothing clear 
> comes out of the murky waters of the internet... Is there some corpus 
> of Traditional Chinese to be had, be it under a commercial or free 
> license?
> Or, failing that, at least a tool that can tokenize Traditional 
> Chinese into words? I am aware of the existing tools for Simplified 
> Chinese such as IK Analyzer - and I know that they would likely work 
> for Traditional Chinese as well, provided some word lists - which 
> leads me to the first question.
>
> Thank you in advance,
> Stefan Bordag
>

-- 
RNDr. Daniel Zeman, Ph.D.
ÚFAL MFF, Univerzita Karlova, Praha
http://ufal.mff.cuni.cz/~zeman/




