[Corpora-List] Constructing a parallel Arabic-English corpus that can be freely distributed without cost

Darren Cook darren at dcook.org
Mon Dec 16 01:03:33 UTC 2013


> we struggled with the same problem, and ended up using a couple of 
> sources that you may be interested in. ...
> ... (b) other people (well us anyway) are also
> annotating them in various languages (we are in the process of sense
> annotating Chinese, English, Indonesian and Japanese versions of
> these, and are slowly collaborating with others for German, Spanish,
> Thai, ..).

Francis, are your annotated versions being released under a liberal
license? (Is anything downloadable yet?)

> The main disadvantage is that this is still not so much text...

I'd rather have one professionally translated and tagged article than the
whole of Wikipedia run through Google Translate. They are the reliable
rocks you can anchor your algorithms to :-)

Darren

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


