[Corpora-List] Constructing a parallel Arabic-English corpus that can be freely distributed without cost

Francis Bond bond at ieee.org
Mon Dec 16 01:39:43 UTC 2013


G'day,


On Mon, Dec 16, 2013 at 9:03 AM, Darren Cook <darren at dcook.org> wrote:

> > we struggled with the same problem, and ended up using a couple of
> > sources that you may be interested in. ...
> > ... (b) other people (well us anyway) are also
> > annotating them in various languages (we are in the process of sense
> > annotating Chinese, English, Indonesian and Japanese versions of
> > these, and are slowly collaborating with others for German, Spanish,
> > Thai, ..).
>
> Francis, Are your annotated versions being released under a liberal
> license? (Is anything downloadable yet?)
>

Yes, CC BY. (Nothing is downloadable quite yet: the segmented stuff is all
ready, the sense-annotated stuff not quite, and I am off for Christmas.)
Early next year, ...

> The main disadvantage is that this is still not so much text...
>
> I'd rather have one professionally translated and tagged article than the
> whole of Wikipedia run through Google Translate. They are the reliable
> rocks you can anchor your algorithms to :-)
>

I am happy if I can have both :-)


> Darren
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



-- 
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University