[Corpora-List] List Parallel Corpora with Chronological data

Paulo Malvar paulomal at gmail.com
Tue Jul 15 17:33:36 UTC 2008


The Software Localization English-Galician Parallel Corpus was compiled
during summer 2007 and released under the GPL in January 2008. It is
composed of translation units from the localization of open source software
(Linux distributions and applications), and it currently contains 5,163,524
part-of-speech tagged and lemmatized tokens (2,535,405 English tokens and
2,628,119 Galician tokens). It can be found at:
http://paulomalvar.dyndns.org:8080/Paulo_Malvar_personal_webpage/Resources.html


Best regards,

Paulo Malvar Fernández

Research Assistant of the SDSU Computational Linguistics Lab

http://paulomalvar.homeunix.org