[Corpora-List] ACCURAT Toolkit released

Mārcis Pinnis marcis.pinnis at Tilde.lv
Thu Aug 23 11:04:25 UTC 2012


The ACCURAT project (http://www.accurat-project.eu/) is pleased to announce the release of the ACCURAT Toolkit, a collection of tools for collecting comparable corpora and for multi-level alignment and information extraction from comparable corpora.



By using the ACCURAT Toolkit, users may obtain:

- Comparable corpora from the Web (current news corpora, filtered Wikipedia corpora, and narrow-domain focussed corpora);

- Comparable document alignments;

- Semi-parallel sentence/phrase mapping from comparable corpora (for SMT training purposes or other tasks);

- Translated terminology extracted and mapped from bilingual comparable corpora;

- Translated named entities extracted and mapped from bilingual comparable corpora.



The toolkit is open source and freely available. It can be downloaded from the ACCURAT Web Site at http://www.accurat-project.eu/ under the terms of the Apache 2.0 licence.



The ACCURAT project has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under Grant Agreement n° 248347.



=-=-=-= REFERENCES =-=-=-=





ACCURAT D2.6 2012. Toolkit for multi-level alignment and information extraction from comparable corpora. (http://www.accurat-project.eu/uploads/Deliverables/ACCURAT%20D2.6%20Toolkit%20for%20multi-level%20alignment%20and%20information%20extraction%20from%20comparable%20corpora%20v3.0.pdf).





ACCURAT D3.5 2012. Tools for building comparable corpus from the Web. (http://www.accurat-project.eu/uploads/Deliverables/ACCURAT%20D3.5%20Tools%20for%20building%20comparable%20corpus%20from%20the%20Web%20v3.0.pdf).



Pinnis, M., Ion, R., Ştefănescu, D., Su, F., Skadiņa, I., Vasiļjevs, A., & Babych, B. (2012). ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora. Proceedings of the ACL 2012 System Demonstrations (pp. 91–96). Association for Computational Linguistics. Jeju, South Korea.

---------------------
Kind regards,
Mārcis Pinnis
Researcher, Tilde
www.tilde.eu