[Corpora-List] ACL Corpus with extracted and cleaned full-text

Stephan Oepen oe at ifi.uio.no
Mon Nov 25 22:14:38 UTC 2013


hi christian,

> I am looking for an ACL Anthology corpus which contains the extracted
> full-texts of ACL papers (for example as textfile or xml file).

please see the following reference for a summary of a
2012 community effort in this direction:

  http://aclweb.org/anthology//W/W12/W12-3210.pdf

the paper provides access information for two sets of
full-text documents, including some logical structure,
extracted from large parts of the ACL Anthology:

  http://www.delph-in.net/aac

we are aware of many remaining issues, but this may
nevertheless be a useful starting point for you.

best wishes, oe

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++    --- oe at ifi.uio.no; stephan at oepen.net; http://www.emmtee.net/oe/ ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
