[Corpora-List] ACL Corpus with extracted and cleaned full-text
Stephan Oepen
oe at ifi.uio.no
Mon Nov 25 22:14:38 UTC 2013
hi christian,
> I am looking for an ACL Anthology corpus which contains the extracted
> full-texts of ACL papers (for example as textfile or xml file).
please see the following reference for a summary of a
2012 community effort in this direction:
http://aclweb.org/anthology//W/W12/W12-3210.pdf
the paper provides access information for two sets of
full-text documents, including some logical structure,
extracted from large parts of the ACL Anthology:
http://www.delph-in.net/aac
we are aware of many remaining issues, but this may
be a useful starting point for you, nevertheless?
best wishes, oe
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++ --- oe at ifi.uio.no; stephan at oepen.net; http://www.emmtee.net/oe/ ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++