[Corpora-List] ACL Corpus with extracted and cleaned full-text

Adam Kilgarriff adam at lexmasterclass.com
Tue Nov 26 15:09:05 UTC 2013


The 2009 version of the corpus is searchable at
https://the.sketchengine.co.uk/open/. We also did a bit of tidying up to
solve the problems you mention.

Adam


On 25 November 2013 22:14, Stephan Oepen <oe at ifi.uio.no> wrote:

> hi christian,
>
> > I am looking for an ACL Anthology corpus which contains the extracted
> > full-texts of ACL papers (for example as textfile or xml file).
>
> please see the following reference for a summary of a
> 2012 community effort in this direction:
>
>   http://aclweb.org/anthology//W/W12/W12-3210.pdf
>
> the paper provides access information for two sets of
> full-text documents, including some logical structure,
> extracted from large parts of the ACL Anthology:
>
>   http://www.delph-in.net/aac
>
> we are aware of many remaining issues, but this may
> be a useful starting point for you, nevertheless?
>
> best wishes, oe
>
>
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> +++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
> +++    --- oe at ifi.uio.no; stephan at oepen.net; http://www.emmtee.net/oe/ ---
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



-- 
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>

*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
*DANTE: a lexical database for English* <http://www.webdante.com>
========================================