[Corpora-List] ACL Anthology Searchbench: new release

Ulrich Schaefer ulrich.schaefer at dfki.de
Thu Feb 9 19:27:52 UTC 2012


Dear colleagues,
we are happy to announce a new release of the ACL Anthology Searchbench 
<http://aclasb.dfki.de>, a public service that combines 
sentence-semantic, full-text and bibliographic search in the ACL 
Anthology (http://take.dfki.de/#Systems).

*New highlights* (your feedback via the button at the bottom left of the 
Searchbench start page is appreciated!):

The Searchbench now indexes over 22,500 CL & LT papers, including the 
previously missing journal articles and conferences from 2011, past LREC 
proceedings from 2000--2010, and many more.

From now on, we'll be able to update the index shortly after new papers 
have been added to the ACL Anthology.

*Graphical citation browser*.
In the Citations tab in the Searchbench's document view, there is now a 
graphical citation browser (sample link 
<http://aclasb.dfki.de/CitationBrowser.html#id=W11-2927>, full HD screen 
recommended ;-) ). It uses ParsCit, ACL Anthology Network data and 
sentence information from the Searchbench. You can click on the labeled 
edges or right-click the document nodes to see the citation 
sentences in context and highlight them in the PDF. A tentative link to 
external public scientific search services is generated in case a cited 
paper or book is not in the Anthology.

*Bibliographic metadata*.
At the same place (from the Citations tab), you can also inspect and 
copy bibliographic metadata for each Anthology paper
- in rich text (roughly ACL citation style), and
- as BibTeX with mostly correct LaTeX character encoding
(example <http://aclasb.dfki.de/nlp/bib/J11-3002>). Because BibTeX is 
missing for many papers in the Anthology, we generated it from the 
Anthology index files.
Page numbers were taken automatically from the paper layout where 
possible, e.g. for many CL journal articles.
We are collaborating with the other groups working on the Anthology and 
hope to be able to provide even more complete and corrected metadata later.
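
The two sample links above suggest that both the metadata page and the 
citation browser are addressed by Anthology ID. The following Python 
sketch builds such links. It assumes (this is not stated in the 
announcement) that the URL patterns generalize from the single examples 
given; the function names are made up for illustration.

```python
import re

BASE_URL = "http://aclasb.dfki.de"

# Anthology IDs as they appear in the sample links:
# one capital letter, two-digit year, hyphen, four digits (e.g. J11-3002).
ANTHOLOGY_ID = re.compile(r"[A-Z]\d{2}-\d{4}")


def _check_id(anthology_id: str) -> str:
    """Return the ID unchanged, or raise if it does not look like an Anthology ID."""
    if not ANTHOLOGY_ID.fullmatch(anthology_id):
        raise ValueError(f"not an Anthology ID: {anthology_id!r}")
    return anthology_id


def metadata_url(anthology_id: str) -> str:
    """Link to the bibliographic metadata page (pattern assumed from the J11-3002 example)."""
    return f"{BASE_URL}/nlp/bib/{_check_id(anthology_id)}"


def citation_browser_url(anthology_id: str) -> str:
    """Link to the graphical citation browser (pattern assumed from the W11-2927 example)."""
    return f"{BASE_URL}/CitationBrowser.html#id={_check_id(anthology_id)}"


if __name__ == "__main__":
    print(metadata_url("J11-3002"))
    print(citation_browser_url("W11-2927"))
```

The functions only construct URLs and do not contact the server.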

*Online glossary extraction*.
You can use the Searchbench as an online *glossary extraction tool*. 
Simply try a semantic statements query of the form s:<term> p:is 
(example: dependency parsing 
<http://aclasb.dfki.de/#stm%7EsNC%7Cs%3Adependency%20parsing%20p%3Ais%2A>).
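
To produce such query links for a whole list of terms, the sample link 
can be taken apart: percent-decoded, its fragment reads 
"stm~sNC|s:dependency parsing p:is*". The Python sketch below rebuilds 
it. The prefix "stm~sNC|" and the trailing "*" are simply copied from 
that one sample link; what they mean is not documented here, so treat 
them as assumptions. The function names are hypothetical.

```python
from urllib.parse import quote

BASE_URL = "http://aclasb.dfki.de/"

# Assumption: fragment prefix copied verbatim from the decoded sample link.
FRAGMENT_PREFIX = "stm~sNC|"


def glossary_query(term: str) -> str:
    """Semantic statements query of the form s:<term> p:is* (as in the sample link)."""
    return f"s:{term} p:is*"


def glossary_url(term: str) -> str:
    """Build a Searchbench link for the glossary query on `term`."""
    fragment = quote(FRAGMENT_PREFIX + glossary_query(term), safe="")
    # quote() leaves "~" unescaped; the sample link escapes it as %7E.
    fragment = fragment.replace("~", "%7E")
    return f"{BASE_URL}#{fragment}"


if __name__ == "__main__":
    for term in ["dependency parsing", "word sense disambiguation"]:
        print(glossary_url(term))
```

For the term "dependency parsing" this reproduces the sample link 
character for character.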

*ACL-2012 Contributed Task*.
Finally, let us draw your attention to the *Contributed Task* 
<http://translit.i2r.a-star.edu.sg/r50/taskintro/> that is part of the 
ACL-2012 Special Workshop <http://translit.i2r.a-star.edu.sg/r50/>.
We provide the Searchbench's paperxml data for this. The goal of the 
Contributed Task is to generate improved, high-quality rich text (XML) 
versions of all Anthology papers as a free corpus for further research, 
e.g. in summarization, parsing, citation analysis, etc.

Cheers,

Ulrich and Christian

-- 
Ulrich Schaefer http://www.dfki.de/~uschaefer
Christian Spurk http://www.dfki.de/~cspurk/

DFKI Language Technology Lab, D-66123 Saarbruecken, Germany
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Trippstadter Strasse 122, D-67663 Kaiserslautern, Germany
Geschaeftsfuehrung: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster
(Vorsitzender), Dr. Walter Olthoff. Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes. Amtsgericht Kaiserslautern, HRB 2313
