<div dir="ltr">CALL FOR PARTICIPATION<br><br>BiomedSumm: Shared task on Biomedical Summarization<br>at the Text Analysis Conference (TAC 2014)<br><br>November 17-18, 2014<br><br><a href="http://www.nist.gov/tac/2014/BiomedSumm/">http://www.nist.gov/tac/2014/BiomedSumm/</a><br>
<br><br>INTRODUCTION<br><br>Since 2001, the US National Institute of Standards and Technology<br>(NIST) has organized large-scale shared tasks for automatic text<br>summarization within the Document Understanding Conference (DUC) and<br>
the Summarization track at the Text Analysis Conference (TAC).<br>However, while DUC and TAC generated a wealth of evaluation resources<br>for news summarization, far less material is available to support<br>development of methods of automatic summarization in other domains<br>
where there is also a pressing need for distillation and management of<br>complex information presented in vast amounts of text.<br><br>Today, finding an overview of specific developments in biomedicine<br>requires painstaking work. The existence of surveys tells us that such<br>
information is desirable, but such surveys require considerable time<br>and human effort, and cannot keep up with the rate of scientific<br>publication. For example, papers are added to PubMed alone at the rate <br>of about 1.5 articles per minute, precluding the possibility of manual <br>
summarization of the scientific literature.<br><br>The goal of the TAC 2014 Biomedical Summarization track (BiomedSumm)<br>is to develop technologies that aid in the summarization of biomedical<br>literature.<br><br>You are invited to participate in BiomedSumm at TAC 2014. NIST will<br>
provide test data for the shared task, and participants will run their<br>NLP systems on the data and return their results to NIST for<br>evaluation. TAC culminates in a November workshop at NIST in<br>Gaithersburg, Maryland, USA.<br>
<br>All results submitted to NIST are archived on the TAC web site, and<br>all evaluations of submitted results are included in the workshop<br>proceedings. Dissemination of TAC work and results other than in the<br>workshop proceedings is welcomed, but the conditions of participation<br>
specifically preclude any advertising claims based on TAC results.<br><br><br>SHARED TASK<br><br>Scientific papers are usually summarized in two ways: first, by the<br>abstract that the author provides; second, when a paper is cited, by a<br>brief summary of its pertinent points given in the citing paper.<br>However, both of these methods fall short of addressing the reader's<br>needs: for the abstract, to know what the lasting influence of a paper<br>is; for references, to<br>
know how the author originally expressed the claim.<br><br>The set of citation sentences (i.e., "citances") that reference a<br>specific paper can be seen as a (community created) summary of that<br>paper (see e.g. [1,2]). The set of citances is taken to summarize the<br>
key points of the referenced paper, and so reflects the importance of<br>the paper within an academic community. Among the benefits of this<br>form of summarization is that the citance offers a new type of context<br>that was not available when the cited paper was written:<br>
often, in citation, papers are combined, compared, or commented on -<br>therefore, the collection of citations to a reference paper adds an<br>interpretative layer to the cited text.<br><br>The drawback, however, is that though a collection of citances offers<br>
a view of the cited paper, it does not provide the context, in terms of<br>data or methods, of the cited finding; if the citation is of a method,<br>the data and results may not be cited. More seriously, a citing author<br>can attribute findings or conclusions to the cited paper that are not<br>present, or not intended in that form (e.g., when the finding holds only<br>under specific experimental conditions that are not cited). To provide more<br>
context, and to establish trust in the citance, the reader would need<br>to see, next to the citance, the exact span(s) of text (or tables or<br>figures) that are being cited, and be able to link in to the cited<br>text at this exact point.<br>
<br>To give the abstract-as-summary the benefit of community insight, and<br>to give the citances-as-summary the benefit of context, we explore a<br>new form of structured summary: a faceted summary of the traditional<br>
self-summary (the abstract) and the community summary (the collection<br>of citances). As a third component, we propose to group the citances<br>by the facets of the text that they refer to.<br><br>A pilot study indicated that most citations clearly refer to one or<br>
more specific aspects of the cited paper. For biomedicine, this is<br>usually either the goal of the paper, the method, the results or data<br>obtained, or the conclusions of the work. This insight can help<br>create more coherent citation-based summaries: by identifying, first,<br>the cited text span and, second, the facet of the paper (Goal, Method,<br>Result/Data, or Conclusion), we can create a faceted summary of the<br>paper by clustering all cited/citing sentences together by facet.<br><br>Use Case: This form of scientific summarization could be a component<br>
of a User Interface in which a user is able to hover over or click on<br>a citation, which then causes a citance-focused faceted summary of the<br>referenced paper to be displayed, or a full summary of the referenced<br>
paper taking into account the citances in all citing papers for that<br>
reference paper. Finally, this form of scientific summarization would<br>allow a user to read the original reference paper, but with links to<br>the subsequent literature that cites specific ideas of the reference<br>paper.<br>
<br>The automatic summarization task is defined as follows:<br><br>Given: A set of Citing Papers (CPs) that all contain citations to a<br>Reference Paper (RP). In each CP, the text spans (i.e., citances)<br>that pertain to a particular citation of the RP have been identified.<br>
<br>Task 1a: For each citance, identify the spans of text (cited text<br>spans) in the RP that most accurately reflect the citance. These are<br>of the granularity of a sentence fragment, a full sentence, or several<br>consecutive sentences (no more than 5).<br>
<br>Task 1b: For each cited text span, identify what facet of the paper<br>it belongs to, from a predefined set of facets.<br><br>Task 2: Finally, generate a structured summary of the RP and all of<br>the community discussion of the paper represented in the citances. The<br>
length of the summary should not exceed 250 words. Task 2 is<br>tentative.<br><br>Evaluation: Task 1 will be scored by the overlap of text spans in the<br>system output versus the gold standard. Task 2 will be scored using the<br>
ROUGE family of metrics [3]. Again, Task 2 is tentative.<br><br>Data for the biomedical summarization task will come from the domain<br>of cell biology. Data will initially be distributed through a TAC<br>shared task on biomedical document summarization. It will be archived<br>
on SourceForge.net at <a href="http://tacsummarizationsharedtask.sourceforge.net">tacsummarizationsharedtask.sourceforge.net</a>.<br><br>This corpus is expected to be of interest to a broad community<br>including those working in biomedical NLP, text summarization, <br>
discourse structure in scholarly writing, paraphrase, textual<br>entailment, and/or text simplification.<br><br><br>REGISTRATION<br><br>Organizations wishing to participate in the BiomedSumm track at TAC<br>2014 are invited to register online by June 30, 2014. Participants are<br>
advised to register and submit all required agreement forms as soon as<br>possible in order to receive timely access to evaluation resources,<br>including training data. Registration for the track does not commit<br>you to participating, but it helps with planning. Late registration<br>will be permitted only if resources allow. Any questions about<br>conference participation may be sent to the TAC project manager:<br><a href="mailto:tac-manager@nist.gov">tac-manager@nist.gov</a>.<br>
<br>Track registration: <a href="http://www.nist.gov/tac/2014/BiomedSumm/registration.html">http://www.nist.gov/tac/2014/BiomedSumm/registration.html</a><br><br><br>WORKSHOP<br><br>The TAC 2014 workshop will be held November 17-18, 2014, in<br>
Gaithersburg, Maryland, USA. The workshop is a forum both for<br>presentation of results (including failure analyses and system<br>comparisons), and for more lengthy system presentations describing<br>techniques used, experiments run on the data, and other issues of<br>
interest to NLP researchers. TAC track participants who wish to give a<br>presentation during the workshop will submit a short abstract<br>describing the experiments they performed. As there is a limited<br>amount of time for oral presentations, the abstracts will be used to<br>
determine which participants are asked to speak and which will present<br>in a poster session.<br><br><br>IMPORTANT DATES<br><br>Early May 2014: Initial track guidelines posted<br>End of May 2014: Distribution of first release of training data<br>
June 30, 2014: Deadline for registration for track participation<br>July 31, 2014: Final release of training data<br>August 11, 2014: Blind test data released<br>August 22, 2014: Results on blind test data due<br>Mid-September 2014: Release of individual evaluated results to participants<br>
October 7, 2014: Short system descriptions due<br>October 7, 2014: Workshop presentation proposals due<br>Mid-October 2014: Notification of acceptance of presentation proposals<br>November 1, 2014: System reports for workshop notebook due<br>
November 17-18, 2014: TAC 2014 workshop in Gaithersburg, Maryland, USA<br>February 15, 2015: System reports for final proceedings due<br><br>REFERENCES<br><br>[1] Preslav I. Nakov, Ariel S. Schwartz, and Marti A. Hearst (2004)<br>
Citances: Citation sentences for semantic analysis of bioscience text.<br>SIGIR 2004.<br><br>[2] Vahed Qazvinian and Dragomir R. Radev (2010) Identifying Non-explicit<br>Citing Sentences for Citation-based Summarization. Proceedings of the<br>Association for Computational Linguistics.<br><br>[3] Chin-Yew Lin (2004) ROUGE: A package for automatic evaluation of<br>summaries. Proceedings of "Text Summarization Branches Out," pp. 74-81.<br><br><br>ORGANIZING COMMITTEE<br>
<br>Kevin Bretonnel Cohen, University of Colorado School of Medicine, USA<br>Hoa Dang, National Institute of Standards and Technology, USA<br>Anita de Waard, Elsevier Labs, USA<br>Prabha Yadav, University of Colorado School of Medicine, USA<br>
Lucy Vanderwende, Microsoft Research, USA<br>
</div>