[Corpora-List] Data: sentential revisions from different versions of the same paper

Chenhao Tan chenhao at cs.cornell.edu
Thu May 15 11:54:26 UTC 2014


Data: 100K significant sentential revisions from different versions of the same paper, extracted from the arXiv.
A subset is labeled with whether the revision strengthened or weakened the claim.
Example:
s1: "...circadian pattern and burstiness in human communication activity"
vs.
s2: "...circadian pattern and burstiness in mobile phone communication",

or
s1: "The algorithm is studied in this paper"
vs.
s2: "The algorithm is proposed in this paper".

Short paper at ACL 2014
A Corpus of Sentence-level Revisions in Academic Writing: A Step towards Understanding Statement Strength in Communication
Chenhao Tan, Lillian Lee
http://chenhaot.com/pages/statement-strength.html


--
Chenhao Tan (谭宸浩)
PhD Candidate
Department of Computer Science, Cornell University
http://chenhaot.com
413 Gates Hall
