span?

Daniel Marcu marcu at ISI.EDU
Mon Aug 25 15:18:35 UTC 2003


Finding a definition that many human judges can apply consistently is
extremely tricky. To achieve statistically significant levels of
agreement between humans on the task of span annotation, one needs
instructions that are specified as clearly as possible, lots of
patience, and many hours of training with the human annotators.
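
As an illustration only: one common way to quantify agreement between two
annotators is Cohen's kappa, which corrects raw agreement for the
agreement expected by chance. The sketch below assumes each annotator
marks, for every token position, whether a span boundary follows it
(1) or not (0). The function name, the labels, and the data are
hypothetical and are not taken from the tagging manual or the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of positions with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement if each annotator labelled at random according
    # to his or her own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical boundary decisions for ten token positions.
annotator_a = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
annotator_b = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]

kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

Here the annotators agree on 9 of 10 positions, but because "no
boundary" is the more frequent label for both, the chance-corrected
figure is lower than the raw 0.9.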

http://www.isi.edu/~marcu/discourse/ provides a link to an RST tagging
manual that we developed over several iterations and years. A large
corpus of annotated texts, created using the instructions in this
manual, can be obtained for a nominal fee from the Linguistic Data
Consortium
(http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002T07).

Regards,
Daniel Marcu


John Bateman wrote:
>
> > Could someone explain to me exactly how a 'span' is determined? Is it
> > a grammatical unit? A pragmatic unit?
>
> It is an *analytic* unit defined internally to RST. That is, you decide
> on your criteria for segmentation (as in all linguistic endeavors), and
> then use these for your spans. Then you see if you can make an RST
> analysis work.
>
> A common choice is to make spans coincide with clauses (a span then
> *corresponds* to a grammatical unit, but it is not itself a grammatical
> unit, as it belongs to a different level of description).
>
> In some work, segmentation goes below clauses to phrases (e.g., in
> German, where what would be a subordinate clause in English happily
> appears as a prepositional phrase, and it seems a pity to have
> completely different RST analyses).
>
> Spans could also be bigger units, such as paragraphs, if you want less
> granularity.
>
> I am not sure what a "pragmatic unit" is and so have no comments.
>
> Best,
> John Bateman.

--
Daniel Marcu
Information Sciences Institute of University of Southern California
4676 Admiralty Way, Suite 1001; Marina del Rey, CA 90292-6601
Voice: 310-448-8726; Fax: 310-822-0751; http://www.isi.edu/~marcu/


