[Corpora-List] Summary: corpora with annotated information structure

Michael Goetze goetze at kronos.ling.uni-potsdam.de
Thu Dec 19 10:03:22 UTC 2002


Dear all!

On 19th November I asked for help finding corpora with annotated information
structure (IS) and approaches to IS-annotation. This is what I have found out so far:

1) Corpora with annotated IS and attempts to annotate it

- The Prague Dependency Treebank is annotated with IS in the framework of
Functional Generative Description (FGD) in the tradition of the Prague School.
See e.g.:
* E. Buranova, E. Hajicova, P. Sgall (2000): "Tagging of Very Large Corpora:
Topic-Focus Articulation."
In: Proceedings of Coling 2000, pp. 278-284, Saarbrücken, Germany, Prague

- Yovka Tisheva & Marina Dzhonova (2002) discuss the annotation with the feature
[focus], which they apply to clitic left dislocation and to occurrences of the
clitic '-li' in texts of the Bulgarian Treebank (www.bultreebank.org).
* Yovka Tisheva & Marina Dzhonova (2002): "Information Structure Level in
TreeBanks". In: Proceedings of "Treebanks and Linguistic Theories 2002" in
Sozopol, Bulgaria. (http://www.bultreebank.org/Proceedings.html)


- In Saarbruecken, a project called 'MULI' has just started, aiming at the
IS-annotation of treebanks in various languages. There is no public website at
the moment, but as soon as there is one there should be a link on the webpage of
Silvia Hansen (http://www.coli.uni-sb.de/~hansen/).



2) Approaches towards manual or automatic annotation of information structure


* Ivana Kruijff-Korbayová and Geert-Jan Kruijff (2002): "Informativity Zoning:
Robust Annotation of Informativity in Corpora" unpubl. poster.

* Maria Wolters (1998): "Linguistic Annotation of Two Prosodic Databases".
In: Proceedings of the Workshop on Recent Advances in Corpus Annotation, ESSLLI
'98, Saarbrücken


-> automatic recognition of information structure:

(German)
* Steinberger, Bennett (1994): "Automatic Recognition of Theme, Focus and
Contrastive Stress"


(English)
* Nobo Komagata (2000): "Identifying Information Structure in Expository Texts"
(citeseer.nj.nec.com/334424.html, see his diss. as well)

* Hoffman, Beryl (1996): "Translating into Free Word Order Languages". In:
Proceedings of the International Conference on Computational Linguistics (COLING)


(Czech)
* E. Buranova, E. Hajicova, P. Sgall (2000): "Tagging of Very Large Corpora:
Topic-Focus Articulation." In: Proceedings of Coling 2000, pp. 278-284,
Saarbrücken, Germany, Prague



Thanks for their help go to:
William Mann, Joel Tetreault, Anke Luedeling, John Fry, Ralf Steinberger, Arno
Erpenbeck, Sylvain Loiseau, Silvia Hansen, Nobo Komagata, Marina Dzhonova,
Alexiei Dingli, Daniela Kurz, Maria Wolters, Michael Strube, ...

A nice winter to all of you!

Michael


