Seminar: Karin Harbusch, Talk on ellipsis, 27 October 2012, Paris

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Tue Oct 23 19:58:54 UTC 2012

Date: Tue, 23 Oct 2012 15:00:20 +0200
From: Anne Abeillé <anne.abeille at>
Message-Id: <FEFB727A-4D3D-42F8-8EC5-8D4E60764662 at>

As part of the project
Approches typologiques des constructions elliptiques (Fédération TUL du

we are pleased to host, on Friday 27 October
from 10:00 to 12:00
175 rue du Chevaleret, 75013 Paris
4th floor, aquarium
the following talk:

ELLEIPO: Generating Clausal Coordinative Ellipsis in
Dutch, Estonian, German, and Hungarian

Karin Harbusch
Computer Science Dept., University of Koblenz-Landau, GERMANY
harbusch at

In our talk, we present target-language independent syntactic rules to
generate Clausal Coordinate Ellipsis (CCE), i.e. Gapping (including
Long-Distance Gapping, Subgapping and Stripping), Forward and Backward
Conjunction Reduction (FCR and BCR) and Subject Gap with Finite/Fronted
Verb (SGF). The CCE rules, which are inspired by the psycholinguistic
theory of Kempen (2009), have been implemented in Java (cf. system
ELLEIPO) so that tests for a new target language require the setup of
syntactic trees to be read in by the system. All CCE paraphrases for any
input sentence, provided as output by the ELLEIPO system, have to be
inspected by a native speaker with respect to overgeneration, i.e. whether
the list contains any ungrammatical sentence, and undergeneration, i.e.
whether the list lacks any CCE paraphrase that is licensed in the currently
investigated target language. We show the implementation for Dutch and
German, two Indo-European languages, and for Estonian and Hungarian, two
Finno-Ugric languages.
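
The overgeneration/undergeneration check described above amounts to
comparing two lists. A minimal sketch in Python, under stated assumptions:
the function name, the dictionary keys and the placeholder paraphrase labels
are hypothetical and are not part of ELLEIPO (which is implemented in Java);
the "licensed" list stands for the judgments of a native speaker.

```python
def evaluate_paraphrases(generated, licensed):
    """Compare system output with native-speaker judgments.

    generated: paraphrases produced by the system for one input sentence
    licensed:  paraphrases a native speaker accepts for that sentence
    Both arguments are hypothetical inputs for this illustration.
    """
    generated_set = set(generated)
    licensed_set = set(licensed)
    return {
        # produced by the system but rejected by the speaker
        "overgenerated": sorted(generated_set - licensed_set),
        # accepted by the speaker but missing from the system output
        "undergenerated": sorted(licensed_set - generated_set),
    }


# Placeholder labels stand in for actual paraphrases.
report = evaluate_paraphrases(
    generated=["paraphrase A", "paraphrase B"],
    licensed=["paraphrase A", "paraphrase C"],
)
print(report)
```

An empty "overgenerated" list together with an empty "undergenerated" list
would mean that, for this input sentence, the rules match the speaker's
judgments exactly.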

With respect to incremental production of ellipsis, we present results
from four different corpus studies. After an account of our data
extraction method, we will present a detailed overview of the incidence
of four types of clausal coordinate ellipsis in the spoken and written
treebanks for Dutch (ALPINO and CGN 2.0) and German (TIGER and
VERBMOBIL). Given the diverging numbers for the individual CCE types,
we propose a theoretical explanation of the data pattern based on the
assumption that during spontaneous speaking the scope (“window”) of
online grammatical planning is basically restricted to one (finite)
clause. In producing clausal coordinations, checking the possibility of
“forward” ellipsis (Gapping, Forward Conjunction Reduction) requires
comparison of form and meaning of two adjacent clauses. As this
overtaxes the online planning scope of the sentence production system,
speakers prefer to plan the form of second or later conjoined clauses in
isolation, that is, without taking the shape of preceding clauses into
account, which eliminates elliptical options. Right Node Raising (RNR), the
“backward” version of coordinate ellipsis, is more severely affected in spoken
language because it requires the simultaneous presence within the
planning window of two (nearly) complete clauses. Indeed, whilst RNR is
readily observable in written texts, in spoken language it is a rare
phenomenon manifesting itself only in very short clauses.

