[Corpora-List] trying to parse crossing dependencies
Yannick Versley
versley at sfs.uni-tuebingen.de
Mon Oct 6 08:28:33 UTC 2008
Hi,
> I always wondered about how do you parse trennbare Verben in German,
> which to me, even if not the same kind of problem, does relate to
> "crossing dependencies" somehow
Parsing separable verbs, but also topicalized objects in German, or relative
clauses in most languages, is usually difficult and people have come up with
different solutions which all somehow work but don't really look completely
satisfying.
* You can just ignore the problem by making everything projective and sort out
everything after your parser has done the grunt work. This is the common
approach when people do PCFG parsing on e.g. the Tiger treebank - they just
reattach the offending nodes to a higher level and pretend nothing has
happened.
The MALT dependency parser (Nivre&Nilsson 2005) reattaches edges and puts
different labels on them so a post-processing step can reattach things to
where they came from. This is more honest than the former since, unlike using
PARSEVAL metrics on the transformed structures, you can actually compare the
parse+postprocessing result directly to what you aimed at.
* There is the biting-the-bullet approach of just doing non-projective parsing
by either solviong a harder problem or doing a coarser approximation.
The WCDG approach (Menzel&Schröder 1998, Foth/Daum/Menzel 2004) simply puts
nonprojective parsing into a weighted constraint satisfaction problem, where
you can posit arbitrary constraints on parse trees. This usually means that
efficiently solving this problem requires a considerable engineering effort.
A method using a more general-purpose constraint server can be found in
(Riedel&Clarke, 2006)
A slightly less ambitious version of this would be (McDonald&Pereira 2006),
who first use a projective parsing algorithm to find a reasonable parse and
then use hill-climbing to re-attach non-projective edges.
And there's also an approach of using an O(n^2) spanning tree algorithm to
directly get a non-projective dependency tree, as in (McDonald&Pereira 2005).
The obvious problem of this approach is that you cannot represent dependencies
between edges; so if you want to keep the algorithm not to attach two
determiners or two subject to one word, you have to do this with approximate,
indirect means (aka "some fumbling required").
There are people who try to base their formalism on parsing algorithms that
cover *exactly* the class of mildly-context-sensitive grammars, which are
parsable in O(n^6) worst case time [which is still dreadful, but sounds much
nicer than the exponential worst-case in the case of general constraint
satisfaction]. See e.g. (Lichte&Kallmeyer 2008) for one approach in German.
For handling separable verbs, there are different approaches:
* one would be to get an approximate idea of clause structure (through clause
chunking) and gather the parts of the separable verbs to put them together
and look them up. This is quite obviously a 90% solution, but it works well
enough in cases where you want the verb lemma, but can't be bothered to do
full parsing.
* In the WCDG approach, you *can* have a constraint that looks at the verb
prefix and does the right thing. For practical reasons (as far as I know),
the handling of verb valencies is implemented approximately as a kind
of "this verb can have a dative object when it occurs without a prefix, and
an accusative and/or a dative object").
* In HPSG implementations (e.g. Müller 2002
http://hpsg.fu-berlin.de/~stefan/Pub/paradox.html for an overview)
usually treat the separable prefix as an argument to the verb, which means
that there are lots of different lexicon entries for one surface verb form
(i.e., "hat" has lexicon entries for vor#haben, an#haben, auf#haben, etc.)
Separable prefixes are a different class of problems than the worst
non-projectivities (the ones which make the mild context sensitivity
necessary), however, since they can be reasonably treated with a (non-Xbar)
PCFG grammar.
Best wishes,
Yannick Versley
--
Yannick Versley
Seminar für Sprachwissenschaft, Abt. Computerlinguistik
Wilhelmstr. 19, 72074 Tübingen
Tel.: (07071) 29 77352
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
More information about the Corpora
mailing list