[Corpora-List] Open source multilingual syntactic parser

pablo gamallo pablo.gamallo at gmail.com
Sat Nov 28 22:43:04 UTC 2009


Thanks, Linas, for your comments and suggestions. I will try to reply
to some of your questions below:


> Quoting Linas Vepstas <linasvepstas at gmail.com>:

> 2009/11/27 pablo gamallo <pablo.gamallo at usc.es>:
>>
>> DepPattern is available with GPL license at:
>> http://gramatica.usc.es/pln/tools/deppattern.html
>
> Thanks!
>
> A quick glance suggests that this parser is generating
> dependencies that are similar to, but different from those
> of other dependency parsers.   Is there any effort anywhere
> to  standardize on the set of dependencies generated?

The DepPattern toolkit comes not only with specific parsers, but also
with a tutorial on writing formal grammars that are compiled into
parsers. Dependency names are declared in a configuration file,
“dependencies.conf”, so in that file you can define whatever set of
dependency labels you want to use in the grammar rules.


> I maintain a rule-based dependency parser (RelEx) and
> recently added a "Stanford Parser compatibility mode"
> because the RelEx dependencies are slightly different,
> and, because from an engineering standpoint, compatibility
> is something that users like.  (And, yes, I actually learned a lot
> by looking at how these two systems differed.)
>
> I wrote up what I found here:
>
> http://opencog.org/wiki/Dependency_relationship
>
> which describes RelEx, and how it differs from the
> Stanford parser (and from MiniPar)

Thanks for the link!


> I would be vaguely interested in creating a "DepPattern"
> compatibility mode, if that was the right thing to do --
> is it?  But perhaps it would be better if all dependency
> parsers moved to a common set of dependencies and
> feature sets?

Thanks for your interest! A common set of labels and features would be
useful for building further applications (Information Extraction,
Question-Answering...) on top of standardized dependencies.
Yet, as you say in your wiki, parser outputs differ in deeper ways
than just dependency labeling. For instance, your RelEx system
aims at grasping the semantic content of sentences, and not just a
literal syntactic structure. Using the DepPattern formalism, it is
possible to write either more syntax-oriented grammars or more
semantically motivated rules (as in Constraint Grammar and Link
Grammar, I guess). For instance, with the DepPattern formalism,
you can choose between generating a prepositional object and a
prepositional complement, or collapsing both of these into a single
prepositional relation, with the preposition linking the head and the
modifier/object. Following your example, the expression “go to the
store” can be analyzed either as:

pobj (go, store)
pcomp (go, to)

or:

to(go, store)
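
To make the difference between the two analyses concrete, here is a
rough sketch in Python. It is not DepPattern rule syntax and not how
DepPattern does it internally; the function name and the
(label, head, dependent) triple layout are hypothetical, and it assumes
at most one prepositional phrase per head:

```python
from typing import List, Tuple

# Hypothetical representation: (label, head, dependent)
Triple = Tuple[str, str, str]


def collapse_prepositions(deps: List[Triple]) -> List[Triple]:
    """Merge pcomp(head, prep) + pobj(head, obj) into prep(head, obj).

    Simplifying assumption: each head has at most one preposition.
    """
    # Preposition attached to each head, taken from the pcomp relations.
    preps = {head: dep for label, head, dep in deps if label == "pcomp"}
    # Heads that actually have a prepositional object to merge with.
    heads_with_pobj = {head for label, head, _ in deps if label == "pobj"}

    collapsed = []
    for label, head, dep in deps:
        if label == "pcomp" and head in heads_with_pobj:
            continue  # absorbed into the collapsed relation
        if label == "pobj" and head in preps:
            # The preposition itself becomes the relation label.
            collapsed.append((preps[head], head, dep))
        else:
            collapsed.append((label, head, dep))
    return collapsed


if __name__ == "__main__":
    two_relations = [("pobj", "go", "store"), ("pcomp", "go", "to")]
    print(collapse_prepositions(two_relations))
```

Relations other than pobj and pcomp pass through unchanged, so the same
parse can be served in either style depending on what the application
needs.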

I think a tricky question the research community should address is the
following: given a particular NLP application (word similarity
extraction, question-answering...), what type of dependencies are
required to improve the application's results?



> Is there a more detailed description of DepPattern's
> dependency output? It is hinted at in section 1.8.1 of
> the user guide -- features, such as lemma, number,
> person, tense, genre, possessor, politeness, type -- the
> first 4 I can guess, the last 4 are ???


So far, the tutorial (http://gramatica.usc.es/pln/tools/tutorialGrammar.pdf),
not the user guide, is the most accurate description of the formalism
and of the parser's output. Morpho-syntactic features are described
in section 1.3.2. They are based on those used by FreeLing, which are
in turn based on the EAGLES project.


All the best,
Pablo

Pablo Gamallo Otero
Departamento de Língua Espanhola
Faculdade de Filologia
Campus Universitário Norte
15782 Santiago de Compostela
Espanha / Spain

phone: (+34) 981 563100, ext. 11761
Fax:   (+34) 981 574646
pablo.gamallo at usc.es
http://gramatica.usc.es/~gamallo/

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


