[Corpora-List] Survey: applications using grammar-based parsers

Tue Mar 31 22:08:04 UTC 2009

On 2009-03-31, Trond Trosterud <trond.trosterud at uit.no> wrote:

> Here comes a summary of the answers to the query.

Thank you for summarizing.  Your query caught my interest since we may
have a need for an embeddable parser in our project on linguistically
supported editing <http://www.lingured.info/>.

Unfortunately, it seems that when you do not just want the performance
numbers (cited in papers) but the actual, working system, it frequently
turns out that it is not available (dead project, results locked away,
commercial, etc.) or unusable in an application (too slow, not
embeddable, etc.).  At least our experience wrt. morphological analysis
and generation for German has been quite sobering.

[...]

> Two of the feedbacks refer to commercial systems (langos, the  
> rulebased MT systems, one reviewer also referred to the commercial CG- 
> based company Connexor). Whereas being commercial is in itself a  
> strong indication of good results (customers will not accept  
> malfunction),

I think this is pretty optimistic.  I don't know anything about the
systems you mentioned, and I don't want to discredit anybody, but in
general, being commercial is only a strong indication of marketing and
business skills.  It does not say anything about quality, either good
or bad.  One has to keep in mind that the average customer (including
corporate customers) is very tolerant wrt. quality problems in software,
especially in markets with little competition.

> it also makes it hard to evaluate them: For commercial reasons, their
> source code, or even in some cases the (methodology behind their)
> approach, is kept confidential. Nothing more can thus be said about
> them here.

And this is a big problem.  My experience is that these trade-secret
approaches are rarely unique.  Quite often, it's just too messy to show
to anybody...

> When I look at the other three parsers, GTA, WCDG, and Link grammar, I  
> find that they all bear some resemblances to the CG framework: The  
> parsing is based upon bottom-up local relations (looking at the  
> relations the words may have to each other), and they are thus always  
> able to come up with an analysis.

[...]

> These frameworks are missing in my survey, as I also suspected. What I  
> had expected was to see some LFG and HPSG version of iCALL programs,  
> as the language in pedagogical QA systems may be restricted, thereby  
> compensating for weaker results for unbounded text, but then, these  
> parsers would have been excluded by my first criterion. In order to  
> analyse unbounded text reliably it thus seems that a framework with  
> the properties of the 4 approaches referred to here is needed. That  
> fst systems are successful for morphology but not for syntax I see as  
> a healthy reminder of the difference between these two domains.

Speaking of frameworks, I might mention the Malaga system
<http://home.arcor.de/bjoern-beutel/malaga/>.  Unfortunately, no larger
syntax grammar is publicly available, but it provides a uniform
framework for morphology and syntax, and it can easily be embedded into
applications.

Greetings from Switzerland

-- 
Michael Piotrowski, M.A. <mxp at cl.uzh.ch>
Institute of Computational Linguistics, University of Zurich
Phone +41 44 63-54394 | OpenPGP public key ID 0x1614A044
