Portability

Emily M. Bender emily.m.bender at gmail.com
Sun Mar 21 04:17:46 UTC 2010


Hi Mike,

The LKB is open source software, which means that even if the
binaries no longer run at some point, the software itself could be
recreated.  The GUI relies on CLIM, which is not open source, but
the important parts of the system don't depend on it.  I believe
that the PET system (written
in C++) is also open-source.  Both the LKB and PET interpret tdl
grammars, so you can use the same grammar with either system.
(PET is much faster as a parser, but lacks the grammar development
support that the LKB provides.  The LKB also includes a generation
algorithm, which can be very useful in checking the accuracy of a
grammar, not to mention in a wide variety of practical applications.)
So, there is at least the possibility of resurrecting the software at
some later date, should someone have strong enough
motivation :)

As for working with Pashto or Inuit: first, I don't think the language
being described has any influence on whether the beach is
attractive enough to lure Stephan for a visit to assist with
[incr tsdb()] installation.  More to the point, we have a Live CD
with the LKB and [incr tsdb()] pre-installed, so you can get the
software, already configured, here:

http://depts.washington.edu/uwcl/twiki/bin/view.cgi/Main/KnoppixLKB

In addition, you might find the Grammar Matrix customization
system useful.  It allows you to create a starter grammar in tdl
by filling out a web-based questionnaire which elicits typological
and lexical information.  Our goals are to make grammar development
faster, promote re-use of analyses across grammars where possible,
and blend typological breadth with depth of syntactic analysis.
(The grammars map to MRS semantic representations.)

http://www.delph-in.net/matrix/customize/matrix.cgi

When you are done filling out the questionnaire, you should get
a grammar that parses (and generates!) a fragment of the language
you are describing, and which you can extend by hand-editing
the tdl files.  The matrix dev group is always happy to answer
questions that arise as you work with the system.  We also
have a new feature on the site which allows you to test your
grammar by generation before even downloading it.
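
To give a concrete (and purely illustrative) sense of what hand-editing
the tdl files involves: a grammar is a set of typed feature structure
definitions, and extending it usually means adding types or entries
along the lines of the sketch below.  The supertype name (noun-lex) and
the PRED naming convention are assumptions for the example, not
necessarily what your customized grammar actually provides.

```tdl
; Hypothetical lexical entry for a common noun.  "noun-lex" and
; the feature paths shown are illustrative; check the types your
; customized grammar defines before copying this pattern.
dog := noun-lex &
  [ STEM < "dog" >,
    SYNSEM.LKEYS.KEYREL.PRED "_dog_n_rel" ].
```

Each definition names an identifier, gives its supertype(s) after :=,
and adds a feature structure constraint in square brackets, terminated
by a period.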

We assume a separation of morphophonology from morphosyntax
(see Bender and Good 2005, CLS).  Recent work in the DELPH-IN
consortium has increased the sophistication available for the
interface between a morphophonological analyzer and the LKB
(or PET), but someone else will have to provide the details on
that one.

Finally, regarding the desire to blend prose description with formal
analysis in a single document: In my experience, the grammars we
write are not typically organized in a way that maps easily onto a
linearized prose description which reads well.  My current sense of the
best
solution is to have the prose grammar and the implemented grammar
join through the examples.  To wit: all of the example sentences
in the prose grammar should be exported into a test suite, which can
then be parsed with the implemented grammar. [incr tsdb()], among
its many fine points, allows you to create a grammar-based treebank
(in the Redwoods style --- Oepen et al 2002), by selecting among
the analyses your grammar provides for each string.  This treebank
can be distributed with the grammar as part of the language
documentation.

Lastly, it is of course best practice to document your grammar code as
you are writing it, in the form of comments associated with each type in
the grammar.  Francis and colleagues have some neat software for
exporting this documentation into a web-browseable format (Hashimoto
et al 2005).
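
As an illustrative sketch of that practice, tdl supports line comments
(introduced with a semicolon), which can carry the prose documentation
right next to each type.  The type and feature names here are made up
for the example:

```tdl
; noun-lex: basic nominal type for this fragment.  Nouns require
; an overt determiner; see the NP section of the prose grammar
; for discussion and examples.  (All names here are illustrative.)
noun-lex := basic-noun-lex &
  [ SYNSEM.LOCAL.CAT.VAL.SPR < [ ] > ].
```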


-- Emily

Bender, Emily M. and Jeff Good. 2005. Implementation for Discovery: A
Bipartite Lexicon to Support Morphological and Syntactic Analysis. In
Edwards, Midtlyng, Sprague and Stensrud, eds. Chicago Linguistic
Society 41: The Panels.

Hashimoto, Chikara, Francis Bond, Takaaki Tanaka, and Melanie Siegel.
2005. Integration of a Lexical Type Database with a Linguistically
Interpreted Corpus. In Proceedings of the 6th International Workshop
on Linguistically Interpreted Corpora (LINC-2005), pages 31-40,
Cheju, Korea.

Oepen, Stephan, Kristina Toutanova, Stuart Shieber, Christopher
Manning, Dan Flickinger, and Thorsten Brants. 2002. The LinGO
Redwoods Treebank: Motivation and Preliminary Applications. In
Proceedings of the 19th International Conference on Computational
Linguistics (COLING 2002), pages 1253-1257, Taipei, Taiwan.



On Sat, Mar 20, 2010 at 6:16 PM, maxwell <maxwell at umiacs.umd.edu> wrote:
> On Fri, 19 Mar 2010 23:35:51 +0100, Stefan Müller
> <Stefan.Mueller at fu-berlin.de> wrote:
>> As for porting: Dan's and my experience is that it is healthy to start
>> over from time to time.
>
> I have a hidden agenda, which I will now reveal :-).  I am concerned with
> language documentation, in which I (or some other linguist) will be the
> only linguist available to work on some language for the foreseeable
> future.  So I want something that will preserve my grammar for a long time,
> both as a descriptive (human readable) grammar and as a formal
> (computer-processable) grammar.
>
> If I were a new-fangled computational linguist, I would of course create a
> treebank of the language instead, and rely on machine learning to turn that
> into a parser.  But being a rather Olde computational linguist, I happen to
> like grammar writing...  And to be honest, most of my work is in
> morphology, where hand-crafted grammars of morphologically complex
> languages are still fairly common.  But I wanted to hear how syntacticians
> felt about this.
>
>> The [incr TSDB()] manual says on page 37 that Stephan will help with the
>> installation, provided there is a beach near your institution.
>
> I don't suppose he would be interested in working on Pashto... or Inuit,
> for that matter.
>
>> Independent of all this I think that developers should document their
>> grammars so that the knowledge and expertise is preserved for the
>> future. Then the respective grammars can be taken and be implemented in
>> whatever system is available then.
>
> I agree, and this takes me back to my agenda: blending descriptive and
> formal grammars into a single document.
>
>> But according to Bob
>> Carpenter, it took just three weeks to implement the grammar in the
>> appendix of PS94.
>
> Unfortunately, I'm not Bob Carpenter...
>
>> So if we have a precise description of seven years work with pencil and
>> paper or with the computer, we can reuse it and the achievement is there
>> independently of specific hardware.
>
> My experience is that few descriptive grammars are sufficiently precise
> and unambiguous for that.  Nor, if they haven't been tested
> computationally, are they that accurate.
>
>   Mike Maxwell
>
>



More information about the HPSG-L mailing list