Portability
Mike Maxwell
maxwell at umiacs.umd.edu
Tue Mar 23 02:50:37 UTC 2010
Emily M. Bender wrote:
> Glad to hear it. Is the Stuttgart FST system open-source?
Yes:
http://www.ims.uni-stuttgart.de/projekte/gramotron/SOFTWARE/SFST.html
> ...What kind of rule formalism does it support? (Anything like XFST?
It supports many of the same kinds of rules, including Karttunen's
"replace" rules (phonological rules, basically, as opposed to
"two-level" rules). The formalism looks a bit different. I was very
familiar with xfst (and lexc), but it didn't take me long to get up to
speed on SFST. One thing SFST does not have is the compile-replace
algorithm that allows XFST to do full word reduplication. Xerox owns a
patent on the compile-replace algorithm. SFST also does not have an
interpreter/debugger like XFST has. And the implementation of
epenthesis rules in SFST is, IMO, a bit clunky. (To be fair, epenthesis
in the general case is difficult to program right.)
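To illustrate what a Karttunen-style "replace" rule does (as opposed to a two-level rule), here is a minimal Python sketch of a single ordered rewrite, e -> i / _ C e. The rule itself is hypothetical and this is not SFST or xfst syntax, just the rewrite semantics:

```python
import re

def replace_rule(form: str) -> str:
    """Apply one ordered rewrite rule, in the spirit of a
    Karttunen-style replace rule: e -> i / _ C e
    (raise 'e' to 'i' before a consonant followed by 'e').
    The rule is invented for illustration only."""
    consonants = "bcdfghjklmnpqrstvwxz"
    # Lookahead keeps the context in place; only the target 'e' is rewritten.
    return re.sub(rf"e(?=[{consonants}]e)", "i", form)

print(replace_rule("belet"))  # -> "bilet"
```

A real FST toolkit compiles such a rule into a transducer and composes it with the lexicon, rather than applying regexes to strings, but the input/output relation is the same.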
Mans Hulden at U Arizona has written an open source FST tool called
'Foma', which he says "is largely compatible with the Xerox/PARC
finite-state toolkit." I unfortunately haven't had a chance to check it
out yet. It's described here:
http://www.aclweb.org/anthology/E/E09/E09-2008.pdf
> ...We're
> starting to think about making the Grammar Matrix customization
> system facilitate the development of morphophonological analyzers
> linked to the HPSG grammars, and are looking for a good open-source
> FST system to work with.)
I would recommend either SFST or Mans Hulden's Foma (but with the
proviso that I've only actually used SFST).
There's also OpenFST, which is sort of an open source re-implementation
of AT&T's FSM Library. OpenFST provides weights, which you're probably
not interested in. (Weights can of course be important for some things,
like spell correction. SFST has a *trainable* weight system, but it's
not possible to reach in and fiddle with the weights by hand.) On the
down side, OpenFST exposes a pretty low-level interface. Ken Beesley has
been working part-time on a higher-level language to be used with
OpenFST; I doubt it will be coming out soon.
There are numerous other FST implementations, but none (afaik) that are
as easy to use for linguists as SFST (and of course xfst).
>> That also would not be a problem; it's easy to use Literate
>> Programming to take grammar fragments scattered in any order
>> throughout the descriptive grammar and map them into whatever
>> format and order the target environment requires. We currently do
>> a much more radical thing than that for our morphological parser,
>> as the morphology is written in a generic XML scheme, which is
>> translated into the SFST programming language after being mapped
>> into the necessary order.
>
> I would be very interested to see what this looks like with a
> well-commented HPSG grammar. (And to learn what kind of best
> practices it would require me to adopt in writing my---already
> prolific---comments.)
I don't have any well-commented HPSG grammars, only (thus far)
morphological grammars. Literate Programming was of course invented by
Donald Knuth, whom you may know :-). It was intended to make
computer programs more readable by humans; I don't suppose the good Dr.
Knuth had grammars in mind, but it's actually a better fit for grammars.
The basic idea is that instead of embedding comments into a program, you
embed the program in pieces inside a human-readable (prose) description.
What that has meant for us is that we write a descriptive grammar (of
morphology and phonology, mostly), and embed our formal grammar
fragments into that wherever they make sense from an expository point
of view. That has allowed us to split the work in two, with the
descriptive grammar writing being purely a linguistic task, and the
formal grammar writing and debugging being more of a computational
linguistic task. No reason the two couldn't be done by a single person,
but splitting between two people (or two teams) has worked well for us,
with a lot of back-and-forth.
Typically a formal grammar fragment contains a single rule, or a set of
affixes that it makes sense to describe together. Each grammar fragment
is tagged as an xml element, and the fragment element is given a name.
In an appendix, there is what amounts to a table of those names, in the
order needed to reconstitute the complete formal grammar for "compilation."
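As a sketch of the setup described above (the element and attribute names here are hypothetical, not our actual schema), a named fragment embedded in a chapter, and the appendix table that reconstitutes the grammar, might look like:

```xml
<!-- A named grammar fragment embedded in the prose chapter: -->
<fragment name="noun-plural-suffixes">
  ... SFST code for the plural affixes ...
</fragment>

<!-- Appendix: the order needed to reconstitute the full grammar
     for compilation ("tangling", in Literate Programming terms): -->
<tangle>
  <use fragment="alphabet"/>
  <use fragment="noun-plural-suffixes"/>
  <use fragment="phonological-rules"/>
</tangle>
```

A tangling step then walks the appendix list, pulls each named fragment out of the prose, and concatenates them into the SFST source file.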
We used the Literate Programming system for DocBook that Norm Walsh
describes here:
http://nwalsh.com/docs/articles/xml2002/lp/paper.pdf
> ...We are also starting to look into ontological
> annotation of grammars (with GOLD), which might connect here.
Yes, that's on my to-do list. Specifically, we have grammar fragments
that define the morphosyntactic features (the noun features are
contained in a fragment in the chapter on noun morphology, the verb
features in a fragment in the chapter on verb morphology; any additional
features needed by adjectives can of course be defined in a fragment in
the adjective chapter). At the moment, those features are described in
the descriptive grammar, in the sense of giving their uses ("The direct
case in Pashto is used for the subject of present and future tense
clauses..."). But it would be a trivial modification of my xml schema
for feature definitions to add a pointer to the definitions in GOLD, and
I intend to do that.
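A sketch of what that modification might look like, again with hypothetical element names and a placeholder GOLD identifier (the actual concept URI would come from the GOLD ontology):

```xml
<!-- Hypothetical feature definition from the noun-morphology chapter;
     the gold attribute is the proposed pointer into the GOLD ontology: -->
<feature name="case" category="noun">
  <value name="direct" gold="GOLD:DirectCase-or-equivalent-concept">
    The direct case in Pashto is used for the subject of
    present and future tense clauses...
  </value>
</feature>
```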
--
Mike Maxwell
What good is a universe without somebody around to look at it?
--Robert Dicke, Princeton physicist