Portability

Mike Maxwell maxwell at umiacs.umd.edu
Tue Mar 23 02:50:37 UTC 2010


Emily M. Bender wrote:
> Glad to hear it.  Is the Stuttgart FST system open-source?   

Yes:
http://www.ims.uni-stuttgart.de/projekte/gramotron/SOFTWARE/SFST.html

> ...What kind of rule formalism does it support?  (Anything like XFST?

It supports many of the same kinds of rules, including Karttunen's 
"replace" rules (phonological rules, basically, as opposed to 
"two-level" rules).  The formalism looks a bit different.  I was very 
familiar with xfst (and lexc), but it didn't take me long to get up to 
speed on SFST.  One thing SFST does not have is the compile-replace 
algorithm that allows XFST to do full word reduplication.  Xerox owns a 
patent on the compile-replace algorithm.  SFST also does not have an 
interpreter/debugger like XFST has.  And the implementation of
epenthesis rules in SFST is, IMO, a bit clunky.  (To be fair, epenthesis 
in the general case is difficult to program right.)
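To give a feel for what a replace rule does (independent of any particular toolkit's syntax), here is a minimal sketch in plain Python using regex lookaround to mimic a Karttunen-style rule "target -> repl / left __ right".  The function name and the regex approach are my own illustration, not SFST or xfst code:

```python
import re

def replace_rule(s, target, repl, left="", right=""):
    """Apply a rewrite rule target -> repl / left __ right.
    A sketch of the *idea* of a replace rule, not real FST code.
    left and right are regex fragments used as unconsumed context."""
    pattern = f"(?<={left})" if left else ""
    pattern += re.escape(target)
    if right:
        pattern += f"(?={right})"
    return re.sub(pattern, repl, s)

# Example: word-final devoicing, b -> p / __ #
print(replace_rule("tab", "b", "p", right="$"))  # -> "tap"
```

A real FST compiler turns such a rule into a transducer that can be composed with a lexicon, which is what makes the rule formalism more than string rewriting.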

Mans Hulden at U Arizona has written an open source FST tool called 
'Foma', which he says "is largely compatible with the Xerox/PARC 
finite-state toolkit."  I unfortunately haven't had a chance to check it 
out yet.  It's described here:
    http://www.aclweb.org/anthology/E/E09/E09-2008.pdf

> ...We're 
> starting to think about making the Grammar Matrix customization
> system facilitate the development of morphophonological analyzers
> linked to the HPSG grammars, and are looking for a good open-source
> FST system to work with.)

I would recommend either SFST or Mans Hulden's Foma (his reimplementation 
of xfst), with the proviso that I've only actually used SFST.

There's also OpenFST, which is sort of an open source re-implementation 
of AT&T's FSM Library.  OpenFST provides weights, which you're probably 
not interested in. (Weights can of course be important for some things, 
like spell correction.  SFST has a *trainable* weight system, but it's 
not possible to reach in and fiddle with the weights by hand.)  On the 
down side, OpenFST exposes a pretty low-level interface.  Ken Beesley has 
been working part-time on a higher level language to be used with 
OpenFST; I doubt that that will be coming out soon.
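To illustrate why weights matter for something like spell correction: candidate corrections can be ranked by a weighted edit distance, where the per-operation costs are exactly the sort of thing a weighted FST toolkit lets you tune.  This is a plain-Python sketch of the idea, not OpenFST code:

```python
def weighted_edit_distance(a, b, sub_cost=1.0, ins_cost=1.0, del_cost=1.0):
    """Dynamic-programming edit distance with tunable operation costs."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * del_cost
    for j in range(1, n + 1):
        d[0][j] = j * ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = d[i - 1][j - 1] + (0.0 if a[i - 1] == b[j - 1] else sub_cost)
            d[i][j] = min(sub, d[i - 1][j] + del_cost, d[i][j - 1] + ins_cost)
    return d[m][n]

# Rank candidate corrections for a misspelling:
candidates = ["grammar", "grammars", "glamour"]
best = min(candidates, key=lambda w: weighted_edit_distance("gramar", w))
print(best)  # -> "grammar"
```

In a weighted FST setting the same costs live on transducer arcs, so correction candidates fall out of composition plus a shortest-path search rather than an explicit loop over a candidate list.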

There are numerous other FST implementations, but none (afaik) that are 
as easy to use for linguists as SFST (and of course xfst).

>> That also would not be a problem; it's easy to use Literate
>> Programming to take grammar fragments scattered in any order
>> throughout the descriptive grammar and map them into whatever
>> format and order the target environment requires.  We currently do
>> a much more radical thing than that for our morphological parser,
>> as the morphology is written in a generic XML scheme, which is
>> translated into the SFST programming language after being mapped 
>> into the necessary order.
> 
> I would be very interested to see what this looks like with a
> well-commented HPSG grammar.  (And to learn what kind of best
> practices it would require me to adopt in writing my---already
> prolific---comments.)  

I don't have any well-commented HPSG grammars, only (thus far) 
morphological grammars.  Literate Programming was of course invented by 
Donald Knuth, whom you may know :-).  It was intended for making 
computer programs more readable by humans; I don't suppose the good Dr. 
Knuth had grammars in mind, but it's actually a better fit for grammars 
than for programs.

The basic idea is that instead of embedding comments into a program, you 
embed the program in pieces inside a human-readable (prose) description. 
What that has meant for us is that we write a descriptive grammar (of 
morphology and phonology, mostly), and embed our formal grammar 
fragments into it wherever they make sense from an expository point 
of view.  That has allowed us to split the work in two, with the 
descriptive grammar writing being purely a linguistic task, and the 
formal grammar writing and debugging being more of a computational 
linguistic task.  No reason the two couldn't be done by a single person, 
but splitting between two people (or two teams) has worked well for us, 
with a lot of back-and-forth.

Typically a formal grammar fragment contains a single rule, or a set of 
affixes that it makes sense to describe together.  Each grammar fragment 
is tagged as an XML element, and the fragment element is given a name. 
In an appendix, there is what amounts to a table of those names, in the 
order needed to reconstitute the complete formal grammar for "compilation."
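The "tangling" step described above — pulling named fragments out of the prose and reassembling them in the appendix's order — can be sketched in a few lines of Python.  The element and attribute names here (fragment, name, appendix, order) are invented for illustration; they are not our actual schema, and the fragment contents are made-up SFST-flavored lines:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<grammar>
  <chapter title="Noun morphology">
    <p>Plural is marked with -s ...</p>
    <fragment name="noun-plural">$NounPl$ = s</fragment>
  </chapter>
  <chapter title="Phonology">
    <fragment name="alphabet">ALPHABET = [a-z]</fragment>
  </chapter>
  <appendix>
    <order>alphabet noun-plural</order>
  </appendix>
</grammar>
""")

# Index the fragments by name, then emit them in the appendix's order.
fragments = {f.get("name"): f.text for f in doc.iter("fragment")}
order = doc.findtext("appendix/order").split()
formal_grammar = "\n".join(fragments[name] for name in order)
print(formal_grammar)
```

The point is that the expository order of the fragments in the prose is completely decoupled from the order the compiler needs.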

We used the Literate Programming system for DocBook that Norm Walsh 
describes here:
    http://nwalsh.com/docs/articles/xml2002/lp/paper.pdf

> ...We are also starting to look into ontological
> annotation of grammars (with GOLD), which might connect here.

Yes, that's on my to-do list.  Specifically, we have grammar fragments 
that define the morphosyntactic features (the noun features are 
contained in a fragment in the chapter on noun morphology, the verb 
features in a fragment in the chapter on verb morphology; any additional 
features needed by adjectives can of course be defined in a fragment in 
the adjective chapter).  At the moment, those features are described in 
the descriptive grammar, in the sense of giving their uses ("The direct 
case in Pashto is used for the subject of present and future tense 
clauses...").  But it would be a trivial modification of my xml schema 
for feature definitions to add a pointer to the definitions in GOLD, and 
I intend to do that.
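Concretely, the modification could be as small as one attribute on the feature-definition element.  A sketch of what that might look like — the element name, attribute names, and the particular GOLD URI are all illustrative, not our real schema:

```python
import xml.etree.ElementTree as ET

feature = ET.fromstring(
    '<feature name="case" value="direct">'
    'The direct case in Pashto is used for the subject of '
    'present and future tense clauses.</feature>')

# Hypothetical extension: point the feature definition at a GOLD concept.
feature.set("gold", "http://purl.org/linguistics/gold/NominativeCase")
print(ET.tostring(feature, encoding="unicode"))
```

Nothing downstream (the tangling, the SFST translation) needs to change; the GOLD pointer just rides along as annotation.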
-- 
    Mike Maxwell
    What good is a universe without somebody around to look at it?
    --Robert Dicke, Princeton physicist



More information about the HPSG-L mailing list