Meta-question about FLEx: interoperability

Fri Oct 14 02:42:25 UTC 2011

Part 2

Claire also mentioned the interoperability of FLEx, and I'd like to talk a
bit about that. And I'll watch my pronouns this time.

The lack of interoperability has been my largest concern since I left the
FLEx team 6 years ago. Before we can talk usefully about interoperability,
though, I think we need to step back and find out how FLEx ended up less
interoperable than it might have been.  In my view, the largest factor is
that we followed our customers blindly.

If we look at the early posts in this thread, we see some agreement that it
would be nice to have a do-it-all linguistic application.  This fits
squarely with what the FLEx team hears constantly from its users.  Ask a
group of users "would you like our app to also do X, or would you rather
use a program designed especially to do X really well?", and in my
experience the vast majority will say "build X in, please. We don't want to
learn another program".  And so FLEx set out to get all the data in one
pot, all the tools in one place. This appeared to be common sense at the
time. After all, you could then have all the data nicely linked together,
data could be normalized so that everything was always consistent and
representative of your very latest analysis, etc.

But there are many downsides to that approach. And now that FLEx has
succeeded in terms of "market share", these problems are even clearer.
They include:

* It is difficult to give each area the attention it deserves when you try
to do many things in one free product, developed by volunteers, and can't
add development staff as the product grows.  So some areas remain shallow.
Enough to tantalize, as others have indicated, but not enough to satisfy
many.

* The code itself can become so large and overwhelming that your open-source
efforts don't pay off; no one has the time to get their head around enough
of the system to contribute.

* It is difficult for others, more passionate or knowledgeable about an
area, to contribute a tool which does go deep in some area.

* When others do manage to create complementary external tools, the tight
integration of data makes it difficult to interoperate with the external
data.  The models never quite match, the notion of ensuring data consistency
is challenged, etc.

Borrowing metaphors from a well-known essay on the Open Source paradigm, I
began thinking of FLEx as a Cathedral, when I wanted to be shopping among
stalls in a Bazaar.   We started talking internally about the downsides of
our customer-driven cathedral tendencies, recognizing the benefits of a
bazaar eco-system of interoperable linguistic software.  Though their
customers surely would not agree, many of my colleagues/superiors came to
share this perspective, and now lend their support to opening up FLEx while
stopping its expansion into new territory.  Unfortunately, we are now faced
with whole new sets of engineering problems that few users would understand,
and which I won't bring up here.  If any of you are into ontologies, we're
looking at the promise of RDF, OWL, etc. as an alternative to traditional
xml models in order to facilitate interoperability.

Ok, so some of us have come to think that our customer is actually
best-served in a healthy ecosystem of interoperable choices, and we can
point to some movement away from the kitchen-sink approach. FLEx was too
complicated for non-linguist native speakers, so a couple of us created
WeSay (which interoperates with FLEx using LIFT xml).  FLEx lacked
phonological analysis, and happily, rather than adding it to FLEx, two of my
colleagues built a new Phonology Assistant, which interoperates with FLEx &
Toolbox.   Our target users needed a different approach to language
documentation software, so we started SayMore instead of adding features to
FLEx. There are a couple of us
arguing that syntax and discourse analysis should be spun out of FLEx into
their own programs, where they can grow with more individual love and
attention.

Now, so long as the only things that SIL apps interoperate with are other
SIL programs, we don't really have a bazaar. A tiny shopping mall, maybe.  

Why is this important? For one thing, when FLEx's model doesn't fit the
needs of the language under study, then those researchers are stuck; it's
all or nothing. Maybe they go back to the trailer park, maybe they lose an
important distinction and their analysis suffers. That's one of the reasons
why, as we added annotation features to SayMore, we went with ELAN's
existing file format rather than enhancing FLEx's nascent xml interlinear
format.  We know that we don't want to duplicate ELAN's power or flexibility.
So by using the same file format, it's easy to organize your media,
metadata, & informed consent with SayMore, even do annotation there, but if
you need more power, great, a double click opens the annotation file in
ELAN.  But we were lucky. It's not often that a format designed for one
program suits the needs of another.

Another reason is that there aren't enough programmers in this space to build
several kitchen-sink applications, targeting different personas.  As a
software designer who spends half my time designing for subsistence farmers
and the other half for researchers, let me say that it is rather difficult to
make great tools if you target too wide a range of users, particularly with
limited resources.  SIL's developers are volunteers who make a fraction of
market wages and have to raise all of that money themselves, even giving 10%
of it to administrative overhead.  It should be no surprise, then,
that there are few of us, and we don't have the resources to serve the needs
of every kind of user.  A healthy software ecosystem, then, must have SIL as
just one source among many.  Where our software raises the "cost of exit" by
making it hard to move your data elsewhere, it is harming that ecosystem.
Where it promotes existing open file formats (ELAN), or promotes new ones
(LIFT) it's helping that ecosystem.

One last note, a point raised by Christopher Cox
<http://scholarspace.manoa.hawaii.edu/handle/10125/5239> at the LD&C
conference in February: it can be hard to fund software maintenance
long-term with normal research funding models.  To encourage diversity, I
would like to see others, without SIL's odd funding system, have a way to be
paid long term to develop, maintain, and support linguistic software. I
would like to see linguistic research proposals include funds to give to the
developers of the tools used, to fund further development. 

John Hatton

SIL International Language Software Development, PALASO <http://palaso.org/>,
and SIL Papua New Guinea <http://pnglanguages.org/>
