Linguistics as science and as academic discipline

Mark P. Line mark at polymathix.com
Wed Oct 25 02:19:37 UTC 2006


Thanks, Spike. There are a number of remarks I might make (and may make
later in the ensuing thread), but I've picked out just a couple of your
more technical points to comment on here.

Spike Gildea wrote:
>
> As more
> individuals in the field try to engage with this new technology, there
> will be fewer unknowns, but while people with a greater affinity for
> technology jump in and experiment and debate about formats and methods,
> those of us who just want to be end users, who just want good tools and
> a straightforward model to follow, are concerned about entering into a
> massive project with no assurance that five years from now, we won't
> look back and see five years essentially wasted putting data into a
> format that didn't become generally accepted.

There are a couple of deep, dark secrets that computer experts (and OEMs)
jealously guard from merely mortal end users, and I'm going to tell you
what they are. (I've left the employ of those keepers of arcane secrets,
so I have nothing to lose anymore...)

Here they are.

Wait for them.......

Okay.

      1. If you can capture your data consistently in *any* documented
         digital format, then I can transform your data consistently into
         any other documented digital format. If both formats are XML
         vocabularies, I'll get one of my cats to do it. If anybody tells
         you differently, then they're after more of your grant money than
         they deserve.

      2. Given a set of requirements, I can invent and document a new
         digital format overnight that meets those requirements.
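
Point 1 can be illustrated with a minimal sketch in Python, assuming a
hypothetical tab-separated lexicon format and a hypothetical XML
vocabulary. The field names, element names and sample entries below are
all invented for illustration; they are not taken from any real project
or standard. The only thing the sketch relies on is that the input was
captured consistently and that its layout is documented.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical documented format: tab-separated text whose header row names
# the fields. The fields and the two entries are invented placeholders.
SAMPLE_TSV = "form\tgloss\tpos\ntapa\tcanoe\tn\nmiko\ttree\tn\n"


def tsv_to_xml(tsv_text):
    """Convert the TSV lexicon into a (hypothetical) XML vocabulary.

    Each row becomes an <entry> element and each field becomes a child
    element named after its column, so column names must be valid XML names.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    root = ET.Element("lexicon")
    for row in reader:
        entry = ET.SubElement(root, "entry")
        for field, value in row.items():
            ET.SubElement(entry, field).text = value
    return root


def xml_to_tsv(root, fields):
    """Convert the XML vocabulary back into the TSV format."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    for entry in root.findall("entry"):
        writer.writerow([entry.findtext(f, default="") for f in fields])
    return out.getvalue()


root = tsv_to_xml(SAMPLE_TSV)
round_trip = xml_to_tsv(root, ["form", "gloss", "pos"])
print(ET.tostring(root, encoding="unicode"))
print(round_trip == SAMPLE_TSV)
```

Because both layouts are documented, the conversion works in either
direction and loses nothing on the way; a real dataset would need more
fields and more care with character encoding, but not a different idea.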

So, the task of the fieldworker is NOT to second-guess the evolution of
technology by trying to put her data into *the* format that somebody has
convinced her is the one that is going to "win".

The task of the fieldworker is to put her data consistently into any
documented digital format of her choosing. (She'll be wise to choose one
that allows her to capture every salient feature of her data, or hire
somebody to invent one that does.)

People who like to fuss with the technical issues can then take it from
there.

You've written books and gotten them published. Did you operate the
printing press yourself? Did you worry a lot about whether or not the
printing press might run out of ink while your book was running?


> So how can we effect some change, help the system evolve so as to make
> it easier (and more rewarding) to do fieldwork in the more reliable ways
> that current technology makes possible?  To start with, I'd like to see
> a push for an academic culture that acknowledges the value (even the
> necessity) of CDs with sound files to accompany printed language data --
> under that standard, I might not be able to publish anything for a few
> years, but I believe the reliability of the database available to
> typologists and theoreticians would increase sharply.

Publishing on CDs is much inferior to publishing online, for quite a
large number of reasons. I can think of a few reasons right off the top of
my head, and those here who are in the thick of online archive management
can surely add many more:

-- corrections can be made directly to an online dataset and become
   available immediately; CDs have to be remastered, reburned and
   re-snail-mailed

-- corrections and other changes to an online dataset can be managed under
   version control

-- users can be notified of changes to an online dataset if they wish

-- the population of permitted users of an online dataset can be easily
   restricted

-- CDs are platform-dependent or must support multiple platforms,
   while browser-based online access is functionally platform-independent

-- CDs cost money to produce; the marginal cost for online access to
   the amount of data that fits on a CD is negligible

-- not much data fits on a CD; online servers can handle enormous volumes
   of data



-- Mark

Mark P. Line
Polymathix
San Antonio, TX



More information about the Funknet mailing list