[Lexicog] Re: Kirrkirr and Shoebox/Toolbox

Vanessa Combet combet at SINEQUA.COM
Fri Aug 13 10:19:18 UTC 2004


Hello everyone,
not only does "shallow thinking" provoke discussion, it may also generate
digression... I would like to point out that several groups of experts are
currently working on standards for the representation of lexical
resources (cf. http://www.tc37sc4.org/): "The
objective of ISO/TC 37/SC 4 is to prepare various standards by specifying
principles and methods for creating, coding, processing and managing
language resources, such as written corpora, lexical corpora, speech
corpora, dictionary compiling and classification schemes."
In particular, there is a proposal for a "Data Category Registry"
(DCR) for language resources, which is intended to make it possible to
represent every possible POS in every language, all of it "compatible"
with XML, OWL, OLIF, etc.
Hope it's not too much of a digression !-)
Vanessa Combet
http://www.sinequa.com


-----Original Message-----
From: Kenneth Keyes [mailto:ken_keyes at sil.org]
Sent: Friday, August 13, 2004 9:17 AM
To: lexicographylist at yahoogroups.com
Subject: RE: [Lexicog] Re: Kirrkirr and Shoebox/Toolbox


Thanks, Mike, for setting me straight.

I was just "shooting from the hip" and didn't give OLIF as much thought as
you did. But, I suppose that the advantage of such shallow thinking is that
it provokes discussion ;-). OLIF does strike me as rather Eurocentric
in its orientation, and I would propose that an XML standard for
lexicography and linguistic analysis should include all possible
grammatical categories, such as those found in Thomas Payne's Describing
Morphosyntax: A Guide for Field Linguists.

I do agree that a relational database delivers faster performance than an
XML file. However, I have been very impressed with the XML-based Libronix
software used to publish Brill's Hebrew and Aramaic Lexicon of the Old
Testament and other lexicons. It is slow, but powerful.

Speaking of interlinearization of texts, by the way, Lars Huttar wrote an
excellent thesis and data model based on XML and XSLT for discourse analysis
of texts using Levinsohn-Longacre charting. I don't have the reference.
You'd have to "google" it.

Ken


-----Original Message-----
From: Mike Maxwell [mailto:maxwell at ldc.upenn.edu]
Sent: Thursday, August 12, 2004 9:51 PM
To: lexicographylist at yahoogroups.com
Subject: Re: [Lexicog] Re: Kirrkirr and Shoebox/Toolbox

Kenneth Keyes wrote:
> OLIF is a proposed XML standard for the formatting of lexicons.
> From what I can infer, it is one of the formats that SIL may be considering
> for its next "Fieldworks" module. See http://www.olif.net/ for details on
> OLIF.

I was involved in the SIL Fieldworks project up until a couple years ago
(I still try to keep up, but...)  From what I can tell, OLIF is best
described as an XML _exchange_ format.

The lexicon in FieldWorks was (last I heard) to be maintained in a
relational database, which was being used to store what was essentially
an object-oriented database.  There was to be provision for exporting
from this database to XML format.  The reason (last I heard--OK, I'm
going to stop saying that, but you should read it between the lines :-))
for storing it as a db, rather than as a raw XML file, was performance.
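A minimal Python sketch of that arrangement, assuming nothing about the actual FieldWorks schema: entries live in an indexed SQLite table (standard-library sqlite3), and XML is generated only when an export is requested (xml.etree.ElementTree). The table, column, and element names (entry, headword, pos, gloss, lexicon) are hypothetical and are not the OLIF format.

```python
import sqlite3
import xml.etree.ElementTree as ET

def build_lexicon(entries):
    """Load (headword, pos, gloss) rows into an in-memory SQLite table.
    The table and column names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entry (id INTEGER PRIMARY KEY, headword TEXT, pos TEXT, gloss TEXT)"
    )
    # An index lets equality lookups on headword avoid a full table scan.
    conn.execute("CREATE INDEX idx_headword ON entry (headword)")
    conn.executemany(
        "INSERT INTO entry (headword, pos, gloss) VALUES (?, ?, ?)", entries
    )
    conn.commit()
    return conn

def lookup(conn, headword):
    """Return (pos, gloss) pairs for one headword; SQLite can use the index."""
    return conn.execute(
        "SELECT pos, gloss FROM entry WHERE headword = ?", (headword,)
    ).fetchall()

def export_xml(conn):
    """Serialize the whole table to XML on demand.
    Element names are invented for illustration, not taken from OLIF."""
    root = ET.Element("lexicon")
    rows = conn.execute("SELECT headword, pos, gloss FROM entry ORDER BY id")
    for headword, pos, gloss in rows:
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "headword").text = headword
        ET.SubElement(entry, "pos").text = pos
        ET.SubElement(entry, "gloss").text = gloss
    return ET.tostring(root, encoding="unicode")

conn = build_lexicon([("lupus", "noun", "wolf"), ("amare", "verb", "to love")])
print(lookup(conn, "lupus"))   # [('noun', 'wolf')]
print(export_xml(conn))
```

Lookups go through the database, which can use the headword index instead of reading an entire file; the XML export is a separate step, roughly the division of labor described above.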

But of course the XML _format_ implies a database structure, and there
could well be a mapping between the FW structure and the OLIF format.
(More on this below.)

I would like to hear more about OLIF.  At a quick glance, it appears
very Euro-centric, by which I mean for example that the "Fixed Values"
for part-of-speech etc. are designed for European languages
(particularly modern Indo-European languages): possible tenses are past,
present and future; grammatical genders are masculine, feminine, neuter,
common, and unspecified; the list of cases would not be adequate even
for Latin (no ablative or vocative); etc.

Obviously this Euro-centricity could be remedied.  One of the goals in
FieldWorks is to come up with "lists" of such fixed values which are
more comprehensive.  (I put "lists" in scare quotes because such a list
may be hierarchical--one would like, for example, to have "distant past"
and "recent past"--or some such--as subcategories under "past".)  Such
an expanded list could easily be incorporated into OLIF.
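A hierarchical value list can be sketched as a child-to-parent map. In this hypothetical Python example, "distant past" and "recent past" sit under "past", and a fine-grained value is exported to a flat closed list (here past, present and future, the tenses reported above for OLIF) by walking up to the nearest ancestor that the flat list contains. The names PARENT, FLAT_TENSES, ancestors and to_flat are illustrative only.

```python
# Hypothetical tense hierarchy: child value -> parent value (None = top level).
PARENT = {
    "past": None,
    "present": None,
    "future": None,
    "recent past": "past",
    "distant past": "past",
}

# A flat closed list, like the tense values reported for OLIF.
FLAT_TENSES = {"past", "present", "future"}

def ancestors(value):
    """Yield value, then its parent, grandparent, ... up to the top level."""
    while value is not None:
        yield value
        value = PARENT[value]

def to_flat(value, allowed=FLAT_TENSES):
    """Map a fine-grained value to its nearest ancestor in the flat list."""
    for candidate in ancestors(value):
        if candidate in allowed:
            return candidate
    raise ValueError(f"{value!r} has no ancestor in the flat list")

print(to_flat("distant past"))  # past
print(to_flat("present"))       # present
```

Because "recent past" and "distant past" both map to "past", the flat value cannot be mapped back to the finer category; this is a small instance of a mapping that works in one direction only.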

A more difficult problem would be if there were not a mapping between
the FieldWorks lexicon structure and the OLIF format.  Ideally, this
mapping would be bi-directional.  But even if it were only
one-directional--from FW to OLIF--that would still be valuable.  (There
might be things that the OLIF format incorporates which would be
irrelevant to field lexicons--maybe information that was relevant for
statistical MT, or something.)

As I say, I have not had time to study the OLIF spec, so any remarks
about whether FW would be mappable into OLIF are necessarily
preliminary.   Having said that, I will say that one obvious difference
is that FW incorporates much more than a lexicon--e.g. it has provision
for handling interlinear text.  There is (understandably) none of that
in OLIF, so far as I can tell.  Within the lexicon, that impacts on the
notion of example sentences.  The OLIF definition of an 'Example' is

    Sample text or portion of text in which entry string occurs
    Value:  string

No provision for pointing back to the text from whence the example came
(if it was not a made-up example), indeed no provision for interlinear
examples (which might be viewable in a bilingual dictionary as an
example in the target language + a translation in the glossing
language(s)), etc.  As I say, this is quite understandable, given the
goals of OLIF, so I'm not criticizing it--just saying that it was
developed with a particular goal in mind, which is not field
linguistics.  Having said that, of course one might export a lexicon
from the FW model into OLIF, but you might not easily send it back in
the other direction.
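The asymmetry can be illustrated with a small Python sketch. The InterlinearExample class is a hypothetical field-lexicon record (words, per-word glosses, a free translation, and a pointer back to the source text); to_olif_example flattens it to a single string, matching only the "Value: string" part of the OLIF definition quoted above. Nothing here is taken from the FieldWorks or OLIF specifications, and the reference format "text12:s3" is made up.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class InterlinearExample:
    """Hypothetical field-lexicon example record (not the FieldWorks model)."""
    words: List[str]                  # vernacular words, in order
    glosses: List[str]                # one gloss per word
    translation: str                  # free translation in the glossing language
    source_ref: Optional[str] = None  # invented pointer back to the source text

def to_olif_example(ex: InterlinearExample) -> str:
    """Export: keep only the sample text, since the OLIF 'Example' value is a
    plain string. Glosses, translation and source pointer are discarded."""
    return " ".join(ex.words)

def from_olif_example(text: str) -> InterlinearExample:
    """Best-effort import: only the words can be recovered from the string."""
    return InterlinearExample(words=text.split(), glosses=[], translation="")

ex = InterlinearExample(
    words=["puella", "cantat"],
    glosses=["girl.NOM", "sing.3SG"],
    translation="The girl sings.",
    source_ref="text12:s3",
)
flat = to_olif_example(ex)      # "puella cantat"
back = from_olif_example(flat)
print(back == ex)               # False: the round trip loses information
```

Export succeeds, but the reverse mapping recovers only the words: the glosses, translation, and source pointer are gone, which is the one-way situation described above.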

        Mike Maxwell






Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/lexicographylist/

<*> To unsubscribe from this group, send an email to:
    lexicographylist-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/


