[Lexicog] how to compile data on proto-language reconstruction

Mon Mar 26 08:21:30 UTC 2007

Ok, I did some testing on these suggestions. I ran into a problem with the
way MDF handles glosses. It allows only one gloss in a subentry. If more
than one gloss per subentry appears in the data, Lexique Pro concatenates
them in the display. 

The way this proto-language data is structured, it needs to be able to use
the gloss as a level of the hierarchy, so that under one "\se attested form"
there can be more than one "\ge English gloss", each with its own list of
source languages. But MDF/Lexique Pro doesn't allow this. Here's a sample
entry:

\lx *abay

    \ge to travel with

  \se abay

    \ge wedding attendant

      \xv AB

    \ge to travel together at sea, in separate boats

      \xv NS.1

      \xv CS.2-3

      \xv NB.1

  \se abayan

    \ge fleet

      \xv CS.2

Where a subentry contains more than one gloss, MDF/Lexique Pro collects the
glosses and displays them together, like this:

*abay

    to travel with

  abay

    wedding attendant; to travel together at sea, in separate boats

      AB

      NS.1

      CS.2-3

      NB.1

  abayan

    fleet

      CS.2

That's not good. Information has been lost: the display no longer shows which
source languages attest which gloss. What I need is for each gloss to remain
associated with its own list of source language abbreviations, like this:

*abay

    to travel with

  abay

    wedding attendant

      AB

    to travel together at sea, in separate boats

      NS.1

      CS.2-3

      NB.1

  abayan

    fleet

      CS.2
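The nesting wanted here can be sketched outside Toolbox in a few lines of
Python. This is only an illustration, not anything MDF or Lexique Pro
provides: the function names parse_entry and render are made up, and the
sketch assumes one record per string, with each \ge opening a new sense
under the most recent \lx or \se, and each \xv belonging to the most recent
\ge.

```python
SAMPLE = r"""
\lx *abay
    \ge to travel with
  \se abay
    \ge wedding attendant
      \xv AB
    \ge to travel together at sea, in separate boats
      \xv NS.1
      \xv CS.2-3
      \xv NB.1
  \se abayan
    \ge fleet
      \xv CS.2
"""


def parse_entry(text):
    """Parse one record so that every gloss keeps its own source list."""
    entry = None    # the \lx record
    current = None  # the form (\lx or \se) that new glosses attach to
    sense = None    # the gloss that new sources attach to
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # skip blank or unmarked lines
        marker, _, value = line[1:].partition(" ")
        value = value.strip()
        if marker == "lx":
            entry = {"form": value, "senses": [], "subentries": []}
            current, sense = entry, None
        elif marker == "se":
            current = {"form": value, "senses": []}
            entry["subentries"].append(current)
            sense = None
        elif marker == "ge":
            sense = {"gloss": value, "sources": []}
            current["senses"].append(sense)
        elif marker == "xv":
            sense["sources"].append(value)
    return entry


def _render_senses(senses):
    """Yield each gloss followed by its own indented sources."""
    for s in senses:
        yield "    " + s["gloss"]
        for src in s["sources"]:
            yield "      " + src


def render(entry):
    """Lay the parsed entry out with one source list per gloss."""
    lines = [entry["form"]]
    lines.extend(_render_senses(entry["senses"]))
    for sub in entry["subentries"]:
        lines.append("  " + sub["form"])
        lines.extend(_render_senses(sub["senses"]))
    return "\n".join(lines)


print(render(parse_entry(SAMPLE)))
```

Because each gloss carries its own list, nothing has to be concatenated at
display time, so the link between a gloss and its sources survives.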

So it seems that I need to put the glosses somewhere other than in the \ge
fields. They need to be in a field that is treated as a level of the
hierarchy. For now, I have a workaround that seems to display OK in
Lexique Pro. I represent the entry as follows:

\lx *abay

    \ps to travel with

  \se abay

    \ps wedding attendant

      \ge AB

    \ps to travel together at sea, in separate boats

      \ge NS.1

      \ge CS.2-3

      \ge NB.1

  \se abayan

    \ps fleet

      \ge CS.2

Then it displays like this:

*abay

    to travel with

  abay

    wedding attendant

      AB

    to travel together at sea, in separate boats

      NS.1; CS.2-3; NB.1

  abayan

    fleet

      CS.2

But as I mentioned in my previous note, this use of \ps feels like a bit of
an abuse of the MDF standard.
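One way to keep that abuse out of the working data would be to store the
glosses in \ge and the sources in \xv, and generate the \ps/\ge version only
when exporting for Lexique Pro. A minimal sketch of such a marker swap
follows. The function name to_workaround and the REMAP table are made up for
illustration, and the sketch assumes each field sits on its own line and
starts with its marker.

```python
# Marker swap for the workaround: glosses move to \ps, sources move to \ge.
REMAP = {"ge": "ps", "xv": "ge"}


def to_workaround(text):
    """Rename markers line by line, leaving indentation and values alone."""
    out = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("\\"):
            indent = line[:len(line) - len(stripped)]
            marker, sep, value = stripped[1:].partition(" ")
            # Each line is remapped once, so \xv -> \ge is not then
            # turned into \ps.
            marker = REMAP.get(marker, marker)
            line = indent + "\\" + marker + sep + value
        out.append(line)
    return "\n".join(out)


record = "\\lx *abay\n  \\se abay\n    \\ge wedding attendant\n      \\xv AB"
print(to_workaround(record))
```

Markers not listed in REMAP, such as \lx and \se, pass through unchanged.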

Yes, it would be good to have a standardized way to handle this kind of
data. I guess the place to start would be to collect together various
examples of such data. Are there other examples? I have no idea whether what
I'm looking at is typical for proto-language data. I guess it's a rather
rare type of data compared to MDF-type dictionaries. 

It might not be too difficult to tweak the MDF CC tables so that they do not
concatenate glosses, and to incorporate whatever other tweaks would be
needed. But similar adjustments would also have to be made in Lexique Pro.
I wouldn't want to mess with this unless we really had something that a
number of people could make good use of.

Allan

  _____  

From: lexicographylist at yahoogroups.com
[mailto:lexicographylist at yahoogroups.com] On Behalf Of Ron Moe
Sent: Friday, March 23, 2007 2:30 AM
To: lexicographylist at yahoogroups.com
Subject: RE: [Lexicog] how to compile data on proto-language reconstruction

Allan,

It looks like the structure of your comparative dictionary is nearly the
same as the MDF entry-subentry format (with the addition of language names):

\lx Proto-form

\ge English gloss

  \se Attested form

  \ge English gloss

  \ex Language name (any vernacular field would do)

It would be nice if we had a standardized format for comparative
dictionaries similar to MDF. It wouldn't take much to produce one by
modifying MDF.

Ron Moe

  _____  

From: lexicographylist at yahoogroups.com
[mailto:lexicographylist at yahoogroups.com] On Behalf Of Allan Johnson
Sent: Thursday, March 22, 2007 1:32 AM
To: lexicographylist at yahoogroups.com
Subject: [Lexicog] how to compile data on proto-language reconstruction

Hi all,

I'm trying to help a colleague get a collection of proto-language data into
a format that could be posted on a website. We have it in a Toolbox format
and I'd like to export it to HTML from Lexique Pro. The format is a lot like
a dictionary, but different enough that I'm running into trouble. A typical
entry goes something like this:

Proto-form

      English gloss

   Attested form

      English gloss

         Language name (abbreviation for a language/dialect area which
         attests this form and meaning)

         Language name 

         Language name 

      English gloss

         Language name 

         Language name 

         Language name 

   Attested form

      English gloss

         Language name 

         Language name 

         Language name 

The only way I've found to make this work with Toolbox / Lexique Pro is to
put the English gloss in the \ps field and the language abbreviations in the
\ge field. But this feels like a major abuse of the system. Do any of you
know of a format that's really meant for this kind of data?

Allan J.
