[Lexicog] how to compile data on proto-language reconstruction
Allan Johnson
allan_johnson at SIL.ORG
Mon Mar 26 08:21:30 UTC 2007
Ok, I did some testing on these suggestions. I ran into a problem with the
way MDF handles glosses. It allows only one gloss in a subentry. If more
than one gloss per subentry appears in the data, Lexique Pro concatenates
them in the display.
This proto-language data is structured so that the gloss has to serve as a
level of the hierarchy: under one "\se attested form" there can be more than
one "\ge English gloss", each with its own list of source languages. But
MDF/Lexique Pro doesn't allow this. Here's a sample entry:
\lx *abay
\ge to travel with
\se abay
\ge wedding attendant
\xv AB
\ge to travel together at sea, in separate boats
\xv NS.1
\xv CS.2-3
\xv NB.1
\se abayan
\ge fleet
\xv CS.2
When a subentry contains more than one gloss, MDF/Lexique Pro collects the
glosses and displays them together, like this:
*abay
to travel with
abay
wedding attendant; to travel together at sea, in separate boats
AB
NS.1
CS.2-3
NB.1
abayan
fleet
CS.2
That's not good. Some information has been lost. What I would need it to do
is to let each gloss remain associated with its own list of source language
abbreviations, like this:
*abay
    to travel with
    abay
        wedding attendant
            AB
        to travel together at sea, in separate boats
            NS.1
            CS.2-3
            NB.1
    abayan
        fleet
            CS.2
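To make the intended hierarchy concrete, here is a minimal Python sketch that reads the sample entry above and keeps each gloss tied to its own source list. It is not part of MDF, Toolbox, or Lexique Pro. The names parse_entry and render and the dictionary keys are made up for illustration, and it assumes one marker per line and only the \lx, \se, \ge and \xv markers.

```python
# Sample entry from the message, kept as a raw string so "\x..." is not
# treated as an escape sequence.
SAMPLE = r"""\lx *abay
\ge to travel with
\se abay
\ge wedding attendant
\xv AB
\ge to travel together at sea, in separate boats
\xv NS.1
\xv CS.2-3
\xv NB.1
\se abayan
\ge fleet
\xv CS.2
"""


def parse_entry(text):
    """Parse one entry; every gloss opens a new level under the current form."""
    entry = None   # the \lx node
    node = None    # the \lx or \se node currently being filled
    sense = None   # the most recent \ge under that node
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue
        marker, _, value = line[1:].partition(" ")
        value = value.strip()
        if marker == "lx":
            entry = {"form": value, "senses": [], "subentries": []}
            node, sense = entry, None
        elif marker == "se":
            if entry is None:
                raise ValueError("subentry found before any headword")
            node = {"form": value, "senses": []}
            entry["subentries"].append(node)
            sense = None
        elif marker == "ge":
            if node is None:
                raise ValueError("gloss found before any headword")
            sense = {"gloss": value, "sources": []}
            node["senses"].append(sense)
        elif marker == "xv":
            if sense is None:
                raise ValueError("source found before any gloss")
            sense["sources"].append(value)
    return entry


def render(entry, indent="    "):
    """Render the parsed entry as an indented outline, one source per line."""
    lines = [entry["form"]]

    def add_senses(senses, depth):
        for s in senses:
            lines.append(indent * depth + s["gloss"])
            for src in s["sources"]:
                lines.append(indent * (depth + 1) + src)

    add_senses(entry["senses"], 1)
    for sub in entry["subentries"]:
        lines.append(indent + sub["form"])
        add_senses(sub["senses"], 2)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(parse_entry(SAMPLE)))
```

The point of the sketch is only that a gloss becomes a container for the source abbreviations that follow it, instead of being merged with the other glosses of the same subentry.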
So it seems that I do need to put the glosses somewhere other than the \ge
fields. They need to be in a field that is treated as a level of the
hierarchy. For now, I have a workaround that seems to display OK in
Lexique Pro. If I represent the entry as follows:
\lx *abay
\ps to travel with
\se abay
\ps wedding attendant
\ge AB
\ps to travel together at sea, in separate boats
\ge NS.1
\ge CS.2-3
\ge NB.1
\se abayan
\ps fleet
\ge CS.2
Then it displays like this:
*abay
to travel with
abay
wedding attendant
AB
to travel together at sea, in separate boats
NS.1; CS.2-3; NB.1
abayan
fleet
CS.2
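The marker remapping behind this workaround (glosses moved from \ge to \ps, source abbreviations moved from \xv to \ge) could be applied to a whole file with a few lines of Python. This is only a sketch under the assumption that each field sits on a single line. REMAP and to_workaround are illustrative names, not anything provided by Toolbox or Lexique Pro.

```python
# Hypothetical remapping for the workaround: the gloss goes into \ps and
# the source-language abbreviation goes into \ge.
REMAP = {"ge": "ps", "xv": "ge"}


def to_workaround(text):
    """Rename markers line by line; field contents are left untouched."""
    out = []
    for line in text.splitlines():
        if line.startswith("\\"):
            marker, sep, value = line[1:].partition(" ")
            # Each line is renamed once, so \ge -> \ps and \xv -> \ge
            # cannot chain into each other.
            marker = REMAP.get(marker, marker)
            line = "\\" + marker + sep + value
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "\n".join([r"\se abay", r"\ge wedding attendant", r"\xv AB"])
    print(to_workaround(sample))
```

Because the renaming is mechanical, it can be reversed later if a proper field for this level of the hierarchy ever becomes available.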
But as I mentioned in my previous note, this use of \ps feels like a bit of
an abuse of the MDF standard.
Yes, it would be good to have a standardized way to handle this kind of
data. I guess the place to start would be to collect together various
examples of such data. Are there other examples? I have no idea whether what
I'm looking at is typical for proto-language data. I guess it's a rather
rare type of data compared to MDF-type dictionaries.
It might not be too difficult to tweak the MDF CC tables so that they do not
concatenate glosses, and to incorporate whatever other tweaks would be
needed. But it would also be necessary to have similar adjustments made in
Lexique Pro. I wouldn't want to mess with this unless we really had
something that a number of people could make good use of.
Allan
_____
From: lexicographylist at yahoogroups.com
[mailto:lexicographylist at yahoogroups.com] On Behalf Of Ron Moe
Sent: Friday, March 23, 2007 2:30 AM
To: lexicographylist at yahoogroups.com
Subject: RE: [Lexicog] how to compile data on proto-language reconstruction
Allan,
It looks like the structure of your comparative dictionary is nearly the
same as the MDF entry-subentry format (with the addition of language names):
\lx Proto-form
\ge English gloss
\se Attested form
\ge English gloss
\ex Language name (any vernacular field would do)
It would be nice if we had a standardized format for comparative
dictionaries similar to MDF. It wouldn't take much to produce one by
modifying MDF.
Ron Moe
_____
From: lexicographylist at yahoogroups.com
[mailto:lexicographylist at yahoogroups.com] On Behalf Of Allan Johnson
Sent: Thursday, March 22, 2007 1:32 AM
To: lexicographylist at yahoogroups.com
Subject: [Lexicog] how to compile data on proto-language reconstruction
Hi all,
I'm trying to help a colleague get a collection of proto-language data into
a format that could be posted on a website. We have it in a Toolbox format
and I'd like to export it to HTML from Lexique Pro. The format is a lot like
a dictionary but different enough that I'm running into trouble. A typical
entry goes something like this:
Proto-form
    English gloss
    Attested form
        English gloss
            Language name (abbreviation for a language/dialect area which attests this form and meaning)
            Language name
            Language name
        English gloss
            Language name
            Language name
            Language name
    Attested form
        English gloss
            Language name
            Language name
            Language name
The only way I've found to make this work with Toolbox / Lexique Pro is to
put the English gloss in the \ps field and the language abbreviations in the
\ge field. But this feels like a major abuse of the system. Do any of you
know of a format that's really meant for this kind of data?
Allan J.