[Lexicog] Digest Number 343
Ron Moe
ron_moe at SIL.ORG
Wed May 25 02:39:57 UTC 2005
If I understand the issue correctly, what is needed is the ability to sort
senses according to the contents of one or more fields. We also need the
ability to display the senses hierarchically. I'm sure a computer program
could do this. Like most things in lexicography, this issue is related to
(1) how a dictionary develops in practical terms, (2) how we model the
structure of the data, and (3) how the program manipulates the data to
produce the desired view.
Concerning (1), the user gradually accumulates senses over time. He is most
likely to encounter the primary sense first, but this is not always the
case. As soon as he finds a second sense for an entry, he is confronted with
the question of how to order them. The normal practice is to make a judgment
as to which sense is more "basic." As his understanding of the language
progresses, he may realize that many entries need to be structured along
similar lines. For instance, in Allan's Philippine language the senses of
verbs need to be ordered by focus: (a) subject focus, (b) object focus, (c)
instrument focus, (d) location focus. He therefore has to specify the verb
focus in the database. Once he has done this, the program could
automatically order senses based on this field.
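The idea of ordering senses automatically from a focus field can be sketched in a few lines of Python. This is only an illustration: the focus codes (AF, OF, IF, LF) are the \vf values that appear in the sample record in this message, and the ranking table and the `order_by_focus` helper are assumptions made for the example, not part of any existing program.

```python
# Hypothetical ranking of verb focus codes. The codes are the \vf values
# used in the sample record; their relative order here is an assumption.
FOCUS_RANK = {"AF": 0, "OF": 1, "IF": 2, "LF": 3}

def order_by_focus(senses):
    # Senses with no recognised focus sort first; ties keep their input order
    # because Python's sort is stable.
    return sorted(senses, key=lambda s: FOCUS_RANK.get(s.get("vf"), -1))

senses = [
    {"form": "hugálon", "vf": "LF"},
    {"form": "ihugal", "vf": "IF"},
    {"form": "munhugal", "vf": "AF"},
    {"form": "humugal", "vf": "OF"},
]

ordered = [s["form"] for s in order_by_focus(senses)]
print(ordered)
```

Once the focus is recorded as data, changing the order is just a matter of editing the ranking table rather than rearranging the entries by hand.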
Concerning (2), I don't know of any program in which the senses of an entry
are unordered. In each case the model assumes that senses are ordered. When
the user adds a sense, the program adds a *second* sense, then a *third*
sense, and so on. Therefore the program must allow the user to specify the
placement of the new sense, and it must allow the user to reorder the senses
at a later date. This ordering is a user-defined order based on a judgment
of historical primacy or semantic derivation. It cannot be automated.
Some programs, like MDF, assume two layers to the ordering: (a) a user
specified (semantic) order, (b) an order based on part of speech. Either
factor can be ranked first.
Concerning (3), as long as the user has ordered all the senses, a program
could change the relative ranking. The program could first group all the
nouns and then number the senses in the order that the user specified; then
group the verbs and so on. Alternatively it could first group the senses
that have been assigned to the same number. But this only works if the user
assigns the same number to both a noun and a verb if they both belong to the
same 'sense'. So in Allan's example both "hugal n. The activity of playing
cards, gambling with cards" and "munhugal v. Two or more will play cards,
gamble with playing cards" have to be assigned to sense #2. Otherwise the
system doesn't work.
But Allan needs more than this. We've noted that he needs to be able to
specify a relative order, and he needs to be able to order senses by part of
speech. But he *also* needs to order senses by verb focus. The program must
therefore have the capability of ordering senses based on more than one
feature. To achieve the order in Allan's first example, the program must
first order the senses by the user specified number, then by part of speech,
and then by verb focus. To achieve the order in his second example, the
program must first order the senses by part of speech, then verb focus, and
then user specified number. If the user has the ability to assign senses to
sense groups, then the senses can be unordered in the model. The order is a
function of the print routine.
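The two rankings described above amount to sorting on a tuple of keys whose order can be swapped. The following is a minimal sketch under stated assumptions: the sense values are taken from the sample record in this message, while the rank tables, the `KEYS` mapping and the `sort_senses` function are hypothetical names invented for the illustration. Whether the two outputs match Allan's attached formats exactly cannot be checked here.

```python
# Assumed rank tables; neither order is prescribed by any standard.
POS_RANK = {"n": 0, "v": 1}
FOCUS_RANK = {None: 0, "AF": 1, "OF": 2, "IF": 3, "LF": 4}

# One key function per sortable field.
KEYS = {
    "sn": lambda s: s["sn"],
    "ps": lambda s: POS_RANK[s["ps"]],
    "vf": lambda s: FOCUS_RANK[s.get("vf")],
}

def sort_senses(senses, ranking):
    # ranking is a list of field names, most significant first.
    return sorted(senses, key=lambda s: tuple(KEYS[f](s) for f in ranking))

# Senses deliberately stored in no particular order.
senses = [
    {"sn": 3, "ps": "v", "vf": "LF"},
    {"sn": 2, "ps": "v", "vf": "AF"},
    {"sn": 1, "ps": "n"},
    {"sn": 4, "ps": "v", "vf": "IF"},
    {"sn": 2, "ps": "n"},
    {"sn": 3, "ps": "v", "vf": "OF"},
    {"sn": 4, "ps": "v", "vf": "AF"},
]

def label(s):
    return (s["sn"], s["ps"], s.get("vf"))

# Sense number first, then part of speech, then verb focus.
sense_primary = [label(s) for s in sort_senses(senses, ["sn", "ps", "vf"])]
# Part of speech first, then verb focus, then sense number.
form_primary = [label(s) for s in sort_senses(senses, ["ps", "vf", "sn"])]
```

The stored order of the senses plays no role in either result, which is the sense in which the order becomes a function of the print routine.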
In order to produce a desired view, all the user needs to do is tell the
program what fields to sort the senses by, and what the ranking of those
fields is. So the database must look like this (\pd=paradigm \sn=sense
number \vf=verb focus):
\lx hugal
\sn 1
\ps n
\de A playing card; a deck of playing cards
\sn 2
\ps n
\de The activity of playing cards, gambling with cards
\pd munhugal
\sn 2
\ps v
\vf AF
\de Two or more will play cards, gamble with playing cards
\pd humugal
\sn 3
\ps v
\vf OF
\de Someone will win an opponent's stake in gambling with playing cards
\pd hugálon
\sn 3
\ps v
\vf LF
\de An opponent's stake will be won by someone in gambling with playing cards
\pd munhugal
\sn 4
\ps v
\vf AF
\de Someone will lose money, an article of value, in gambling with playing cards
\pd ihugal
\sn 4
\ps v
\vf IF
\de Money, an article of value, will be lost by someone in gambling with playing cards
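A record laid out like the one above could be read into one dictionary per sense roughly as follows. This is a sketch, not a general parser for this kind of data: it assumes that every sense ends with its \de field (true of the sample, where \pd precedes the \sn it belongs to) and that a line without a leading backslash continues the previous field. The function names `read_fields` and `parse_entry` are made up for the example.

```python
def read_fields(text):
    # Turn the raw text into [marker, value] pairs, joining wrapped lines
    # onto the field they continue.
    fields = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):
            marker, _, value = line[1:].partition(" ")
            fields.append([marker, value.strip()])
        elif fields:
            fields[-1][1] += " " + line
    return fields

def parse_entry(text):
    # Assumption: a sense is everything up to and including its \de field.
    lexeme, senses, current = None, [], {}
    for marker, value in read_fields(text):
        if marker == "lx":
            lexeme = value
        else:
            current[marker] = value
            if marker == "de":
                senses.append(current)
                current = {}
    return lexeme, senses

SAMPLE = r"""
\lx hugal
\sn 1
\ps n
\de A playing card; a deck of playing cards
\pd munhugal
\sn 2
\ps v
\vf AF
\de Two or more will play cards, gamble with
playing cards
"""

lexeme, senses = parse_entry(SAMPLE)
print(lexeme, len(senses))
```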
If the database contains this data, then it doesn't matter what order the
senses are in (i.e. what the actual order is in the database). I believe
we can write a computer program that would output the data in either of
Allan's formats. The program can order the senses, output the fields in the
correct order, and renumber the "sense numbers" appropriately. We could tell
the program to sort the senses on the basis of the \pd \ps and/or \vf
fields, using any ranking of these fields. The program would then "number"
any non-unique combination of the fields that we specified on the basis of
the order in the \sn field.
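One possible reading of this sort-and-renumber step is sketched below. It groups senses by whichever fields are chosen, orders each group by \sn, and numbers only the groups that contain more than one sense. The rank tables, the `sort_and_number` function and the `number` key are assumptions for illustration; fields without a rank table (such as \pd) simply fall back to comparing their text.

```python
from itertools import groupby

# Assumed rank tables for the fields that are not sorted alphabetically.
RANKS = {
    "ps": {"n": 0, "v": 1},
    "vf": {None: 0, "AF": 1, "OF": 2, "IF": 3, "LF": 4},
}

def group_key(sense, fields):
    key = []
    for f in fields:
        value = sense.get(f)
        table = RANKS.get(f)
        # Ranked fields use their table; other fields compare as text.
        key.append(table[value] if table is not None else (value or ""))
    return tuple(key)

def sort_and_number(senses, fields):
    # Sort by the chosen fields, then by the user's \sn within each group.
    ordered = sorted(senses, key=lambda s: (group_key(s, fields), s["sn"]))
    out = []
    for _, grp in groupby(ordered, key=lambda s: group_key(s, fields)):
        members = list(grp)
        for i, s in enumerate(members, start=1):
            # Only a non-unique combination of fields gets numbered.
            number = i if len(members) > 1 else None
            out.append({**s, "number": number})
    return out

senses = [
    {"sn": 4, "ps": "v", "vf": "IF"},
    {"sn": 2, "ps": "n"},
    {"sn": 3, "ps": "v", "vf": "LF"},
    {"sn": 4, "ps": "v", "vf": "AF"},
    {"sn": 1, "ps": "n"},
    {"sn": 2, "ps": "v", "vf": "AF"},
    {"sn": 3, "ps": "v", "vf": "OF"},
]

numbered = [
    (s["ps"], s.get("vf"), s["sn"], s["number"])
    for s in sort_and_number(senses, ["ps", "vf"])
]
```

Here the two nouns and the two AF verbs are numbered 1 and 2 in \sn order, while the OF, IF and LF verbs, each unique for its combination of fields, are left unnumbered.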
This would all be highly speculative and theoretical, except that we have
the opportunity to request that the new FieldWorks program work this way.
Ron Moe
-----Original Message-----
From: lexicographylist at yahoogroups.com
[mailto:lexicographylist at yahoogroups.com]On Behalf Of Allan Johnson
Sent: Tuesday, May 24, 2005 9:46 AM
To: lexicographylist at yahoogroups.com
Subject: Re: [Lexicog] Digest Number 343
> However, I'm not certain I answered your original question, in which you
> also talked about having multiple wordforms in the hierarchy. I'm not
> sure if these wordforms are irregular forms, or what. Putting my
> uncertainty more concretely, why do you want to have wordforms inside a
> lex entry, if the citation form is itself a wordform? Is it because you
> want to include irregular forms, which will be relevant to one POS but
> not another? Maybe an example would help me understand...
I found an example that might be helpful. The attached file shows an entry
in a sense-primary PLB format, followed by the same information in a
form-primary MDF format. Both forms use wordforms inside a lexical entry.
The lexeme is a root (which often will also be a wordform as in this case,
but not always), and its associated wordforms are various inflections or
derivations of this root.
The sense-primary format allows wordforms to be grouped according to a
semantic feature that they share, which in the case of senses 3 & 4 appears
to be the idea of winning vs. losing. The form-primary format is showing
the wordforms in a grammatically relevant order. For some uses of the
dictionary it might be more helpful to simply put these wordforms in an
alphabetical order.
The first format is supported by the PLB dictionary standard, although I
don't know if our dictionaries have often been actually published with full
wordforms listed in this way. To save space, I think often just affixes
have been shown rather than full word forms, saving the full word forms
just for irregular cases. Personally, I like a dictionary that shows me
full word forms like this. It seems less abstract; more accessible. There
is that problem of taking up too many pages in a printed version of the
dictionary. But I think the problem of size should be a lot less relevant
when it comes to electronically searchable dictionaries intended for
internet access.
Allan J.
----- Original Message -----
From: "Mike Maxwell" <maxwell at ldc.upenn.edu>
To: <lexicographylist at yahoogroups.com>
Sent: Monday, May 23, 2005 9:49 PM
Subject: Re: [Lexicog] Digest Number 343
> Allan Johnson wrote:
> > Yes, I think that's what I'm saying - not like a thesaurus, but arranged
> > semantically just within each main entry - by meaning rather than by form.
> > ...
> > And a similar question - could a database be designed that would allow both
> > a Standard MDF view and a PLB view (or Alternate MDF view) of the same
> > data?
>
> A rather belated reply--I think Mike Sangay covered the database theory
> behind it.
>
> When last I looked, the FieldWorks lex db structure will support
> multiple ways of viewing lexemes. Specifically, lexemes are treated as
> having a bunch of POSs (actually, the structure is considerably richer
> than just POS) and a bunch of senses. Each sub-part of a lex entry
> would then be a combination of one of those POSs with one of those
> senses. That allows you to choose whichever way of grouping the
> sub-parts you prefer: either group first by POS, and secondarily by sense:
>
> foobar 1. (noun) (a) left-handed monkey wrench. (b) shore line.
> 2. (verb) (a) to go on a wild goose chase. (b) to send someone
> to look for a left-handed monkey wrench.
>
> or vice versa.
>
> However, I'm not certain I answered your original question, in which you
> also talked about having multiple wordforms in the hierarchy. I'm not
> sure if these wordforms are irregular forms, or what. Putting my
> uncertainty more concretely, why do you want to have wordforms inside a
> lex entry, if the citation form is itself a wordform? Is it because you
> want to include irregular forms, which will be relevant to one POS but
> not another? Maybe an example would help me understand...
>
> --
> Mike Maxwell
> Linguistic Data Consortium
> maxwell at ldc.upenn.edu
>