[Lexicog] Digest Number 343
Allan Johnson
allan_johnson at SIL.ORG
Wed May 25 14:56:04 UTC 2005
Hi Ron,
You've put some good thought into this. It does sound like the basis for
some good FieldWorks specifications. To summarize what I'm reading here,
you're suggesting that the ordering of the senses of a dictionary entry be
made flexible by allowing the user to specify a sort order to be used
within the entry. (I'm taking the word "sense" in this context to refer to
*one* gloss/definition and whatever other info goes along with it.)
The sorting could be done on any or all of the following fields:
- a user-defined semantic order
- the part of speech
- a grammatical category such as verb focus
- the wordform (alphabetically)
- any others... ?
For the part of speech and grammatical category, I would want to be able to
specify the order (such as "n, adj, v") rather than just ordering
alphabetically. The default order could still be alphabetical though, for
parts of speech whose place in the sort order hasn't yet been specified.
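One way to get that ordering (a user-specified order for the labels that have been placed, alphabetical for the rest) is a two-part sort key. This is a minimal Python sketch. `pos_sort_key` and the `user_order` setting are hypothetical names, not FieldWorks features:

```python
def pos_sort_key(pos, user_order):
    """Sort key for a part-of-speech label.

    Labels listed in user_order come first, in the order the user gave;
    labels not yet placed fall back to alphabetical order after them.
    """
    if pos in user_order:
        return (0, user_order.index(pos), "")
    return (1, 0, pos)


user_order = ["n", "adj", "v"]  # hypothetical user setting, as in "n, adj, v"
labels = ["v", "adv", "n", "conj", "adj"]
ordered = sorted(labels, key=lambda p: pos_sort_key(p, user_order))
print(ordered)  # ['n', 'adj', 'v', 'adv', 'conj']
```

The same key would work for a grammatical category such as verb focus. Only the `user_order` list changes.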
For the semantic ordering, it would be nice to have the option of using
meaningful labels (such as "win, lose") in place of sense numbers, to make
it easier for the compiler to keep track of which sense is which. I believe
I've seen dictionary formats that actually print out such semantic labels
as subheadings to make an entry easier to read. So it would be good to have
the option of including these subheadings in the printed dictionary as
well. Just like part of speech and grammatical category, the order of
these semantic labels should be specifiable rather than just being
alphabetical.
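The following rough sketch shows how labelled senses could be printed under subheadings whose order is specified rather than alphabetical. It assumes each sense carries one semantic label. The glosses are abridged from the hugal example, and the upper-case subheading style is an arbitrary choice:

```python
from collections import defaultdict


def render_with_subheadings(senses, label_order):
    """Group (label, gloss) pairs under semantic-label subheadings.

    Labels in label_order are printed in that order; any other labels
    follow alphabetically. Glosses keep their relative order per label.
    """
    groups = defaultdict(list)
    for label, gloss in senses:
        groups[label].append(gloss)
    extra = sorted(label for label in groups if label not in label_order)
    lines = []
    for label in [*label_order, *extra]:
        if label in groups:
            lines.append(label.upper())  # assumed subheading style
            lines.extend("  " + gloss for gloss in groups[label])
    return lines


senses = [
    ("lose", "someone will lose money in gambling with playing cards"),
    ("win", "someone will win an opponent's stake"),
    ("lose", "money will be lost by someone in gambling"),
]
for line in render_with_subheadings(senses, ["win", "lose"]):
    print(line)
```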
Allan
----- Original Message -----
From: "Ron Moe" <ron_moe at sil.org>
To: <lexicographylist at yahoogroups.com>
Sent: Wednesday, May 25, 2005 10:39 AM
Subject: RE: [Lexicog] Digest Number 343
> If I understand the issue correctly, what is needed is the ability to sort
> senses according to the contents of one or more fields. We also need the
> ability to display the senses hierarchically. I'm sure a computer program
> could do this. Like most things in lexicography, this issue is related to
> (1) how a dictionary develops in practical terms, (2) how we model the
> structure of the data, and (3) how the program manipulates the data to
> produce the desired view.
>
> Concerning (1), the user gradually accumulates senses over time. He is most
> likely to encounter the primary sense first, but this is not always the
> case. As soon as he finds a second sense for an entry, he is confronted with
> the question of how to order them. The normal practice is to make a judgment
> as to which sense is more "basic." As his understanding of the language
> progresses, he may realize that many entries need to be structured along
> similar lines. For instance, Allan's Philippine language needs to order the
> senses of verbs by focus: (a) subject focus, (b) object focus, (c)
> instrument focus, (d) location focus. He therefore has to specify the verb
> focus in the database. Once he has done this, the program could
> automatically order senses based on this field.
>
> Concerning (2), I don't know of any program in which the senses of an entry
> are unordered. In each case the model assumes that senses are ordered. When
> the user adds a sense, the program adds a *second* sense, then a *third*
> sense, and so on. Therefore the program must allow the user to specify the
> placement of the new sense, and it must allow the user to reorder the senses
> at a later date. This ordering is a user-defined order based on a judgment
> of historical primacy or semantic derivation. It cannot be automated.
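Point (2) amounts to treating an entry's senses as an ordered list whose order only the compiler can decide. Here is a minimal sketch of the two edits Ron says the program must allow: placing a new sense, and reordering later. The function names are illustrative only:

```python
def insert_sense(senses, gloss, position):
    """Place a new sense so that it becomes sense number `position`
    (1-based); any later senses shift down by one."""
    senses.insert(position - 1, gloss)


def move_sense(senses, old_position, new_position):
    """Move an existing sense from one 1-based position to another."""
    gloss = senses.pop(old_position - 1)
    senses.insert(new_position - 1, gloss)


# Senses in the order the compiler happened to record them (glosses abridged).
senses = [
    "a playing card",
    "someone will win an opponent's stake",
    "the activity of playing cards",
]
move_sense(senses, 3, 2)  # compiler later judges this sense more basic
insert_sense(senses, "someone will lose money in gambling", 4)
for number, gloss in enumerate(senses, start=1):
    print(number, gloss)
```

Numbers are derived from list position when printed, so they never need to be stored or fixed up by hand.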
>
> Some programs, like MDF, assume two layers to the ordering: (a) a user
> specified (semantic) order, (b) an order based on part of speech. Either
> factor can be ranked first.
>
> Concerning (3), as long as the user has ordered all the senses, a program
> could change the relative ranking. The program could first group all the
> nouns and then number the senses in the order that the user specified; then
> group the verbs and so on. Alternatively it could first group the senses
> that have been assigned to the same number. But this only works if the user
> assigns the same number to both a noun and a verb if they both belong to the
> same 'sense'. So in Allan's example both "hugal n. The activity of playing
> cards, gambling with cards" and "munhugal v. Two or more will play cards,
> gamble with playing cards" have to be assigned to sense #2. Otherwise the
> system doesn't work.
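The two alternatives in point (3) can be illustrated with a few of the hugal senses (glosses abridged). This sketch assumes nouns display before verbs. The number-first view groups the noun and verb of sense 2 together only because both were given \sn 2:

```python
from itertools import groupby

# (user sense number, part of speech, gloss), abridged from the hugal entry
SENSES = [
    (1, "n", "a playing card"),
    (2, "n", "the activity of playing cards"),
    (2, "v", "two or more will play cards"),
    (3, "v", "someone will win an opponent's stake"),
]
POS_RANK = {"n": 0, "v": 1}  # assumed display order: nouns before verbs


def pos_first(senses):
    """Group by part of speech, numbering each group's senses 1, 2, ...
    in the order of the user's sense numbers."""
    ordered = sorted(senses, key=lambda s: (POS_RANK[s[1]], s[0]))
    lines = []
    for pos, group in groupby(ordered, key=lambda s: s[1]):
        items = [f"{i}. {gloss}" for i, (_, _, gloss) in enumerate(group, start=1)]
        lines.append(f"({pos}) " + " ".join(items))
    return lines


def number_first(senses):
    """Group senses that share a user sense number, so a noun and a verb
    given the same number appear together."""
    ordered = sorted(senses, key=lambda s: (s[0], POS_RANK[s[1]]))
    lines = []
    for number, group in groupby(ordered, key=lambda s: s[0]):
        items = [f"({pos}) {gloss}" for _, pos, gloss in group]
        lines.append(f"{number}. " + " ".join(items))
    return lines


print("\n".join(pos_first(SENSES)))
print("\n".join(number_first(SENSES)))
```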
>
> But Allan needs more than this. We've noted that he needs to be able to
> specify a relative order, and he needs to be able to order senses by part of
> speech. But he *also* needs to order senses by verb focus. The program must
> therefore have the capability of ordering senses based on more than one
> feature. To achieve the order in Allan's first example, the program must
> first order the senses by the user-specified number, then by part of speech,
> and then by verb focus. To achieve the order in his second example, the
> program must first order the senses by part of speech, then verb focus, and
> then user-specified number. If the user has the ability to assign senses to
> sense groups, then the senses can be unordered in the model. The order is a
> function of the print routine.
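The following sketch shows the multi-field ordering Ron describes, where the ranking of fields is just a list the user supplies. It assumes senses have already been read into dictionaries keyed by marker name. The display orders below are illustrative settings, not anything MDF or FieldWorks defines:

```python
# Hypothetical display orders. AF is assumed to correspond to the
# "subject focus" in Ron's list (subject, object, instrument, location).
DISPLAY_ORDER = {
    "ps": ["n", "v"],
    "vf": ["AF", "OF", "IF", "LF"],
}


def field_key(sense, field):
    """Rank one field of a sense: sn numerically, others by DISPLAY_ORDER."""
    value = sense.get(field)
    if field == "sn":
        return int(value)
    if value is None:  # e.g. noun senses carry no verb focus
        return -1
    order = DISPLAY_ORDER[field]
    return order.index(value) if value in order else len(order)


def order_senses(senses, ranking):
    """Sort senses by the fields in `ranking`, most significant first."""
    return sorted(senses, key=lambda s: tuple(field_key(s, f) for f in ranking))


# Abridged from the hugal entry; "form" is the \pd (or the headword).
SENSES = [
    {"form": "hugal", "sn": "1", "ps": "n"},
    {"form": "hugal", "sn": "2", "ps": "n"},
    {"form": "munhugal", "sn": "2", "ps": "v", "vf": "AF"},
    {"form": "humugal", "sn": "3", "ps": "v", "vf": "OF"},
    {"form": "hugálon", "sn": "3", "ps": "v", "vf": "LF"},
    {"form": "munhugal", "sn": "4", "ps": "v", "vf": "AF"},
    {"form": "ihugal", "sn": "4", "ps": "v", "vf": "IF"},
]

first = order_senses(SENSES, ["sn", "ps", "vf"])   # number, then POS, then focus
second = order_senses(SENSES, ["ps", "vf", "sn"])  # POS, then focus, then number
print([s["form"] for s in second])
```

Switching between the two layouts only changes the `ranking` list; the stored data stays the same.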
>
> In order to produce a desired view, all the user needs to do is tell the
> program what fields to sort the senses by, and what the ranking of those
> fields is. So the database must look like this (\pd=paradigm \sn=sense
> number \vf=verb focus):
>
> \lx hugal
>
> \sn 1
> \ps n
> \de A playing card; a deck of playing cards
>
> \sn 2
> \ps n
> \de The activity of playing cards, gambling with cards
>
> \pd munhugal
> \sn 2
> \ps v
> \vf AF
> \de Two or more will play cards, gamble with playing cards
>
> \pd humugal
> \sn 3
> \ps v
> \vf OF
> \de Someone will win an opponent's stake in gambling with playing cards
>
> \pd hugálon
> \sn 3
> \ps v
> \vf LF
> \de An opponent's stake will be won by someone in gambling with playing
> cards
>
> \pd munhugal
> \sn 4
> \ps v
> \vf AF
> \de Someone will lose money, an article of value, in gambling with
> playing cards
>
> \pd ihugal
> \sn 4
> \ps v
> \vf IF
> \de Money, an article of value, will be lost by someone in gambling with
> playing cards
>
> If the database contains this data, then it doesn't matter what order the
> senses are in (i.e. what the actual order is in the database). (I believe)
> we can write a computer program that would output the data in either of
> Allan's formats. The program can order the senses, output the fields in the
> correct order, and renumber the "sense numbers" appropriately. We could tell
> the program to sort the senses on the basis of the \pd, \ps, and/or \vf
> fields, using any ranking of these fields. The program would then "number"
> any non-unique combination of the fields that we specified on the basis of
> the order in the \sn field.
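The following rough Python sketch makes that paragraph concrete. It reads an abridged copy of the entry above, sorts on chosen fields with \sn as the tie-breaker, and renumbers. Two assumptions are mine, not Ron's. First, a new sense starts at \pd, or at \sn when the current sense already has a number. Second, "renumbering" means numbering the distinct original \sn values 1, 2, ... within each group of identical sort-field values. Field values are compared alphabetically here for brevity; a ranked order like the ones discussed in this thread could be substituted:

```python
import re

ENTRY = r"""\lx hugal
\sn 1
\ps n
\de A playing card; a deck of playing cards
\sn 2
\ps n
\de The activity of playing cards, gambling with cards
\pd munhugal
\sn 2
\ps v
\vf AF
\de Two or more will play cards, gamble with playing cards
\pd humugal
\sn 3
\ps v
\vf OF
\de Someone will win an opponent's stake in gambling with playing cards
\pd hugálon
\sn 3
\ps v
\vf LF
\de An opponent's stake will be won by someone in gambling with playing
cards
"""

MARKER = re.compile(r"\\(\w+)\s*(.*)")


def parse_senses(text):
    """Split one standard-format entry into a list of sense dicts."""
    senses, current, last = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = MARKER.match(line)
        if match is None:  # continuation line: extend the previous field
            if current is not None and last is not None:
                current[last] += " " + line
            continue
        marker, value = match.groups()
        if marker == "lx":  # headword, not part of any sense
            continue
        starts_sense = marker == "pd" or (
            marker == "sn" and current is not None and "sn" in current
        )
        if current is None or starts_sense:
            current = {}
            senses.append(current)
        current[marker] = value
        last = marker
    return senses


def renumber(senses, fields):
    """Sort on `fields`, then on the original sn; renumber sn within each group."""
    def group_of(sense):
        return tuple(sense.get(f, "") for f in fields)

    result, prev_group, prev_sn, n = [], None, None, 0
    for sense in sorted(senses, key=lambda s: (group_of(s), int(s["sn"]))):
        if group_of(sense) != prev_group:  # new group: restart numbering
            prev_group, prev_sn, n = group_of(sense), None, 0
        if sense["sn"] != prev_sn:  # senses sharing an sn keep one number
            n += 1
            prev_sn = sense["sn"]
        result.append({**sense, "sn": str(n)})
    return result


for sense in renumber(parse_senses(ENTRY), ["ps"]):
    print(sense["sn"], sense["ps"], sense.get("pd", "hugal"), "-", sense["de"])
```

With `["ps"]` as the sort field, the two "win" forms (humugal, hugálon) share one number within the verb group, because they share \sn 3 in the data.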
>
> This would all be highly speculative and theoretical, except that we have
> the opportunity to request that the new FieldWorks program work this way.
>
> Ron Moe
>
>
> -----Original Message-----
> From: lexicographylist at yahoogroups.com
> [mailto:lexicographylist at yahoogroups.com]On Behalf Of Allan Johnson
> Sent: Tuesday, May 24, 2005 9:46 AM
> To: lexicographylist at yahoogroups.com
> Subject: Re: [Lexicog] Digest Number 343
>
>
> > However, I'm not certain I answered your original question, in which you
> > also talked about having multiple wordforms in the hierarchy. I'm not
> > sure if these wordforms are irregular forms, or what. Putting my
> > uncertainty more concretely, why do you want to have wordforms inside a
> > lex entry, if the citation form is itself a wordform? Is it because you
> > want to include irregular forms, which will be relevant to one POS but
> > not another? Maybe an example would help me understand...
>
> I found an example that might be helpful. The attached file shows an entry
> in a sense-primary PLB format, followed by the same information in a
> form-primary MDF format. Both forms use wordforms inside a lexical entry.
> The lexeme is a root (which often will also be a wordform as in this case,
> but not always), and its associated wordforms are various inflections or
> derivations of this root.
>
> The sense-primary format allows wordforms to be grouped according to a
> semantic feature that they share, which in the case of senses 3 & 4 appears
> to be the idea of winning vs. losing. The form-primary format is showing
> the wordforms in a grammatically relevant order. For some uses of the
> dictionary it might be more helpful to simply put these wordforms in an
> alphabetical order.
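Alphabetizing the wordforms sounds trivial, but forms such as hugálon carry accents. This is a minimal sketch, assuming the accent should simply be ignored for filing. A real orthography may need its own collation rules:

```python
import unicodedata


def collation_key(form):
    """Alphabetical sort key that ignores accents, so hugálon files as if
    spelled hugalon. Plain sorted() compares code points, which would
    place a precomposed accented letter such as á after z."""
    decomposed = unicodedata.normalize("NFD", form)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.lower(), form)


forms = ["munhugal", "humugal", "hugálon", "ihugal"]
print(sorted(forms, key=collation_key))
# ['hugálon', 'humugal', 'ihugal', 'munhugal']
```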
>
> The first format is supported by the PLB dictionary standard, although I
> don't know if our dictionaries have often been actually published with full
> wordforms listed in this way. To save space, I think often just affixes
> have been shown rather than full word forms, saving the full word forms
> just for irregular cases. Personally, I like a dictionary that shows me
> full word forms like this. It seems less abstract, more accessible. There
> is that problem of taking up too many pages in a printed version of the
> dictionary. But I think the problem of size should be a lot less relevant
> when it comes to electronically searchable dictionaries intended for
> internet access.
>
> Allan J.
>
>
> ----- Original Message -----
> From: "Mike Maxwell" <maxwell at ldc.upenn.edu>
> To: <lexicographylist at yahoogroups.com>
> Sent: Monday, May 23, 2005 9:49 PM
> Subject: Re: [Lexicog] Digest Number 343
>
>
> > Allan Johnson wrote:
> > > Yes, I think that's what I'm saying - not like a thesaurus, but arranged
> > > semantically just within each main entry - by meaning rather than by form.
> > > ...
> > > And a similar question - could a database be designed that would allow both
> > > a Standard MDF view and a PLB view (or Alternate MDF view) of the same
> > > data?
> >
> > A rather belated reply--I think Mike Sangay covered the database theory
> > behind it.
> >
> > When last I looked, the FieldWorks lex db structure will support
> > multiple ways of viewing lexemes. Specifically, lexemes are treated as
> > having a bunch of POSs (actually, the structure is considerably richer
> > than just POS) and a bunch of senses. Each sub-part of a lex entry
> > would then be a combination of one of those POSs with one of those
> > senses. That allows you to choose whichever way of grouping the
> > sub-parts you prefer: either group first by POS, and secondarily by
> > sense:
> >
> > foobar 1. (noun) (a) left-handed monkey wrench. (b) shore line.
> > 2. (verb) (a) to go on a wild goose chase. (b) to send someone
> > to look for a left-handed monkey wrench.
> >
> > or vice versa.
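The following illustrative sketch shows the structure Mike describes: each sub-part is a pairing of one POS with one sense, and the display groups the pairs whichever way is requested. It is not the actual FieldWorks schema. The data is Mike's foobar example:

```python
from string import ascii_lowercase

# Each sub-part pairs one POS with one sense (illustrative structure only).
FOOBAR = [
    {"pos": "noun", "sense": "left-handed monkey wrench"},
    {"pos": "noun", "sense": "shore line"},
    {"pos": "verb", "sense": "to go on a wild goose chase"},
    {"pos": "verb", "sense": "to send someone to look for a left-handed monkey wrench"},
]


def render(subparts, outer, inner):
    """Number groups of sub-parts by `outer` (1, 2, ...) and letter the
    `inner` values within each group ((a), (b), ...). Groups appear in
    order of first occurrence."""
    keys = []
    for sp in subparts:
        if sp[outer] not in keys:
            keys.append(sp[outer])
    lines = []
    for number, key in enumerate(keys, start=1):
        items = [sp[inner] for sp in subparts if sp[outer] == key]
        lettered = " ".join(f"({ascii_lowercase[i]}) {item}." for i, item in enumerate(items))
        lines.append(f"{number}. ({key}) {lettered}")
    return lines


print("\n".join(render(FOOBAR, "pos", "sense")))
```

Calling `render(FOOBAR, "sense", "pos")` groups by sense first instead, the "vice versa" arrangement. In this particular entry each sense has only one POS, so each group holds a single item.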
> >
> > However, I'm not certain I answered your original question, in which you
> > also talked about having multiple wordforms in the hierarchy. I'm not
> > sure if these wordforms are irregular forms, or what. Putting my
> > uncertainty more concretely, why do you want to have wordforms inside a
> > lex entry, if the citation form is itself a wordform? Is it because you
> > want to include irregular forms, which will be relevant to one POS but
> > not another? Maybe an example would help me understand...
> >
> > --
> > Mike Maxwell
> > Linguistic Data Consortium
> > maxwell at ldc.upenn.edu
> >
> ----WARNING----WARNING----WARNING----
> Zip Files can contain harmful viruses
> Do NOT open the attached zip file
> unless you know the sender AND are expecting a zip file!
> Contact jarmail_admin at sil.org if
> you have any questions
> --------------------------------------
Yahoo! Groups Links
<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/lexicographylist/
<*> To unsubscribe from this group, send an email to:
lexicographylist-unsubscribe at yahoogroups.com
<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/
More information about the Lexicography
mailing list