[Lexicog] Tshwanelex DTD
Mike Maxwell
maxwell at LDC.UPENN.EDU
Wed Sep 23 01:13:27 UTC 2009
pwyll4 at yahoo.fr wrote (in reply to my comments after '>> '):
>> Since dialect and speaker are two different things, I would suggest
>> creating separate fields for them, e.g.
>> source
>> |----dialect
>> |----speaker
>
> This is what I do, I use an abbreviation of the name of the place plus a
> number that says what speaker said the word/sentence. Eg. "Gd1" means
> "first speaker of my list of speakers from Guidel".
That isn't quite what I was suggesting--I was suggesting two separate
fields, one for the dialect, and one for the speaker. So instead of
<source>Gd1</source>
in the XML, you would have
<source>
<dialect>Gd</dialect>
<speaker>1</speaker>
</source>
This probably seems like a silly usage of tags, but what it allows you
to do is to select all the forms that contain
<dialect>Gd</dialect>
--regardless of how many speakers of that dialect you have. In order to
do that with the "atomic" codes "Gd1", "Gd2", etc., you would have to
select all the forms that contain
(<dialect>Gd1</dialect>|<dialect>Gd2</dialect>|<dialect>Gd3</dialect>)
--assuming you had three speakers of the Gd dialect. And if you added a
fourth speaker of that dialect, you'd need to change your search to add
that code. Or if you forget how many speakers of the Gd dialect you
have, you might miss some cases that belong to that dialect.
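As a rough illustration of the difference, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The element names <lexicon>, <entry> and <form>, the second dialect code "Pl", and the form values are all assumptions made up for the example; they do not come from any real TshwaneLex export.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: <lexicon>, <entry>, <form> and the "Pl" dialect
# code are assumed names, and the form values are placeholders.
STRUCTURED = """<lexicon>
  <entry><form>form-a</form><source><dialect>Gd</dialect><speaker>1</speaker></source></entry>
  <entry><form>form-b</form><source><dialect>Gd</dialect><speaker>2</speaker></source></entry>
  <entry><form>form-c</form><source><dialect>Gd</dialect><speaker>4</speaker></source></entry>
  <entry><form>form-d</form><source><dialect>Pl</dialect><speaker>1</speaker></source></entry>
</lexicon>"""

ATOMIC = """<lexicon>
  <entry><form>form-a</form><source>Gd1</source></entry>
  <entry><form>form-b</form><source>Gd2</source></entry>
  <entry><form>form-c</form><source>Gd4</source></entry>
  <entry><form>form-d</form><source>Pl1</source></entry>
</lexicon>"""


def forms_by_dialect(root, dialect):
    """Select forms whose <source> has a matching <dialect> child."""
    return [e.findtext("form") for e in root.iter("entry")
            if e.findtext("source/dialect") == dialect]


def forms_by_codes(root, codes):
    """Select forms whose atomic <source> code is in an explicit list."""
    return [e.findtext("form") for e in root.iter("entry")
            if e.findtext("source") in codes]


structured_hits = forms_by_dialect(ET.fromstring(STRUCTURED), "Gd")
# The code list has to be maintained by hand; here it omits "Gd4".
atomic_hits = forms_by_codes(ET.fromstring(ATOMIC), {"Gd1", "Gd2", "Gd3"})

print(structured_hits)
print(atomic_hits)
```

The structured query needs no knowledge of how many Gd speakers exist, while the atomic query silently drops the entry coded "Gd4" because that code was never added to the list.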
There are doubtless somewhat simpler ways of doing this in TshwaneLex,
ones that avoid repeating the <dialect>...</dialect> tags or that let you
choose the tag from a drop-down. But the point remains that with atomic
codes that encode both dialect and speaker, searching for a particular
dialect is more difficult.
Notice that it's still easy to find all the items that came from speaker
1 of Gd: you just search on two fields at once.
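A search on two fields at once could look something like the following Python sketch. It is only an illustration: the names <lexicon>, <entry> and <form>, the dialect code "Pl", and the helper forms_by_source are hypothetical, and the form values are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only <source>, <dialect> and <speaker> come from
# the discussion above, everything else is assumed for the example.
ENTRIES = """<lexicon>
  <entry><form>form-a</form><source><dialect>Gd</dialect><speaker>1</speaker></source></entry>
  <entry><form>form-b</form><source><dialect>Gd</dialect><speaker>2</speaker></source></entry>
  <entry><form>form-c</form><source><dialect>Pl</dialect><speaker>1</speaker></source></entry>
  <entry><form>form-d</form><source><dialect>Gd</dialect><speaker>1</speaker></source></entry>
</lexicon>"""


def forms_by_source(root, dialect, speaker):
    """Select forms where both the dialect and the speaker field match."""
    hits = []
    for entry in root.iter("entry"):
        same_dialect = entry.findtext("source/dialect") == dialect
        same_speaker = entry.findtext("source/speaker") == speaker
        if same_dialect and same_speaker:
            hits.append(entry.findtext("form"))
    return hits


gd_speaker_1 = forms_by_source(ET.fromstring(ENTRIES), "Gd", "1")
print(gd_speaker_1)
```

Note that speaker 1 of "Pl" is not matched, since the speaker number is only meaningful together with the dialect.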
There are also alternative ways to encode this kind of simple
information in XML, e.g.
<source dialect="Gd" speaker="1"/>
There are pros and cons to this representation (using attributes) and
the representation above (using elements). But both allow you to
separate out the distinct dialect and speaker information.
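For comparison, the attribute representation can be queried in much the same way. This is again a minimal Python sketch under assumed names: <lexicon>, <entry>, <form>, the dialect code "Pl", and the helper forms_where are hypothetical, and the form values are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data using attributes on <source> instead of child elements.
WITH_ATTRIBUTES = """<lexicon>
  <entry><form>form-a</form><source dialect="Gd" speaker="1"/></entry>
  <entry><form>form-b</form><source dialect="Gd" speaker="2"/></entry>
  <entry><form>form-c</form><source dialect="Pl" speaker="1"/></entry>
</lexicon>"""


def forms_where(root, **attrs):
    """Select forms whose <source> carries all the given attribute values."""
    hits = []
    for entry in root.iter("entry"):
        source = entry.find("source")
        if source is not None and all(source.get(k) == v for k, v in attrs.items()):
            hits.append(entry.findtext("form"))
    return hits


root = ET.fromstring(WITH_ATTRIBUTES)
gd_forms = forms_where(root, dialect="Gd")               # dialect only
gd1_forms = forms_where(root, dialect="Gd", speaker="1")  # dialect and speaker

print(gd_forms)
print(gd1_forms)
```

Either way, the dialect and the speaker can be tested independently, which is what the atomic codes do not allow.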
And that's the fundamental point: different kinds of information should
go in different "slots" in the dictionary entry, not be lumped together.
Mike Maxwell