[Lexicog] Tshwanelex DTD

Mike Maxwell maxwell at LDC.UPENN.EDU
Wed Sep 23 01:13:27 UTC 2009


pwyll4 at yahoo.fr wrote (in reply to my comments after '>> '):
>> Since dialect and speaker are two different things, I would suggest
>> creating separate fields for them, e.g.
>> source
>> |----dialect
>> |----speaker
> 
> This is what I do, I use an abbreviation of the name of the place plus a 
> number that says what speaker said the word/sentence. Eg. "Gd1" means 
> "first speaker of my list of speakers from Guidel". 

That isn't quite what I was suggesting--I was suggesting two separate 
fields, one for the dialect, and one for the speaker.  So instead of
    <source>Gd1</source>
in the XML, you would have
    <source>
       <dialect>Gd</dialect>
       <speaker>1</speaker>
    </source>
This probably seems like a silly usage of tags, but what it allows you 
to do is to select all the forms that contain
    <dialect>Gd</dialect>
--regardless of how many speakers of that dialect you have.  In order to 
do that with the "atomic" codes "Gd1", "Gd2", etc., you would have to 
select all the forms that contain
    (<dialect>Gd1</dialect>|<dialect>Gd2</dialect>|<dialect>Gd3</dialect>)
--assuming you had three speakers of the Gd dialect.  And if you added a 
fourth speaker of that dialect, you'd need to change your search to add 
that code.  Or if you forget how many speakers of the Gd dialect you 
have, you might miss some cases that belong to that dialect.
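
A minimal sketch of the difference, in Python with the standard xml.etree module. The <lexicon>, <entry> and <form> elements, the placeholder forms, and the second dialect code "Pl" are assumptions made up for illustration; they are not taken from TshwaneLex's actual export format.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: dialect and speaker stored in separate fields.
SPLIT_XML = """
<lexicon>
  <entry><form>word1</form>
    <source><dialect>Gd</dialect><speaker>1</speaker></source></entry>
  <entry><form>word2</form>
    <source><dialect>Gd</dialect><speaker>4</speaker></source></entry>
  <entry><form>word3</form>
    <source><dialect>Pl</dialect><speaker>1</speaker></source></entry>
</lexicon>
"""

# The same hypothetical data with atomic codes such as "Gd1".
ATOMIC_XML = """
<lexicon>
  <entry><form>word1</form><source>Gd1</source></entry>
  <entry><form>word2</form><source>Gd4</source></entry>
  <entry><form>word3</form><source>Pl1</source></entry>
</lexicon>
"""


def forms_for_dialect(xml_text, dialect):
    """Select forms by dialect alone, however many speakers there are."""
    root = ET.fromstring(xml_text)
    return [entry.findtext("form")
            for entry in root.iter("entry")
            if entry.findtext("source/dialect") == dialect]


def forms_for_atomic_codes(xml_text, codes):
    """Select forms whose atomic source code is in an explicit list."""
    root = ET.fromstring(xml_text)
    return [entry.findtext("form")
            for entry in root.iter("entry")
            if entry.findtext("source") in codes]


# One query covers every Gd speaker, including a newly added speaker 4.
gd_split = forms_for_dialect(SPLIT_XML, "Gd")

# The hand-maintained list only knows three speakers, so speaker 4 is missed.
gd_atomic = forms_for_atomic_codes(ATOMIC_XML, {"Gd1", "Gd2", "Gd3"})
```

With the separate fields, gd_split picks up both Gd entries; with the atomic codes, gd_atomic silently drops the entry from the fourth speaker until the list of codes is updated.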

There are doubtless somewhat simpler ways of doing this in TshwaneLex 
that avoid repeating the <dialect>...</dialect> tags, or choosing the 
tag from a drop-down.  But the point remains that with atomic codes that 
encode both dialect and speaker, searching for a particular dialect is 
more difficult.

Notice that it's still easy to find all the items that came from speaker 
1 of Gd: you just search on two fields at once.
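
Searching on two fields at once could look something like the following Python sketch. As before, the <lexicon>, <entry> and <form> elements and the sample forms are hypothetical, invented only to make the example runnable.

```python
import xml.etree.ElementTree as ET

# Hypothetical data with separate <dialect> and <speaker> fields.
LEXICON_XML = """
<lexicon>
  <entry><form>word1</form>
    <source><dialect>Gd</dialect><speaker>1</speaker></source></entry>
  <entry><form>word2</form>
    <source><dialect>Gd</dialect><speaker>2</speaker></source></entry>
  <entry><form>word3</form>
    <source><dialect>Gd</dialect><speaker>1</speaker></source></entry>
</lexicon>
"""


def forms_for_speaker(xml_text, dialect, speaker):
    """Select forms where both the dialect and the speaker field match."""
    root = ET.fromstring(xml_text)
    return [entry.findtext("form")
            for entry in root.iter("entry")
            if entry.findtext("source/dialect") == dialect
            and entry.findtext("source/speaker") == speaker]


# Everything that came from speaker 1 of the Gd dialect.
gd_speaker_1 = forms_for_speaker(LEXICON_XML, "Gd", "1")
```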

There are also alternative ways to encode this kind of simple 
information in XML, e.g.
    <source dialect="Gd" speaker="1"/>
There are pros and cons to this representation (using attributes) and 
the representation above (using elements).  But both allow you to 
separate out the distinct dialect and speaker information.
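
For the attribute-based representation, the same kind of selection might be sketched in Python as follows. The wrapper elements <lexicon>, <entry> and <form>, the sample forms, and the dialect code "Pl" are again assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical data using attributes on <source> instead of child elements.
ATTR_XML = """
<lexicon>
  <entry><form>word1</form><source dialect="Gd" speaker="1"/></entry>
  <entry><form>word2</form><source dialect="Gd" speaker="2"/></entry>
  <entry><form>word3</form><source dialect="Pl" speaker="1"/></entry>
</lexicon>
"""


def forms_by_source(xml_text, dialect, speaker=None):
    """Select forms by dialect, optionally narrowing to one speaker."""
    root = ET.fromstring(xml_text)
    selected = []
    for entry in root.iter("entry"):
        source = entry.find("source")
        if source is None or source.get("dialect") != dialect:
            continue
        if speaker is not None and source.get("speaker") != speaker:
            continue
        selected.append(entry.findtext("form"))
    return selected


all_gd = forms_by_source(ATTR_XML, "Gd")            # dialect only
gd_first = forms_by_source(ATTR_XML, "Gd", "1")     # dialect and speaker
```

Either way, the dialect and the speaker remain separately addressable, so the query does not need to know how many speakers a dialect has.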

And that's the fundamental point: different kinds of information should 
go in different "slots" in the dictionary entry, not be lumped together.

    Mike Maxwell

