[Lexicog] News and Offers from TshwaneDJe

Jan F. Ullrich jfu at LAKHOTA.ORG
Sat Mar 21 18:28:45 UTC 2009


 

 

> I'm curious: how many distinct fields do you have? 

 

I counted forty-seven fields, but not all of them are exported for the printed version. 

I should also mention that I use some custom-defined fields and that I use several of the MDF fields for different purposes and in different hierarchical positions from those they were designed for. My programmer modified the export CCT according to my needs. This CCT also checks the consistency of the hierarchical structure, and we have another CCT that can fix the position of a field if it is placed improperly. But we haven’t used it extensively, as the entry structure was established quite early on and I think I have been able to work with it quite consistently. It is possible to define a record template in Toolbox, a feature I have been using from the beginning. You can also define which field follows a particular field, so that when you press Enter it automatically creates the field that is supposed to follow. These functions are very helpful in keeping the structure consistent.
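To illustrate the kind of hierarchy check described above, here is a minimal sketch in Python rather than in CC. It is not the CCT itself: the PARENT map uses standard MDF markers as an assumed hierarchy, and the function names parse_record and check_hierarchy are hypothetical. The real database uses custom and repurposed fields that are not listed in this message.

```python
import re

# Assumed parent map: which marker each field must sit under.
# These are standard MDF markers; the actual hierarchy of the
# Lakota database is not given in this message.
PARENT = {
    "lx": None,   # lexeme: root of the record
    "ps": "lx",   # part of speech
    "sn": "ps",   # sense number
    "ge": "sn",   # English gloss
    "xv": "sn",   # example sentence (vernacular)
    "xe": "xv",   # English translation of the example
}

# A field line starts with a backslash, a marker, then the value.
FIELD_RE = re.compile(r"^\\(\w+)\s*(.*)$")


def parse_record(text):
    """Return a list of (marker, value) pairs from one Toolbox record."""
    fields = []
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields.append((match.group(1), match.group(2).strip()))
    return fields


def check_hierarchy(fields):
    """Report fields whose required parent is not currently open."""
    problems = []
    open_path = []  # markers from the record root down to the current field
    for index, (marker, _value) in enumerate(fields):
        if marker not in PARENT:
            problems.append((index, marker, "unknown marker"))
            continue
        parent = PARENT[marker]
        if parent is None:
            open_path = [marker]  # a new record root resets the path
            continue
        if parent not in open_path:
            problems.append((index, marker, "missing parent \\" + parent))
            continue
        # Close everything below the parent, then open this field.
        open_path = open_path[: open_path.index(parent) + 1] + [marker]
    return problems
```

Calling check_hierarchy(parse_record(text)) on a record returns an empty list when every field sits under its expected parent, and otherwise a list of (position, marker, message) tuples that could be reviewed by hand.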

 

> Can any repeat, are any of them hierarchical (like example sentences need to 

> come under a sense), and do any of them come in pairs or triplets (like example 
> sentence + translation)? 

 

Yes, exactly; the pair \xv and \xe (an example sentence and its English translation) comes after the sense definition. The field \ue (usage) precedes the example sentence when it is present. The example sentence bundle also includes a reference to its source, but this is not exported for the printed version.
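The pairing of \xv with \xe, with an optional \ue in front, could be checked along the following lines. This is only a sketch in Python under stated assumptions: the input is a list of (marker, value) pairs already read from one record, the function name pair_examples is hypothetical, and \ue is assumed to sit directly before its \xv. The source-reference field is not handled because its marker is not named here.

```python
def pair_examples(fields):
    """Group (marker, value) pairs into usage/example/translation bundles.

    Returns (bundles, problems). Each bundle is a dict with the keys
    'ue', 'xv' and 'xe'; 'ue' is None when no usage note precedes
    the example sentence.
    """
    bundles, problems = [], []
    pending_usage = None  # a usage note waiting for its example
    current = None        # a bundle still waiting for its translation
    for index, (marker, value) in enumerate(fields):
        if marker == "ue":
            pending_usage = value
        elif marker == "xv":
            if current is not None:
                problems.append((index, "previous \\xv has no \\xe"))
            current = {"ue": pending_usage, "xv": value, "xe": None}
            pending_usage = None
            bundles.append(current)
        elif marker == "xe":
            if current is None:
                problems.append((index, "\\xe without a preceding \\xv"))
            else:
                current["xe"] = value
                current = None
        else:
            # Assumption: any other marker ends a pending usage note.
            pending_usage = None
    if current is not None:
        problems.append((len(fields), "last \\xv has no \\xe"))
    return bundles, problems
```

The returned bundles keep the usage note together with its example and translation, so an export step could decide per bundle what goes into the printed version.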

I also use the field bundle \lf \lv \le quite extensively. These fields are placed under the paradigm and contain references to various grammatical forms, such as reduplicated and contracted forms, datives, benefactives, possessives, reflexives, causatives, etc. These forms are also given as separate records, but they are always referenced from the base form. Another pair of fields is at the very end of an entry and contains references to the Dakota dialect variants whenever these differ from the lemma at the phonemic or lexical level (this is the case in about 30% of lemmas).

 

 

> How many records do you have? 

 

As of now, the Lakota-English dictionary database contains 43,152 records and 112,263 example sentences or collocations. But the printed version of the dictionary includes only 20,000 records and 43,000 sentences. The other entries need more research or editing; we will be adding them to the printed version with each new edition, and especially to the electronic version of the dictionary.

 

The English-Lakota side of the dictionary is a separate database; it has about half the number of records and a very different record structure. It is not a full-fledged English-Lakota dictionary (for instance, it does not contain the example sentences, because including them twice would make the book huge; it is already 1200 pages), but it is much more detailed than just a finder list.

 

Sample pages from both the Lakota-English and English-Lakota sections can be seen here:

http://www.inext.cz/siouan/sample_nld.pdf

 

 

We are now in the process of programming a lemmatizing script that will help us create a list of all possible inflected forms so that they can be included in the multimedia version of the dictionary. We only have estimates now, but it will probably be around 2 million lexemes. Each will be linked to its lemma in the electronic dictionary.
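The linking of generated forms to their lemmas could be stored as a simple lookup table. The sketch below is in Python and shows only that lookup, not the lemmatizing script, which is still being written. The function name build_form_index is hypothetical and the sample strings are placeholders, not Lakota data.

```python
from collections import defaultdict


def build_form_index(paradigms):
    """Map every form (and the lemma itself) to the lemmas it belongs to.

    `paradigms` maps a lemma to the list of its generated forms.
    A form shared by several lemmas ends up with several entries.
    """
    index = defaultdict(set)
    for lemma, forms in paradigms.items():
        index[lemma].add(lemma)  # the lemma should find itself
        for form in forms:
            index[form].add(lemma)
    # Sort the lemma lists so that lookups give a stable order.
    return {form: sorted(lemmas) for form, lemmas in index.items()}


# Placeholder data only; real forms would come from the lemmatizing script.
sample = {
    "lemma1": ["form1a", "shared"],
    "lemma2": ["shared"],
}
index = build_form_index(sample)
```

Keeping a list of lemmas per form, rather than a single lemma, is a design choice that allows for forms that happen to coincide across different entries.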

 

> And how have you verified that your field hierarchy is consistent?

 

The CCT that I mentioned above is one way of doing this. The printed version of the dictionary was of course proofread by several people. The only inconsistency was the occasional omission of a sense number or a wrong sense number value. I know that TshwaneLex automates sense and homonym numbers, and I believe that to be a great advantage.

Other than that we have not found any inconsistencies in the record structure and hierarchy.
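A missing or wrong sense number is the kind of error a short script can catch. Here is a minimal sketch in Python, assuming the sense number is held in the standard MDF field \sn, that senses within a record should be numbered 1, 2, 3 and so on, and that the record is already available as a list of (marker, value) pairs. The function name check_sense_numbers is hypothetical.

```python
def check_sense_numbers(fields):
    """Report \\sn fields that are empty, non-numeric or out of sequence.

    `fields` is a list of (marker, value) pairs from a single record.
    Returns a list of (position, message) tuples.
    """
    problems = []
    expected = 1
    for index, (marker, value) in enumerate(fields):
        if marker != "sn":
            continue
        text = value.strip()
        if not text.isdigit():
            problems.append((index, "expected %d, found %r" % (expected, text)))
            expected += 1
        else:
            if int(text) != expected:
                problems.append((index, "expected %d, found %r" % (expected, text)))
            # Resynchronize on the number found, so that one skipped
            # number is reported once and not for every later sense.
            expected = int(text) + 1
    return problems
```

A record with a single unnumbered sense has no \sn field at all and passes the check without any report.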

 

I still think Toolbox is excellent software for a lexicographer, and it has allowed me to do pretty much everything I wanted to with the dictionary. But it is also important to say that I would not have been able to do some of those things without the excellent programmer that I have on my team. So Toolbox offers the flexibility to do things, but with that flexibility comes additional work. TshwaneLex probably automates a lot of what we do with various CCTs and scripts, but then maybe it isn’t flexible enough to allow us to do other things. My main worry, though, is that the data entry process is much more time-consuming in TshwaneLex than it is in Toolbox.

 

Jan

 

 
