[Lexicog] dictionary markup language

Mike Maxwell maxwell at LDC.UPENN.EDU
Tue Oct 10 02:09:46 UTC 2006


koocachoo_de wrote:
> the last month, I've been working on a dictionary markup language
> called dicML 2.0. I won't say that it's perfect, but it's on its way
> to become it! Just kidding.
> 
> Anyway. I would love if those who are interested would cast a quick
> glance on it. You'll find the specification and some examples at:
> http://gml.gidoo.de/en/index.html

I don't have time now to take a detailed look, but I'll make one 
general comment and one more specific one--and I emphasize that both are 
from just a quick glance, so they could be off-base.

General comment: You currently have what appears to be a two-way split 
in the entry element, between the lemma and the sense, and then assign 
some grammatical information to the lemma.  You might instead want to 
look at dividing up the entry in a three-way split:
    phonology and orthography
    grammatical information
    semantic information
Your current way of dividing up the information restricts you to 
dictionaries that have one category (POS and other grammatical 
information) per lemma; the alternative allows a single entry to have 
multiple categories, which is a way that some dictionaries are 
organized, where the language allows that.  (Inflecting languages 
generally don't, because the citation form of a noun is likely to be 
different from the citation form of a verb.)  That kind of dictionary 
usually has the senses organized under the categories, so there would 
need to be a way of linking senses to their categories, or of otherwise 
supporting this additional level of hierarchical structure.

Even if you don't decide to allow an <entry> to have multiple 
grammatical categories, you can still use the three-way partition of 
information as a way of managing grammatical information as a unit.
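
As a rough illustration of the three-way split (not dicML itself, and not 
any existing standard), here is a minimal Python sketch. All class and 
field names (Form, Category, Sense, Entry, senses_for) are hypothetical, 
chosen only for this example. Senses are nested under their grammatical 
category, which is one way of providing the extra level of hierarchy 
described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Form:
    """Phonology and orthography of the entry."""
    orthography: str
    pronunciation: str = ""


@dataclass
class Sense:
    """Semantic information: one sense of the entry."""
    gloss: str


@dataclass
class Category:
    """Grammatical information, managed as a unit, with its own senses."""
    pos: str
    features: Dict[str, str] = field(default_factory=dict)
    senses: List[Sense] = field(default_factory=list)


@dataclass
class Entry:
    """One entry: a single form, any number of grammatical categories."""
    form: Form
    categories: List[Category] = field(default_factory=list)

    def senses_for(self, pos: str) -> List[str]:
        """Return the glosses of all senses listed under the given POS."""
        return [s.gloss
                for c in self.categories if c.pos == pos
                for s in c.senses]


# English "walk" has the same citation form as a verb and as a noun,
# so both categories can live in one entry.
entry = Entry(
    form=Form("walk"),
    categories=[
        Category("verb", senses=[Sense("to move on foot")]),
        Category("noun", senses=[Sense("a journey on foot"),
                                 Sense("a path for walking")]),
    ],
)

print(entry.senses_for("noun"))
```

A dictionary with only one category per lemma fits the same shape: its 
entries simply have a single Category each.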

Specific comment: There seems to be a lot of language-specific structure 
here.  Maybe that's your intention--if this is just intended for a small 
set of Indo-European languages, with emphasis on the European.  But if 
you want it to be more general, one of the structures you might want to 
change is <sep> (obviously intended for German, but part of a more 
general problem with multi-word lexemes).  Likewise gen.gr and num: 
gender and number are only two of the many kinds of grammatical 
information you might want to keep track of in a dictionary: there are 
other things like conjugation and declension class, animacy, exceptional 
vowel harmony properties or other phonological properties that cannot be 
predicted from the lexical form, allomorph classes (like some of the 
stem vowel changes that happen in French and Spanish), etc. etc. 
Similarly, the lists of possible pos and num attributes are very 
Euro-centric (although maybe the lists are only intended to be 
illustrative, not exhaustive).
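
One possible way around a fixed inventory of attributes is to store 
grammatical information as open-ended feature/value pairs. The Python 
sketch below shows the idea; the element and attribute names (gram, feat, 
name, value) and the feature labels are assumptions made up for this 
example, not part of dicML or of any of the models mentioned here.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def gram_element(features: Dict[str, str]) -> ET.Element:
    """Build a <gram> element holding one <feat> child per feature.

    Nothing here is specific to gender or number: any feature name
    the lexicographer needs can be recorded the same way.
    """
    gram = ET.Element("gram")
    # Sort by feature name so the output order is predictable.
    for name, value in sorted(features.items()):
        ET.SubElement(gram, "feat", {"name": name, "value": value})
    return gram


# Spanish "pensar": conjugation class plus a stem vowel change.
pensar = gram_element({
    "pos": "verb",
    "conjugation": "ar",
    "stem_change": "e>ie",
})

# German "Haus": the same structure carries gender instead.
haus = gram_element({"pos": "noun", "gender": "neuter"})

print(ET.tostring(pensar, encoding="unicode"))
print(ET.tostring(haus, encoding="unicode"))
```

The trade-off is that a schema can no longer check feature names by 
itself; a separate, per-language list of permitted features and values 
would be needed for validation.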

Some other attempts at defining the organization of dictionaries (or 
lexicons), worth looking at, include MDF (a de facto standard), the SIL 
models used in LinguaLinks and in FLEx (née FieldWorks), the ISO draft, 
and OLIF.
-- 
	Mike Maxwell
	maxwell at ldc.upenn.edu


 


