[Lexicog] dictionary markup language
Mike Maxwell
maxwell at LDC.UPENN.EDU
Tue Oct 10 02:09:46 UTC 2006
koocachoo_de wrote:
> the last month, I've been working on a dictionary markup language
> called dicML 2.0. I won't say that it's perfect, but it's on its way
> to become it! Just kidding.
>
> Anyway. I would love if those who are interested would cast a quick
> glance on it. You'll find the specification and some examples at:
> http://gml.gidoo.de/en/index.html
I don't have the time now to take a detailed look, but I'll just make a
general comment, and a more specific one--and I emphasize both are from
just a quick glance, so could be off-base.
General comment: You currently have what appears to be a two-way split
in the entry element, between the lemma and the sense, and then assign
some grammatical information to the lemma. You might instead want to
look at dividing up the entry in a three-way split:
    phonology and orthography
    grammatical information
    semantic information
Your current way of dividing up the information restricts you to
dictionaries that have one category (POS and other grammatical
information) per lemma; the alternative allows a single entry to have
multiple categories, which is a way that some dictionaries are
organized, where the language allows that. (Inflecting languages
generally don't, because the citation form of a noun is likely to be
different from the citation form of a verb.) That kind of dictionary
usually has the senses organized under the categories, so there would
need to be a way of linking senses to their categories, or of otherwise
supporting this additional level of hierarchical structure.
Even if you decide not to allow an <entry> to have multiple
grammatical categories, you can still use the three-way partition of
information as a way of managing grammatical information as a unit.
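As a rough illustration of the three-way split (a sketch only: the class
and field names below are hypothetical and are not part of dicML or of
any of the models mentioned in this message), an entry could hold its
form, its grammatical categories, and its senses separately, with each
sense pointing at the category it belongs to:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Form:
    """Phonology and orthography of the entry (hypothetical fields)."""
    orthography: str
    pronunciation: str = ""


@dataclass
class Category:
    """Grammatical information, managed as a unit."""
    id: str
    pos: str
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class Sense:
    """Semantic information; category_id links the sense to a Category."""
    gloss: str
    category_id: str


@dataclass
class Entry:
    form: Form
    categories: List[Category] = field(default_factory=list)
    senses: List[Sense] = field(default_factory=list)

    def senses_by_category(self) -> Dict[str, List[str]]:
        """Group sense glosses under the POS of their category."""
        pos_of = {c.id: c.pos for c in self.categories}
        grouped: Dict[str, List[str]] = {c.pos: [] for c in self.categories}
        for s in self.senses:
            grouped[pos_of[s.category_id]].append(s.gloss)
        return grouped


# English "walk": one citation form, two grammatical categories.
walk = Entry(
    form=Form(orthography="walk"),
    categories=[Category("c1", "noun"), Category("c2", "verb")],
    senses=[
        Sense("an act of going on foot", "c1"),
        Sense("to go on foot", "c2"),
    ],
)
```

A dictionary with one category per lemma is then just the special case
where the categories list has a single member, so the same structure
covers both kinds of organization.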
Specific comment: There seems to be a lot of language-specific structure
here. Maybe that's your intention--if this is just intended for a small
set of Indo-European languages, with emphasis on the European. But if
you want it to be more general, one of the structures you might want to
change is <sep> (obviously intended for German, but part of a more
general problem with multi-word lexemes). Likewise gen.gr and num:
gender and number are only two of the many kinds of grammatical
information you might want to keep track of in a dictionary: there are
other things like conjugation and declension class, animacy, exceptional
vowel harmony properties or other phonological properties that cannot be
predicted from the lexical form, allomorph classes (like some of the
stem vowel changes that happen in French and Spanish), etc.
Similarly, the lists of possible pos and num attributes are very
Euro-centric (although maybe the lists are only intended to be
illustrative, not exhaustive).
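One possible way to avoid hard-wiring gender and number is to store
grammatical information as open-ended feature/value pairs and check them
against a per-language inventory. The sketch below assumes hypothetical
feature names, values, and inventories; none of them come from dicML:

```python
from typing import Dict, Set

# Hypothetical per-language inventories: feature name -> allowed values.
INVENTORIES: Dict[str, Dict[str, Set[str]]] = {
    "de": {"gender": {"m", "f", "n"}, "number": {"sg", "pl"}},
    "es": {
        "gender": {"m", "f"},
        "conjugation_class": {"ar", "er", "ir"},
        "stem_change": {"e>ie", "o>ue", "e>i"},
    },
}


def invalid_features(lang: str, features: Dict[str, str]) -> Dict[str, str]:
    """Return the feature/value pairs that the language's inventory does not allow."""
    inventory = INVENTORIES.get(lang, {})
    return {
        name: value
        for name, value in features.items()
        if value not in inventory.get(name, set())
    }


# Spanish "pensar": -ar conjugation with the e>ie stem change.
pensar = {"conjugation_class": "ar", "stem_change": "e>ie"}
```

With this layout, adding animacy or a vowel-harmony class for another
language means extending that language's inventory rather than changing
the markup itself.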
Some other attempts at defining the organization of dictionaries (or
lexicons), worth looking at, include MDF (a de facto standard), the SIL
models used in LinguaLinks and in FLEx (née Fieldworks), the ISO draft,
and OLIF.
--
Mike Maxwell
maxwell at ldc.upenn.edu