[Lexicog] Digital Glossarization

Mike Maxwell maxwell at LDC.UPENN.EDU
Fri May 9 14:28:39 UTC 2008


Jimm GoodTracks wrote:
> I believe that there is value in a "Back of the Book" glossary...
> 
> ...For Native American Languages, the verb complex is the most 
> important and complex element of a sentence... Certainly a reader can
> be written in the most simplest format, which involves no prefixes, 
> suffixes, infixes, conjugations, nor additional grammatical elements 
> to be added on to the verb in a Native American sentence.  In this 
> case, the reader would only be able to speak in the 3rd person 
> singular, namely:  He/ she/ it...
> 
> When other voices are introduced, namely --  I, you, we, they and 
> dual or plural elements  -- the verb complex begins to build via 
> prefixes and suffixes which have no meaning when detached apart from 
> the verb.

This same problem arises in many morphologically complex languages,
of which Arabic and Nahuatl (a language of Mexico) are examples.  The 
difficulty is compounded (no pun intended) because most of the words a 
reader meets cannot be looked up in the dictionary: they are literally 
not in the dictionary--at least not in that form.
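
To make the lookup problem concrete, here is a minimal Python sketch of 
one way a tool might cope: strip known affixes and look up what remains. 
It is only an illustration under stated assumptions.  The stems, affixes 
and function names are all invented for the example, and a real 
morphological parser has to deal with far more (sound changes, infixes, 
ambiguous parses) than this does.

```python
# Toy data: every form below is invented for illustration and is not
# taken from any real language or dictionary.
LEXICON = {"kala": "to see", "mitu": "to hit"}
PREFIXES = {"na": "1st person subject", "su": "2nd person subject"}
SUFFIXES = {"ti": "plural"}


def candidate_parses(word):
    """Return (prefix, stem, suffix) splits whose stem is a headword."""
    parses = []
    for prefix in [""] + list(PREFIXES):
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + list(SUFFIXES):
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)] if suffix else rest
            if stem in LEXICON:
                parses.append((prefix, stem, suffix))
    return parses


def assisted_lookup(word):
    """Describe each parse: the headword entry plus each affix's meaning."""
    lines = []
    for prefix, stem, suffix in candidate_parses(word):
        notes = [f"headword {stem!r}: {LEXICON[stem]}"]
        if prefix:
            notes.append(f"prefix {prefix}-: {PREFIXES[prefix]}")
        if suffix:
            notes.append(f"suffix -{suffix}: {SUFFIXES[suffix]}")
        lines.append("; ".join(notes))
    return lines


for line in assisted_lookup("nakalati"):
    print(line)
```

The inflected form "nakalati" is not a headword, but the sketch still 
leads the reader to the entry for "kala" and says what each affix adds.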

I recently posted to this group (probably before the original poster 
joined this list) about a project to do assisted dictionary lookup 
using a morphological parser.  I won't repeat what I said, but if 
you look in the archives, my posting was 2 May at 2:55 PM, in the thread 
"Collaborative lexicography."

> In addition, direct and indirect discourse, prepositional elements, 
> probability and more can all factor in.  And to this end, the literal
> translation provided in a more layman's terms rather than a 
> professional linguistic rendition seems to be the most helpful to the
> language student.

This is also an issue, and not one we can claim to have solved in the 
above-mentioned project.  One can imagine that most readers would not be 
helped by a gloss like 'IncompletiveAspect-2Ergative-hit-1Absolutive'. 
If you are producing a semi-literal translation by hand, then of course 
you can give a more helpful (if perhaps less accurate) translation.  But 
it's hard to know how to do this automatically, in a way that is both 
general (within a specific language) and accurate.  (I don't know for 
sure, but I suspect this would be the downfall of a statistical MT 
program, even for languages where there are sufficient parallel corpora 
to make that possible.)  Our proposed solution is to link the morpheme 
glosses to a grammar, but that can be cumbersome.
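
As a rough picture of what linking glosses to a grammar could look like, 
the Python sketch below maps each tag in the gloss above to a short 
plain-language note and a grammar section.  The notes, the section 
labels and the function name are hypothetical placeholders, not the 
project's actual design.

```python
# Hypothetical table: gloss tag -> (plain-language note, grammar section).
# The section labels are placeholders, not references to a real grammar.
GRAMMAR_LINKS = {
    "IncompletiveAspect": ("the action is not yet completed", "section A"),
    "2Ergative": ("'you' is the one doing the action", "section B"),
    "1Absolutive": ("'me' is the one affected by the action", "section C"),
}


def explain_gloss(gloss):
    """Split a hyphenated gloss and attach a note to each piece.

    Tags found in GRAMMAR_LINKS get their note and section; anything
    else is assumed to be the stem's dictionary meaning.
    """
    rows = []
    for tag in gloss.split("-"):
        if tag in GRAMMAR_LINKS:
            note, section = GRAMMAR_LINKS[tag]
            rows.append((tag, note, section))
        else:
            rows.append((tag, "stem: see the dictionary entry", None))
    return rows


for tag, note, section in explain_gloss(
    "IncompletiveAspect-2Ergative-hit-1Absolutive"
):
    where = f" (see {section})" if section else ""
    print(f"{tag}: {note}{where}")
```

Even this simple version shows why the approach can be cumbersome: the 
reader gets four separate notes to piece together rather than a single 
readable translation.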
-- 
	Mike Maxwell
	maxwell at ldc.upenn.edu
