[Lexicog] agreed-upon minimum size for lexicographic corpora

maxwell maxwell@umiacs.umd.edu [lexicographylist] lexicographylist at yahoogroups.com
Tue Jul 5 11:41:02 EDT 2016


On 2016-07-05 05:39, 'Sang Yong Lee' sang-yong_lee at sall.com 
[lexicographylist] wrote:
> ...About forty percent of those words, however,
> were inflected verb forms. (Newell 1995: 43)

My reply assumed lemmatization (or stemming), so that all the inflected 
forms of a given verb lexeme would be conflated.  That's why I referred 
to "lemmas" instead of "words."
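A minimal Python sketch of the difference between counting distinct word forms and counting lemmas. The lemma table LEMMA_OF and the two helper names are hypothetical illustrations; in practice the form-to-lemma mapping would have to come from a morphological parser or a dictionary.

```python
# Hypothetical toy lemma table: maps each inflected form to its lemma.
# A real mapping would come from a morphological parser or dictionary.
LEMMA_OF = {
    "walk": "walk", "walks": "walk", "walked": "walk", "walking": "walk",
    "go": "go", "goes": "go", "went": "go", "gone": "go",
    "dog": "dog", "dogs": "dog",
}

def count_types(tokens):
    """Number of distinct word forms, with no conflation."""
    return len(set(tokens))

def count_lemmas(tokens, lemma_of=LEMMA_OF):
    """Number of distinct lemmas; forms missing from the table count as their own lemma."""
    return len({lemma_of.get(t, t) for t in tokens})

tokens = ["walks", "walked", "went", "goes", "dog", "dogs", "walking"]
print(count_types(tokens))   # 7 distinct word forms
print(count_lemmas(tokens))  # 3 lemmas: walk, go, dog
```

Seven "words" collapse to three lemmas here, which is why a corpus-size threshold stated in lemmas and one stated in word forms are not comparable.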

There is of course a chicken-and-egg problem with lemmatization: you 
have to build a morphological parser before you can lemmatize, and you 
need a dictionary before you can build a real morphological parser.  In 
practice, I suppose most people build both iteratively.  A stemmer is 
easier to build, but it is likely to be less precise, and it doesn't 
"know" about irregular words.
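A toy contrast between a stemmer and a dictionary-backed lemmatizer, offered only as a sketch: the suffix list, the exception table IRREGULAR, and both function names are illustrative assumptions, far cruder than a real stemmer such as Porter's or a real morphological parser.

```python
# Hypothetical suffix list, tried in order; real stemmers (e.g. Porter)
# use ordered rule sets with conditions on the remaining stem.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical irregular-form table standing in for a dictionary.
IRREGULAR = {"went": "go", "gone": "go", "goes": "go", "ran": "run", "mice": "mouse"}

def lemmatize(word):
    """Look the word up in the dictionary first; fall back to the stemmer."""
    return IRREGULAR.get(word, naive_stem(word))

for w in ["walked", "walking", "goes", "went", "mice"]:
    print(w, naive_stem(w), lemmatize(w))
```

The stemmer handles regular forms ("walked" becomes "walk") but mangles "goes" into "goe" and leaves "went" and "mice" untouched, because it has no knowledge of irregular words; the lookup table fixes those cases, but only for forms someone has already entered.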

    Mike Maxwell

More information about the Lexicography mailing list