[Lexicog] agreed-upon minimum size for lexicographic corpora
maxwell maxwell@umiacs.umd.edu [lexicographylist]
lexicographylist at yahoogroups.com
Tue Jul 5 15:41:02 UTC 2016
On 2016-07-05 05:39, 'Sang Yong Lee' sang-yong_lee at sall.com [lexicographylist] wrote:
> ...About forty percent of those words, however,
> were inflected verb forms. (Newell 1995: 43)
My reply assumed lemmatization (or stemming), so that all the inflected
forms of a given verb lexeme would be conflated. That's why I referred
to "lemmas" instead of "words."
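
To make the lemma/word distinction concrete, here is a minimal sketch of how the two counts diverge. The lookup table TOY_LEMMAS and the helper names are my own illustrations, not anything from Newell; a real lemmatizer would get its mappings from a morphological parser rather than a hand-written table.

```python
from collections import Counter

# Hypothetical toy lookup table: inflected form -> lemma.
# A real lemmatizer would derive this from a morphological parser.
TOY_LEMMAS = {
    "walks": "walk", "walked": "walk", "walking": "walk",
    "sings": "sing", "sang": "sing", "sung": "sing",
}

def lemma_of(token: str) -> str:
    """Return the lemma for a token, or the token itself if it is unknown."""
    return TOY_LEMMAS.get(token, token)

def count_types(tokens):
    """Return (distinct word forms, distinct lemmas) for a list of tokens."""
    forms = Counter(tokens)
    lemmas = Counter(lemma_of(t) for t in tokens)
    return len(forms), len(lemmas)

tokens = "walk walks walked walking sing sang sung".split()
print(count_types(tokens))  # (7, 2): seven word forms, two lemmas
```

So a corpus size quoted in "words" can overstate the number of dictionary entries it supports, by a margin that depends on how heavily the language inflects.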
There is of course a chicken-and-egg problem with lemmatization: you
have to build a morphological parser before you can lemmatize, and you
need a dictionary before you can build a real morphological parser. In
practice, I suppose most people build both iteratively. A stemmer is
easier to build, but it is likely to be less precise, and it doesn't
"know" about irregular words.
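
A rough sketch of that last point, under loud assumptions: the suffix list and the exception table below are invented for illustration and are far cruder than any real stemmer or lexicon. The stemmer only strips endings, so it can conflate regular forms but has no way to connect "sang" to "sing"; the lemmatizer gets that right only because someone has already listed the irregular form, which is the dictionary you were trying to build.

```python
# Hypothetical suffix list for a crude English stemmer (illustration only).
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

# Hypothetical exception list, standing in for a dictionary of irregular forms.
IRREGULAR = {"went": "go", "sang": "sing", "mice": "mouse"}

def lemmatize(token: str) -> str:
    """Consult the exception list first, then fall back to the stemmer."""
    return IRREGULAR.get(token, naive_stem(token))

# Print each word with its stem and its lemma side by side.
for word in ["walked", "walking", "sang", "went"]:
    print(word, naive_stem(word), lemmatize(word))
```

The regular forms come out the same either way; the irregular ones pass through the stemmer unchanged and are only conflated once the exception list knows about them.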
Mike Maxwell