Corpora: Reference

Mitch Marcus mitch at linc.cis.upenn.edu
Mon Feb 12 17:09:19 UTC 2001


Mari,

Someone is confusing me with Mark Liberman, which isn't all that
unusual.  Mark, at an invited talk at some ACL annual meeting or other,
presented a list of new words in the AP newswire that occurred after
100 million words of text.  None of the words were that unusual.  I
don't think he published this anywhere.

 Mitch

:
:Can anyone provide a reference for a purported study, in which someone
:analyzed the Wall Street Journal for new words, the number of which tailed
:off to 20 words per (month? week?) after a certain point? Or is this an NLP
:urban legend? A colleague recalls Mitch Marcus pointing out that the rate of
:new word occurrences does not asymptote but rather continues at some small
:but non-trivial rate, but not whether this is Marcus' own study, an
:observation, or a reference to another work.
:
:Thanks,
:
:Mari Olsen
:Microsoft-Natural Language Group
:


