[Corpora-List] Incidence of MWEs
Jean Veronis
Jean.Veronis at up.univ-mrs.fr
Fri Mar 17 07:05:19 UTC 2006
Adam Kilgarriff wrote:
> US dictionaries are ***way, way*** behind UK dictionaries in corpus use. UK
> dictionary publishers lead the world in corpus development and use (with NLP
> lagging behind). OUP and Longman were prime movers in developing the BNC,
> and OUP is now on the point of launching its billion-word corpus of English.
> Collins-COBUILD was the great pioneer in the 1980s.
Just a small point of history outside English: to my knowledge, the
earliest instance of large-scale corpus-based lexicography is the
Trésor de la Langue Française, launched around 1960. A computer corpus of
over 100 M words was built and used to compile the
monumental 16-volume TLF dictionary (100,000 headwords, 230,000
definitions, 430,000 examples).
Online at http://atilf.atilf.fr/
History: http://www.cnrs.fr/Cnrspresse/n96a7.html (fr)
The corpus (Frantext) now comprises 210 M words (127 M words POS-tagged)
and is available online to registered users:
http://www.atilf.fr/frantext.htm (fr)
--
jv
Web: http://www.up.univ-mrs.fr/veronis
Blog: http://aixtal.blogspot.com