[Corpora-List] Incidence of MWEs

Fri Mar 17 07:05:19 UTC 2006

Adam Kilgarriff wrote:

> US dictionaries are ***way, way*** behind UK dictionaries in corpus use.  UK
> dictionary publishers lead the world in corpus development and use (with NLP
> lagging behind).  OUP and Longman were prime movers in developing the BNC,
> and OUP is now on the point of launching its billion-word corpus of English.
> Collins-COBUILD was the great pioneer in the 1980s. 

Just a small point of history outside English: to my knowledge, the
earliest instance of large-scale corpus-based lexicography is the
Trésor de la Langue Française, launched around 1960. A computer corpus of
over 100 M words was created and used to compile the
monumental 16-volume TLF dictionary (100,000 headwords, 230,000
definitions, 430,000 examples).

Online at http://atilf.atilf.fr/

History: http://www.cnrs.fr/Cnrspresse/n96a7.html (fr)

The corpus (Frantext) now comprises 210 M words (127 M words POS-tagged)
and is available online to registered users:

http://www.atilf.fr/frantext.htm (fr)
-- 
   jv

   Web:  http://www.up.univ-mrs.fr/veronis
   Blog: http://aixtal.blogspot.com