The Internet Corpora
James A. Crippen
james at UnLambda.COM
Tue Apr 3 23:05:01 UTC 2001
Could the World Wide Web, and all the text in various languages available
on it, be considered a legitimate corpus? Granted, there are a large
number of spelling mistakes (try searching for 'trasnlation' or
'lingiust') and obvious grammatical errors (search for 'if he love me',
for example), but the text available online certainly exceeds the size of
any existing corpus for a major language by orders of magnitude.
In the future, will we start seeing explicit WWW search results in
papers? This could easily become a major point of argument...
'james
--
James A. Crippen <james at unlambda.com> ,-./-. Anchorage, Alaska,
Lambda Unlimited: Recursion 'R' Us | |/ | USA, 61.2069 N, 149.766 W,
Y = \f.(\x.f(xx)) (\x.f(xx)) | |\ | Earth, Sol System,
Y(F) = F(Y(F)) \_,-_/ Milky Way.
More information about the HPSG-L mailing list