[Corpora-List] Do we still need language corpora?

Angus B. Grieve-Smith grvsmth at panix.com
Sat Feb 5 19:14:11 UTC 2011


On 2/5/2011 5:35 AM, Serge Sharoff wrote:
> Uni Oslo's noWaC is a case in point. Marco and his colleagues created a
> two-billion-word ukWaC and parsed it syntactically (using MaltParser); I added a
> BNC-like domain and genre annotation layer to it, so the web is at your
> fingertips.
     ... for existential, not distributional questions.  If I understand 
your description right, it's not a representative sample of anything, so 
any percentages you find are not generalizable beyond the sample, or 
perhaps beyond the Web.

-- 
				-Angus B. Grieve-Smith
				Saint John's University
				grvsmth at panix.com


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


