[Corpora-List] Google searches as linguistic evidence

Diana Maynard d.maynard at dcs.shef.ac.uk
Thu Dec 7 13:33:56 UTC 2006


Ramesh Krishnamurthy wrote:
> I suspect many are typos. People are far less fussy about 
> proof-reading website information..
Is this a fact, or just a gut reaction?
Obviously it differs depending on what type of material we're talking about 
- blogs are much more likely to contain typos and spelling mistakes etc. 
But if we're talking factual websites, are people less fussy about 
proofreading? I'm not sure that my gut feeling is the same, but no doubt 
there is evidence.
If we use the whole web as a corpus, clearly there will be more mistakes 
than in e.g. the BNC; that's my point about having to weed out the 
rubbish if you do want to use the web as a reliable source.

Diana


