[Corpora-List] Google searches as linguistic evidence

maxwell at ldc.upenn.edu
Thu Dec 7 15:05:33 UTC 2006


Quoting Ramesh Krishnamurthy <r.krishnamurthy at aston.ac.uk>:
> I don't know of many websites who use professional proof-readers...

I'm sure most readers of this list have already seen this, but just in case:

Christoph Ringlstetter, Klaus U. Schulz and Stoyan Mihov: Orthographic
Errors in Web Pages - Towards Cleaner Web Corpora. Computational
Linguistics, September 2006, Vol. 32(3), pp. 295-340.

One useful output is a classification of websites into ones that have 
more or fewer misspellings.

   Mike Maxwell
   CASL/ U MD



