[Corpora-List] Spellchecker evaluation corpus

Rich Cooper rich at englishlogickernel.com
Mon Apr 11 02:25:56 UTC 2011


Comments below,

 

Sincerely,

Rich Cooper

EnglishLogicKernel.com

Rich AT EnglishLogicKernel DOT com

9 4 9 \ 5 2 5 - 5 7 1 2

 

-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of John F. Sowa
Sent: Sunday, April 10, 2011 7:06 PM
To: corpora at uib.no
Subject: Re: [Corpora-List] Spellchecker evaluation corpus

 

On 4/9/2011 7:03 AM, Eric Atwell wrote:

> Jennifer Pedler's PhD developed a spelling-error detection tool,
> evaluated on a corpus of real spelling errors;

 

Her slides had an example from a UK corpus that would be highly
unlikely in the US: {tort, taught}.

 

Japanese English is much better than it used to be, but it still
has L/R confusions.

 

Instead of a single corpus, it would be useful to have a set of
corpora for authors with different backgrounds.

 

Voilà! The USPTO patent corpus has lots of examples, from lots of authors,
in lots of technical fields, where jargon could be detected.

 

-Rich

 

John Sowa

 

 


