[Corpora-List] Spellchecker evaluation corpus
Rich Cooper
rich at englishlogickernel.com
Mon Apr 11 02:25:56 UTC 2011
Comments below,
Sincerely,
Rich Cooper
EnglishLogicKernel.com
Rich AT EnglishLogicKernel DOT com
9 4 9 \ 5 2 5 - 5 7 1 2
-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of John F. Sowa
Sent: Sunday, April 10, 2011 7:06 PM
To: corpora at uib.no
Subject: Re: [Corpora-List] Spellchecker evaluation corpus
> On 4/9/2011 7:03 AM, Eric Atwell wrote:
>> Jennifer Pedler's PhD developed a spelling-error detection tool,
>> evaluated on a corpus of real spelling errors;
>
> Her slides had an example from a UK corpus that would be highly
> unlikely in the US: {tort, taught}.
>
> Japanese English is much better than it used to be, but it still
> has L/R confusions.
>
> Instead of a single corpus, it would be useful to have a set of
> corpora for authors with different backgrounds.

Voila! The USPTO patent corpus has lots of examples, from lots of authors,
in lots of technical fields, where jargon could be detected.
-Rich

> John Sowa