[Corpora-List] Machine Translation and Spelling Correction
Marcin Miłkowski
list-address at wp.pl
Thu Dec 3 16:27:29 UTC 2009
Ah, yes, if you don't care about the corrected text, creating a noisy
corpus is trivially easy, as Yuval pointed out. I was speaking of an
error corpus that would contain corrections as well.
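
To make the distinction concrete, here is a minimal sketch of one way to get
aligned (noisy, corrected) pairs without manual annotation: corrupt clean text
artificially and keep the original as the correction. The function names
(corrupt_word, make_error_corpus), the three edit operations, and the
error_rate parameter are all assumptions for illustration. The resulting
errors are synthetic and may not resemble real misspellings.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: lowercase Latin letters only


def corrupt_word(word, rng):
    """Apply one random character-level edit (delete, swap, or substitute).

    Words shorter than two characters are returned unchanged. A swap of two
    identical letters or a substitution by the same letter can also leave
    the word unchanged.
    """
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    op = rng.choice(["delete", "swap", "substitute"])
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "swap":
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    return word[:i] + rng.choice(ALPHABET) + word[i + 1:]


def make_error_corpus(sentences, error_rate=0.1, seed=0):
    """Return a list of (noisy, clean) pairs, one per input sentence.

    Each whitespace-separated word is corrupted with probability error_rate.
    The clean sentence is kept alongside the noisy one as its correction.
    """
    rng = random.Random(seed)
    pairs = []
    for clean in sentences:
        noisy = " ".join(
            corrupt_word(w, rng) if rng.random() < error_rate else w
            for w in clean.split()
        )
        pairs.append((noisy, clean))
    return pairs


if __name__ == "__main__":
    for noisy, clean in make_error_corpus(["the quick brown fox"], error_rate=0.5):
        print(noisy, "->", clean)
```

A corpus of naturally occurring errors with human corrections would still be
needed to know which errors people actually make.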
Best,
Marcin
Yuval Marton wrote:
> Nicola,
>
> There is quite a bit of work on spelling correction, using edit distance and other similarity measures.
> One tool geared towards machine translation that comes to mind is the following
> (spelling correction is only one element of this tool):
>
> Nizar Habash (2009). REMOOV: A Tool for Online Handling of Out-of-Vocabulary Words in Machine Translation. In Proc. of the 2nd International Conference on Arabic Language Resources and Tools (MEDAR), Cairo, Egypt.
>
> Any paraphrasing tool (for MT or otherwise) is likely to correct some spelling errors as well, although perhaps not with that as the primary focus.
>
> As for available corpora with noisy text... any corpus will do :-)
> It depends on the language, genre, and type of errors you are interested in. Transcribed conversations (speech) presumably have more errors in them. Some might be automatically mis-corrected. Blog and IM log corpora might be a good place to start.
>
> HTH,
>
> -Yuval
>
>
>
> On Thu, Dec 3, 2009 at 10:01 AM, Nicola Bertoldi <bertoldi at fbk.eu> wrote:
>
>
> I am going to do some investigation to improve machine translation
> when it is applied to texts corrupted by misspellings of any sort (non-word, real-word errors).
>
> In this preliminary phase I am collecting information about the spelling correction task
> and other applications and tasks which involve spelling correction.
>
> In particular, I am interested in
> - surveys about the task
> - statistics about the most common misspellings in texts of different languages and different genres
> - publicly available software for spelling correction
> - available corpora of noisy texts
> - any further resources that might be useful for my topic
>
>
>
> Thanks!
>
> Nicola
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
>