[Corpora-List] Parallel corpora and word alignment, WAS: American and British English spelling converter

Mark P. Line mark at polymathix.com
Thu Nov 9 19:23:23 UTC 2006


Merle Tenney wrote:
> Ramesh Krishnamurthy wrote:
>
>> ...and there is no obvious parallel corpus of Br-Am Eng to consult...
>
>> Do you know of one by any chance...
>
> And Mark P. Line responded:
>
>> Why would it have to be a *parallel* corpus?
>
> [Merle's lecture snipped]
>
> Mark, if you can figure out a way to combine the quality and quantity of
> data from a very large corpus with the alignment and equivalence power of
> a parallel corpus without actually having a parallel corpus, I will
> personally nominate you for the Nobel Prize in Corpus Linguistics.

I was speaking in the context of the mostly anecdotal claims being made on
the parent thread, and of what kind of corpus examination it would take to
support or defeat them. I thought this was the context in which Ramesh was
speaking, and I'm pretty sure it was the context of the initial Oxfordian
question of why nobody on this thread had been making use of corpora.

I was not speaking in the context of the Ultimate True Theory of English
Dialectology, which would seem to be a strawman of your own devising.

So in case the fault was mine and I was unclear in my question to Ramesh,
please allow me to rephrase it: "Why would you need a *parallel* corpus to
make or refute claims of the kind we've been seeing on the parent thread?"
As nearly as I could tell, you didn't actually address that question in
your lecture.


> PS and Shameless Microsoft Plug:  In the last paragraph, I accidentally
> typed "figure out a why to combine" and I got the blue squiggle from Word
> 2007, which was released to manufacturing on Monday of this week.  It
> suggested way, and of course I took the suggestion.  I am amazed at the
> number of mistakes that the contextual speller has caught in my writing
> since I started using it.  I recommend the new version of Word and Office
> for this feature alone.

Thanks, but I think that would entail my switching to a toy operating
system as well as spending money on another piece of software, and I don't
make enough typographical mistakes to warrant such a drastic measure.

-- Mark

Mark P. Line
Polymathix
San Antonio, TX


