[Corpora-List] Parallel corpora and word alignment, WAS: American and British English spelling converter

Merle Tenney merlet at microsoft.com
Thu Nov 9 20:50:21 UTC 2006


I'm sorry, Mark.  I think you may have misunderstood the thrust of my post.  I certainly didn't mean to lecture, and I am not working on any "Ultimate True Theory of English Dialectology".  When you questioned the need for a parallel corpus, I wondered if you might have some insight into how to get some of the benefits of parallel corpora without actually having parallel corpora.  That is not a straw man; I think that is a worthwhile pursuit and probably tractable given the right approach and the right tools.  It would lead to powerful insights and powerful tools in lexicology, dialectology, translation, second language acquisition, and much more.  I would genuinely love to know whether anyone has been able to achieve parallel-corpus results with comparable-corpus analysis techniques.  (I must confess, Mark, that I am not on the nominating committee for the Nobel Prize in Corpus Linguistics, so that offer was made in jest.) :)

I'm glad that you don't make typos.  I used to think that I didn't either, until I started using Word's new contextual speller.  Some still get past me, for sure, but definitely fewer than before.

Later, my friend.  I've got to get back to my toys. :)

Merle

-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On Behalf Of Mark P. Line
Sent: Thursday, November 9, 2006 11:23 AM
To: Merle Tenney
Cc: CORPORA at UIB.NO
Subject: Re: [Corpora-List] Parallel corpora and word alignment, WAS: American and British English spelling converter

Merle Tenney wrote:
> Ramesh Krishnamurthy wrote:
>
>> ...and there is no obvious parallel corpus of Br-Am Eng to consult...
>
>> Do you know of one by any chance...
>
>> And Mark P. Line responded:
>
>>Why would it have to be a *parallel* corpus?
>
> [Merle's lecture snipped]
>
> Mark, if you can figure out a way to combine the quality and quantity of
> data from a very large corpus with the alignment and equivalence power of
> a parallel corpus without actually having a parallel corpus, I will
> personally nominate you for the Nobel Prize in Corpus Linguistics.

I was speaking in the context of the mostly anecdotal claims being made on
the parent thread, as to what it would take in the way of corpus
examination to support or defeat them. I thought this was the context in
which Ramesh was speaking, and I'm pretty sure it was the context of the
initial Oxfordian question of why nobody on this thread had been making
use of corpora.

I was not speaking in the context of the Ultimate True Theory of English
Dialectology, which would seem to be a strawman of your device.

So in case the fault was mine and I was unclear in my question to Ramesh,
please allow me to rephrase it: "Why would you need a *parallel* corpus to
make or refute claims of the kind we've been seeing on the parent thread?"
As nearly as I could tell, you didn't actually address that question in
your lecture.

> PS and Shameless Microsoft Plug:  In the last paragraph, I accidentally
> typed "figure out a why to combine" and I got the blue squiggle from Word
> 2007, which was released to manufacturing on Monday of this week.  It
> suggested "way," and of course I took the suggestion.  I am amazed at the
> number of mistakes that the contextual speller has caught in my writing
> since I started using it.  I recommend the new version of Word and Office
> for this feature alone.

Thanks, but I think that would entail my switching to a toy operating
system as well as spending money on another piece of software, and I don't
make enough typographical mistakes to warrant such a drastic measure.

-- Mark

Mark P. Line
Polymathix
San Antonio, TX
