[Corpora-List] "Language Immersion for Chrome", and a Better Idea

Ziyuan Yao yaoziyuan at gmail.com
Mon May 14 20:41:24 UTC 2012


Google's "Language Immersion for Chrome"

Recently a Chrome browser extension called "Language Immersion for
Chrome" has been much publicized. Developed by "Use All Five Inc." on
behalf of Google, the extension translates certain words and phrases
on the Web page you're browsing to a foreign language via Google
Translate, for the purpose of helping you learn that foreign language
while browsing the Web.

I have been researching this kind of thing for years, and one of my
main positions is that machine translation shouldn't be used in
serious language learning because it is error-prone: it takes a
learner great effort to memorize a piece of erroneous knowledge,
another great effort to "unlearn" it, and yet another great effort to
"relearn" the correct knowledge.

But I do understand that online machine translation services like
Google Translate and Bing Translator are so readily available that
using them directly to do the translation minimizes development costs.
Upon seeing this news, I asked myself: "Can we use a kind of freely
available, manually prepared data, instead of machine translation, to
do this better?" And the answer is YES!

A Better Idea

Imagine we have a database of manually translated bilingual sentence
pairs (such as those found in the multilingual movie subtitle files on
subtitle websites), e.g.

        (German)  Er ist ein guter Schüler.
        (English) He is a good student.

Now suppose a German wants to learn English and happens to be browsing
a German Web page that contains the German word "Schüler" (student),
and the computer finds that this German word also occurs in a
bilingual sentence pair like the one above. The computer can then
teach the English for this German word by inserting that bilingual
sentence pair into the Web page, like an embedded advertisement. This
way, the German will learn the English word "student", and better yet,
learn it in a bilingual sentence pair! This means he will learn not
only the word "student" itself, but also its syntax, semantics and
pragmatics, all implied by this example sentence. As to phonetics, the
computer can use text-to-speech to read aloud the English sentence, or
display some kind of pronunciation guide above or alongside the
English sentence (see my recent project "Phonetically Intuitive
English" for such a pronunciation aid:
https://sites.google.com/site/phoneticallyintuitiveenglish/).
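
To make the lookup step concrete, here is a minimal Python sketch. It
assumes the sentence pairs are already loaded into memory as (German,
English) tuples. The names PAIRS, tokenize, build_index and
find_examples are hypothetical, chosen only for illustration, and the
second sentence pair is made-up sample data.

```python
import re
from collections import defaultdict

# Hypothetical sample data: (German, English) sentence pairs.
PAIRS = [
    ("Er ist ein guter Schüler.", "He is a good student."),
    ("Das Wetter wird morgen schön.", "The weather will be nice tomorrow."),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(pairs):
    """Map each German word to the pairs whose German side contains it."""
    index = defaultdict(list)
    for pair in pairs:
        # Use a set so a word repeated in one sentence is indexed once.
        for word in set(tokenize(pair[0])):
            index[word].append(pair)
    return index


def find_examples(page_text, index):
    """Return {word: [pairs]} for every page word found in the index."""
    return {w: index[w] for w in set(tokenize(page_text)) if w in index}


index = build_index(PAIRS)
examples = find_examples("Jeder Schüler lernt gern.", index)
```

A real extension would probably skip very common words such as "ist"
or "ein", which occur in almost every sentence, and would also need to
decide where on the page to display the matched pair. Neither is
handled here.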

That's the basic idea. But of course we can further refine this idea.
For example, if there are multiple bilingual sentence pairs containing
"Schüler", the computer can prefer a pair that contains words that
appear near "Schüler" on the Web page (i.e. context words). This would
be very useful if the word in question (Schüler) is ambiguous.
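
The context-word refinement could be sketched as follows, again in
Python. This is only one possible scoring rule: it counts how many
words near the target word on the page also appear in the German side
of each candidate pair. The function name best_pair, the window
parameter and the "Bank" example data (an ambiguous German word
meaning either "bench" or "bank") are assumptions made for
illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def best_pair(word, page_text, pairs, window=5):
    """Pick the pair whose German side shares the most context words.

    Context words are the tokens within `window` positions of each
    occurrence of `word` on the page. Returns None if no pair contains
    the word; on a tie, the earliest candidate wins.
    """
    word = word.lower()
    tokens = tokenize(page_text)

    # Collect the words surrounding every occurrence of the target word.
    context = set()
    for i, tok in enumerate(tokens):
        if tok == word:
            context.update(tokens[max(0, i - window):i])
            context.update(tokens[i + 1:i + 1 + window])

    candidates = [p for p in pairs if word in tokenize(p[0])]
    if not candidates:
        return None

    def score(pair):
        # Overlap between page context and the sentence, minus the word itself.
        return len(context & (set(tokenize(pair[0])) - {word}))

    return max(candidates, key=score)


# Hypothetical sample data for the ambiguous word "Bank".
pairs = [
    ("Er sitzt auf der Bank im Park.", "He is sitting on the bench in the park."),
    ("Sie hat Geld bei der Bank.", "She has money at the bank."),
]
chosen = best_pair("Bank", "Ich muss heute Geld von der Bank abheben.", pairs)
```

Here the page mentions "Geld" (money) next to "Bank", so the second
pair scores higher and the financial sense is chosen. Plain word
overlap is a crude measure; function words like "der" also count, so
a stop-word list or word weighting would likely help.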

Besides bilingual sentence pairs, we may also explore multilingual
data from Wiktionary and Wikipedia, although their usage may not be as
straightforward as the model discussed above. I leave this as homework
for the reader.

I also intend to develop a Chrome extension based on the idea
discussed above :-)

Best Regards,
Ziyuan Yao
https://sites.google.com/site/yaoziyuan/

