23.2347, Disc: Language Immersion for Chrome and Alternatives

Wed May 16 16:28:09 UTC 2012

LINGUIST List: Vol-23-2347. Wed May 16 2012. ISSN: 1069 - 4875.

Subject: 23.2347, Disc: Language Immersion for Chrome and Alternatives

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Veronika Drake, U of Wisconsin-Madison
         Monica Macaulay, U of Wisconsin-Madison
         Rajiv Rao, U of Wisconsin-Madison
         Joseph Salmons, U of Wisconsin-Madison
         Anja Wanner, U of Wisconsin-Madison
         <reviews at linguistlist.org>

Homepage: http://linguistlist.org

The LINGUIST List is a non-profit organization dedicated to providing the
discipline of linguistics with the infrastructure necessary to function in
the digital world. Donate to keep our services freely available!
https://linguistlist.org/donation/donate/donate1.cfm

Editor for this issue: Elyssa Winzeler <elyssa at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.cfm.

Date: Wed, 16 May 2012 12:28:06
From: Ziyuan Yao [yaoziyuan at gmail.com]
Subject: Language Immersion for Chrome and Alternatives

Google's ''Language Immersion for Chrome''

Recently a Chrome browser extension called ''Language Immersion for 
Chrome'' has been much publicized. Developed by ''Use All Five Inc.'' 
on behalf of Google, the extension translates certain words and 
phrases on the Web page you're browsing to a foreign language via 
Google Translate, for the purpose of helping you learn that foreign 
language while browsing the Web.

I have been researching this kind of thing for years, and one of my 
main positions is that machine translation shouldn't be used in 
serious language learning because it is error-prone: it takes a 
learner great effort to memorize a piece of erroneous knowledge, 
another great effort to ''unlearn'' it, and yet another great effort 
to ''relearn'' the correct knowledge.

But I do understand that online machine translation services like 
Google Translate and Bing Translator are so readily available that 
using them directly for the translation minimizes development costs. 
Upon seeing this news, I asked myself: ''Can we use some kind of 
freely available, manually prepared data, instead of machine 
translation, to do this better?'' And the answer is YES!

A Better Idea

Imagine we have a database of manually translated bilingual sentence 
pairs (such as those found in the multilingual movie subtitle files 
on subtitle websites), e.g.

        (German)  Er ist ein guter Schüler.
        (English) He is a good student.

Now suppose a German speaker wants to learn English and happens to be 
browsing a German Web page that contains the German word ''Schüler'' 
(student), and the computer finds that this word also occurs in a 
bilingual sentence pair like the one above. The computer can then 
teach the English for this German word by inserting that bilingual 
sentence pair into the Web page, like an embedded advertisement. This 
way, the learner will pick up the English word ''student'' and, better 
yet, learn it in a bilingual sentence pair! This means he will learn 
not only the word ''student'' itself, but also its syntax, semantics 
and pragmatics, all implied by this example sentence. As for 
phonetics, the computer can use text-to-speech to read the English 
sentence aloud, or display some kind of pronunciation guide above or 
alongside it (see my recent project ''Phonetically Intuitive English'' 
for such a pronunciation aid: 
https://sites.google.com/site/phoneticallyintuitiveenglish/).
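
As a rough illustration of the lookup step, here is a minimal Python 
sketch. It is only one possible way to do it, not a finished design: 
the sample data, the function names (tokenize, build_index, 
find_pairs) and the simple lowercase word matching are all my own 
assumptions. A real extension would run in the browser and would 
probably also skip very common words such as ''ist'' or ''ein''.

```python
import re
from collections import defaultdict

# Hypothetical sample data; a real database would hold many
# manually translated (German, English) sentence pairs.
PAIRS = [
    ("Er ist ein guter Schüler.", "He is a good student."),
    ("Das Wetter ist heute schön.", "The weather is nice today."),
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def build_index(pairs):
    """Map each source-language word to the indexes of the pairs containing it."""
    index = defaultdict(list)
    for i, (source, _target) in enumerate(pairs):
        for word in set(tokenize(source)):
            index[word].append(i)
    return index

def find_pairs(page_text, pairs, index):
    """Return {word: [matching pairs]} for each page word found in the index."""
    matches = {}
    for word in tokenize(page_text):
        if word in index and word not in matches:
            matches[word] = [pairs[i] for i in index[word]]
    return matches

if __name__ == "__main__":
    index = build_index(PAIRS)
    page = "Der Schüler liest ein Buch."
    for word, found in find_pairs(page, PAIRS, index).items():
        for german, english in found:
            print(f"{word}: {german} / {english}")
```

The index is built once, so each word on the page costs only a 
dictionary lookup; the pairs it returns are what would be inserted 
next to the matching word.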

That's the basic idea, but of course we can refine it further. For 
example, if there are multiple bilingual sentence pairs containing 
''Schüler'', the computer can prefer a pair that also contains words 
that appear near ''Schüler'' on the Web page (i.e. context words). 
This would be very useful if the word in question (Schüler) is 
ambiguous.
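
The context-word preference could be sketched as follows. Again, this 
is only an illustration under my own assumptions: the function name 
best_pair, the window of five words on each side, and the plain 
overlap count used as a score are hypothetical choices, and the 
ambiguous word ''Bank'' (bench or bank) with its two sentence pairs is 
made-up sample data.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def best_pair(target_word, page_text, pairs, window=5):
    """Pick the pair whose source sentence shares the most words with the
    page text surrounding target_word. Returns None if no pair contains it."""
    target = target_word.lower()
    tokens = tokenize(page_text)

    # Collect the words within `window` tokens of each occurrence on the page.
    context = set()
    for i, token in enumerate(tokens):
        if token == target:
            context.update(tokens[max(0, i - window):i])
            context.update(tokens[i + 1:i + 1 + window])
    context.discard(target)

    # Only pairs whose source sentence contains the target word are candidates.
    candidates = [p for p in pairs if target in tokenize(p[0])]
    if not candidates:
        return None

    # Score each candidate by how many context words it shares with the page.
    return max(candidates, key=lambda p: len(context & set(tokenize(p[0]))))

# Hypothetical sample data for an ambiguous German word.
BANK_PAIRS = [
    ("Ich sitze auf der Bank im Park.", "I am sitting on the bench in the park."),
    ("Die Bank gibt mir einen Kredit.", "The bank is giving me a loan."),
]

if __name__ == "__main__":
    page = "Er braucht einen Kredit von seiner Bank."
    print(best_pair("Bank", page, BANK_PAIRS))
```

A plain overlap count is the simplest possible score; when several 
pairs tie, this sketch just keeps the first one in the list.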

Besides bilingual sentence pairs, we may also explore multilingual data 
from Wiktionary and Wikipedia, although using them may not be as 
straightforward as the model discussed above. I leave this as 
homework for the reader.

I also intend to develop a Chrome extension based on the idea 
discussed above :-). I would be interested in hearing others' viewpoints 
and perspectives on this concept and its development.

Best Regards,
Ziyuan Yao
https://sites.google.com/site/yaoziyuan/

Linguistic Field(s): Computational Linguistics
                     Language Acquisition
