LL-L "Resources" 2011.05.09 (03) [EN]

Lowlands-L List lowlands.list at GMAIL.COM
Mon May 9 22:43:46 UTC 2011


=====================================================
L O W L A N D S - L - 09 May 2011 - Volume 03
lowlands.list at gmail.com - http://lowlands-l.net/
Posting: lowlands-l at listserv.linguistlist.org
Archive: http://listserv.linguistlist.org/archives/lowlands-l.html
Encoding: Unicode (UTF-8)
Language Codes: lowlands-l.net/codes.php
=====================================================



From: Marcus Buck list at marcusbuck.org
Subject: LL-L "Resources" 2011.05.09 (02) [EN]
From: R. F. Hahn <sassisch at yahoo.com>

                         If I search for a simple sentence such as "What is
your name?" nothing shows up, at least not so far.

I cannot reconstruct your problem. If I search for the phrase "What is your
name?" I get four results and the first one is that exact sentence with
direct translations in 22 languages and indirect translations in a bunch
of other ones.

                         And why is selecting and copying any given sentence
not an option?

It's possible. The problem you are running into is probably that the
sentences are links. But if you click left of the green arrow, hold the
mouse and then select the sentence you can c&p.

                          I may be missing the point here, and I am sorry if
I misrepresent anything. But this exercise reminds me of what I watched
Japanese people do in trying to learn foreign languages: collect and
memorize random sentences while hardly improving their speaking and reading
skills.

A language has a finite number of words and idiomatic expressions. The
number of possible sentences, on the other hand, is infinite. So why index
sentences other than as examples under head words?

"Head words" is a very "paper" way of thinking ;-) Or do I misunderstand
what you mean by indexing under head words? If you look up a word with the
search function, it gives you a list of sentences with that word. Isn't that
an index under a head word?

Tatoeba does not dictate how you use it. If users like to memorize sentences
they can do so. If they want to use it in other ways they are also free to
do so. Tatoeba is just a corpus of linked sentences. It's an offer, freely
licensed, and you can put it to whatever use you like. I guess there are many
creative ways to use it.

You could perhaps use it as a vocabulary lookup service. Let's say you are a
native English speaker trying to learn Low Saxon. You stumble across the word
"Hümpel" in a Low Saxon text. You are not sure about the meaning. So you
look for Low Saxon sentences containing "Hümpel" on Tatoeba. So far there
are eight sentences with the word, four of which have a direct English
translation. You'll see that "Hümpel Lüüd" is a "crowd", a "Hümpel Snee" is
a "great deal of snow" and a "Hümpel Schrott" is a "pile of rubbish". So you
get an idea of the word's meaning. With each additional sentence containing
the word the meaning or the array of possible meanings of a word becomes
clearer.
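
To make that lookup concrete, here is a minimal sketch in Python. It is only
an illustration under stated assumptions: the data layout (a table of
sentences plus a list of translation links) and the names SENTENCES, LINKS
and lookup are made up for this example, and the entries are just the three
phrases quoted above, not real Tatoeba sentences or the site's actual search.

```python
import re
from collections import defaultdict

# Toy corpus (assumption): id -> (language code, text). The entries are the
# phrases quoted in this post, not actual sentences from Tatoeba.
SENTENCES = {
    1: ("nds", "Hümpel Lüüd"),
    2: ("eng", "crowd"),
    3: ("nds", "Hümpel Snee"),
    4: ("eng", "great deal of snow"),
    5: ("nds", "Hümpel Schrott"),
    6: ("eng", "pile of rubbish"),
}

# Direct translation links between sentence ids (treated as undirected).
LINKS = [(1, 2), (3, 4), (5, 6)]


def lookup(word, source_lang, target_lang):
    """Return (sentence, [translations]) pairs for sentences in source_lang
    that contain `word` as a whole word and have a direct translation in
    target_lang."""
    neighbours = defaultdict(set)
    for a, b in LINKS:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Whole-word match, so a fragment such as "Hüm" does not match "Hümpel".
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)

    results = []
    for sid, (lang, text) in sorted(SENTENCES.items()):
        if lang != source_lang or not pattern.search(text):
            continue
        translations = [
            SENTENCES[t][1]
            for t in sorted(neighbours[sid])
            if SENTENCES[t][0] == target_lang
        ]
        if translations:
            results.append((text, translations))
    return results


if __name__ == "__main__":
    for sentence, translations in lookup("Hümpel", "nds", "eng"):
        print(sentence, "->", "; ".join(translations))
```

Reading the matches side by side is what lets a learner triangulate the
meaning; the more linked sentences there are, the better this works.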

You could also use the corpus to create a corpus-based automated translation
service similar to Google's automatic translation. For most languages the
corpus is too small to do that effectively, but since the corpus is open
source, freely licensed and extendable by anyone, that can change.

And there are certainly many more things you can do with the corpus.

Being a help for language learners was the initial reason to create Tatoeba.
It developed from the Tanaka corpus, a corpus of about 150,000
English/Japanese sentence pairs that was compiled by Japanese students of
English led by Professor Yasuhito Tanaka. Later the corpus was put online
to sanitize it and remove the errors the students had made. Then people were
allowed to add translations into third languages besides English and Japanese,
and thus the Tatoeba project came into existence.

You probably all know the little travel dictionaries that collect simple
sentences that are useful for tourists in foreign countries. "How do I get
to the station?" "Where can I buy postcards?" and sentences like that.
Tatoeba is similar to that, but it is not limited to simple travel-related
sentences but open to everything. If you want to create a "travel
dictionary" for quantum physicists you can do that. Tatoeba is just a
platform to host the sentences. The users can decide what they want to do
with the corpus.

Marcus Buck
----------

From: R. F. Hahn <sassisch at yahoo.com>
Subject: Resources

Thanks, Marcus. That makes certain things clearer.

Good luck with it!

Regards,
Reinhard/Ron
Seattle, USA

 =========================================================
Send posting submissions to lowlands-l at listserv.linguistlist.org.
Please display only the relevant parts of quotes in your replies.
Send commands (including "signoff lowlands-l") to
listserv at listserv.linguistlist.org or lowlands.list at gmail.com
http://linguistlist.org/subscribing/sub-lowlands-l.html .
http://www.facebook.com/?ref=logo#!/group.php?gid=118916521473498
===============================================================