[Corpora-List] I need texts in Tagalog, Indonesian, etc in electronic form
Mike Maxwell
maxwell at ldc.upenn.edu
Fri May 23 15:48:10 UTC 2003
Yuri Tambovtsev wrote:
> Dear colleagues, could you be so kind as to send me some Web sites
> with the texts in more seldom languages? Do you know any email
> address in Vatikan to ask for the Bible text in some exotic language
> like Hakas, Tatar, Turkish, Choockchee, Koriak, Itelmen (Kamchadal),
> Hawaiian, Phillippino (Tagalog), Swahili, Ainu, Indonesian or
> Tibetan, etc, etc in the electronic form? How is it possible to get
> it? Looking forward to hearing from you to yutamb at hotmail.com Remain
> yours most cordially Yuri Tambovtsev
The Vatican is probably not the most likely place to look for Bibles.
Most Bible translation has been done by Protestant organizations, at
least since the Reformation. There are quite a few web sites that
contain lists of Bibles in web-accessible form. Try
http://bible.gospelcom.net/languages/
http://directory.google.com/Top/Society/Religion_and_Spirituality/Christianity/Bible/Various_Languages/
http://www.seekgod.org/bible/links.html#Multiple%20Language%20OnLine%20Bibles
http://dmoz.org/Society/Religion_and_Spirituality/Christianity/Bible/Various_Languages/
http://bible.com/bible_read.html
http://www.acm.ndsu.nodak.edu/NDSU_Christian/tracts/stl/trkjv.htm
http://scriptureresources.com/downloads.asp (Guatemalan languages)
http://www.htmlbible.com
http://benjamin.umd.edu/parallel/
SIL (www.sil.org) has done a lot of translation work in minority
languages, but their translations are not in general accessible on-line.
Also, if you're interested in particular languages, you can either
search for "Bible Tatar", or plug a few well-chosen words from your
target language into a search engine. We've had very good luck using
that technique to unearth all kinds of texts in a variety of languages.
Of course you'll have a language ID problem, if you don't know the
language you're searching for.
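
One rough way to attack the language ID problem is to compare character
trigram counts of a retrieved page against a short sample you already
trust. The following is only a sketch of that idea, not a description of
the tools we use: the function names are made up for illustration, the
two sample sentences are tiny placeholders, and a real profile would need
far more text per language.

```python
from collections import Counter

def trigram_profile(text):
    """Count overlapping character trigrams in lowercased, space-normalized text."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))

def similarity(profile_a, profile_b):
    """Cosine similarity between two trigram count profiles."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[g] * profile_b[g] for g in shared)
    norm_a = sum(v * v for v in profile_a.values()) ** 0.5
    norm_b = sum(v * v for v in profile_b.values()) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def guess_language(text, samples):
    """Return the label of the sample whose trigram profile is closest to text."""
    target = trigram_profile(text)
    return max(
        samples,
        key=lambda label: similarity(target, trigram_profile(samples[label])),
    )

# Placeholder samples (assumption): one short sentence per language.
SAMPLES = {
    "indonesian": "saya tidak tahu apa yang terjadi di sini",
    "tagalog": "hindi ko alam kung ano ang nangyari dito",
}

if __name__ == "__main__":
    print(guess_language("apa yang terjadi", SAMPLES))
```

With samples this small the guess is only reliable for text that closely
resembles them; it is meant to show the shape of the comparison, nothing
more.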
Be aware that some of the on-line translations are older versions; newer
translations are likely to be copyrighted and not available on the web.
Some day...
Also, we have occasionally found translations which are distributed as
PDF files from which one cannot extract the text; Persian/Farsi is one
example. (Of course one can extract text from many PDF files; I just
mean that some PDF files are essentially images of the page. And text
extracted from "ordinary" PDF files is often in what amounts to an
unknown encoding.)
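
A cheap sanity check on whatever text a PDF extractor hands back is to
ask what share of its letters belong to the script you expect. This is a
sketch under assumptions: the extraction step itself is not shown, the
function names and the 0.8 threshold are arbitrary choices for
illustration, and the check relies on Unicode character names beginning
with the script name (e.g. "ARABIC" for Persian).

```python
import unicodedata

def script_fraction(text, script_prefix):
    """Fraction of alphabetic characters whose Unicode name starts with script_prefix."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum(
        1 for ch in letters
        if unicodedata.name(ch, "").startswith(script_prefix)
    )
    return hits / len(letters)

def classify_extraction(text, script_prefix, threshold=0.8):
    """Label extracted text as 'empty', 'plausible', or 'suspect'."""
    if not text.strip():
        # Nothing came out: the PDF may be page images only.
        return "empty"
    if script_fraction(text, script_prefix) >= threshold:
        return "plausible"
    # Letters are mostly from another script: possibly an unknown encoding.
    return "suspect"

if __name__ == "__main__":
    persian = "\u0633\u0644\u0627\u0645"  # Persian word written in Arabic script
    print(classify_extraction(persian, "ARABIC"))
    print(classify_extraction("\u00d3\u00e1\u00c7\u00e3", "ARABIC"))
    print(classify_extraction("   ", "ARABIC"))
```

A "suspect" result does not say which encoding was used, only that the
output is not in the script you were expecting.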
Mike Maxwell
Linguistic Data Consortium
maxwell at ldc.upenn.edu