[Corpora-List] I need texts in Tagalog, Indonesian, etc in electronic form

Mike Maxwell maxwell at ldc.upenn.edu
Fri May 23 15:48:10 UTC 2003


Yuri Tambovtsev wrote:
> Dear colleagues, could you be so kind as to send me some Web sites
> with the texts in more seldom languages? Do you know any email
> address in Vatikan to ask for the Bible text in some exotic language
> like Hakas, Tatar, Turkish, Choockchee, Koriak, Itelmen (Kamchadal),
> Hawaiian, Phillippino (Tagalog), Swahili, Ainu, Indonesian or
> Tibetan, etc, etc in the electronic form? How is it possible to get
> it? Looking forward to hearing from you to yutamb at hotmail.com Remain
> yours most cordially Yuri Tambovtsev

The Vatican is probably not the most likely place to look for Bibles.
Most Bible translation has been done by Protestant organizations, at
least since the Reformation.  There are quite a few web sites that
contain lists of Bibles in web-accessible form.  Try

    http://bible.gospelcom.net/languages/
    http://directory.google.com/Top/Society/Religion_and_Spirituality/Christianity/Bible/Various_Languages/
    http://www.seekgod.org/bible/links.html#Multiple%20Language%20OnLine%20Bibles
    http://dmoz.org/Society/Religion_and_Spirituality/Christianity/Bible/Various_Languages/
    http://bible.com/bible_read.html
    http://www.acm.ndsu.nodak.edu/NDSU_Christian/tracts/stl/trkjv.htm
    http://scriptureresources.com/downloads.asp (Guatemalan languages)
    http://www.htmlbible.com
    http://benjamin.umd.edu/parallel/

SIL (www.sil.org) has done a lot of translation work in minority
languages, but their translations are not in general accessible on-line.

Also, if you're interested in particular languages, you can either
search for "Bible Tatar", or plug a few well-chosen words from your
target language into a search engine.  We've had very good luck using
that technique to unearth all kinds of texts in a variety of languages.
Of course, if you can't read the language you're searching for, you'll
then have a language ID problem: confirming that the pages you find are
really in that language.
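
For what it's worth, here is a rough sketch in Python of one cheap way
to attack that language ID problem: count how many tokens of a page
match a short list of very common function words for each candidate
language.  The word lists below are hand-picked for illustration only,
and the name guess_language is hypothetical; a real list would be
drawn from a corpus and would cover many more languages.

```python
import re

# Hypothetical, hand-picked function words (an assumption for illustration).
# Real lists would be derived from corpus frequency counts.
FUNCTION_WORDS = {
    "tagalog": {"ang", "ng", "sa", "mga", "at", "ay", "si", "na"},
    "indonesian": {"yang", "dan", "di", "dengan", "untuk", "ini", "itu", "dari"},
    "swahili": {"na", "ya", "wa", "kwa", "ni", "katika", "la", "za"},
}


def guess_language(text):
    """Return the language whose function words match the most tokens, or None."""
    # Lowercase and keep runs of letters only (no digits, no underscores).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return None
    scores = {
        lang: sum(1 for tok in tokens if tok in words)
        for lang, words in FUNCTION_WORDS.items()
    }
    best = max(scores, key=scores.get)
    # If nothing matched at all, decline to guess.
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(guess_language("Ang mga bata ay nasa bahay"))
    print(guess_language("Buku ini untuk anak yang rajin dan pintar"))
```

The same word lists double as search queries: a handful of function
words that rarely co-occur in other languages tends to pull up pages
in the target language.  Short texts and closely related languages
will confuse a counter this crude.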

Be aware that some of the on-line translations are older versions; newer
translations are likely to be copyrighted and not available on the web.
Some day...

Also, we have occasionally found translations distributed as PDF files
from which one cannot extract the text; Persian/Farsi is one example.
(Of course one can extract text from many PDF files; I just mean that
some PDF files are essentially images of the page.  And text extracted
from "ordinary" PDF files is often in what amounts to an unknown
encoding.)
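
One quick sanity check on whatever text does come out of a PDF is to
see whether its letters are in the script you expect.  The sketch
below assumes the extraction has already been done by some other tool
and that the result is a Python string; the function names and the 0.8
threshold are arbitrary choices of mine, not an established method.

```python
import unicodedata


def script_fraction(text, script_prefix):
    """Fraction of the letters in text whose Unicode name starts with script_prefix."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        # Nothing extractable (e.g. an image-only PDF yields an empty string).
        return 0.0
    hits = sum(
        1 for ch in letters
        if unicodedata.name(ch, "").startswith(script_prefix)
    )
    return hits / len(letters)


def looks_like_script(text, script_prefix, threshold=0.8):
    """True if at least `threshold` of the letters belong to the expected script."""
    # The threshold is an arbitrary assumption; tune it on real data.
    return script_fraction(text, script_prefix) >= threshold


if __name__ == "__main__":
    # Persian text in Unicode uses letters named "ARABIC LETTER ...".
    print(looks_like_script("سلام دنیا", "ARABIC"))
    # Text that came out as accented Latin letters fails the check.
    print(looks_like_script("ÓáÇã", "ARABIC"))
```

A failed check only tells you that something is wrong, whether that is
an image-only file, an empty extraction, or a font-specific encoding;
it does not tell you how to recover the text.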

     Mike Maxwell
     Linguistic Data Consortium
     maxwell at ldc.upenn.edu


