[Corpora-List] Hebrew texts in Latin letters

Wintner Shuly shuly at cs.haifa.ac.il
Thu Jul 14 04:58:38 UTC 2011


To this I should add that all the corpora distributed via MILA are also available in XML, with each token transliterated into ASCII. The transliteration scheme and the XML structure are documented here:
http://www.mila.cs.technion.ac.il/mila/eng/resources_standards.html
and the corpora are here:
http://www.mila.cs.technion.ac.il/mila/eng/resources_corpora.html

Together, these corpora give you over 150M tokens.
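
For anyone who wants to pull the ASCII forms out of the XML files, here is a minimal Python sketch. It is only an illustration: the element name `token`, the attribute name `transliterated`, and the sample values are assumptions made for this example and are not taken from the MILA schema, so please check the standards page above for the actual names before using it.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document. The element name "token", the attribute
# name "transliterated", and the values below are placeholders chosen for
# illustration; they are NOT real MILA data or the real MILA schema.
SAMPLE = """<corpus>
  <sentence>
    <token transliterated="ABG"/>
    <token transliterated="BGD"/>
  </sentence>
</corpus>"""


def transliterated_tokens(xml_text, tag="token", attr="transliterated"):
    """Return the ASCII form of every matching element, in document order."""
    root = ET.fromstring(xml_text)
    forms = []
    for elem in root.iter():
        # Compare local names so this also works if the file uses a namespace.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == tag and attr in elem.attrib:
            forms.append(elem.attrib[attr])
    return forms


def char_bigrams(tokens):
    """Count adjacent character pairs within each token."""
    counts = Counter()
    for tok in tokens:
        counts.update(tok[i:i + 2] for i in range(len(tok) - 1))
    return counts


tokens = transliterated_tokens(SAMPLE)
bigrams = char_bigrams(tokens)
```

Note that the counts are over transliterated letters, not sounds, so they are only a rough proxy for sound chains. For the full corpora it would probably be better to stream the files with `ET.iterparse` rather than load each one into memory.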

Shuly

On Jul 14, 2011, at 06:40, corpora-request at uib.no wrote:

> The Hebrew Treebank corpus from the Mila Knowledge Center for Processing
> Hebrew has a transliterated version. It is available here:
> http://www.mila.cs.technion.ac.il/mila/eng/resources_treebank.html
> The transliteration that was used is described in:
> http://www.cs.technion.ac.il/~winter/Corpus-Project/paper.pdf
> 
> Noemie
> 
> 2011/7/13 Yuri Tambovtsev <yutamb at mail.ru>
> 
>> Dear Corpora colleagues, do you know of any websites with Hebrew texts in Latin
>> letters? I cannot read Hebrew letters. However, I'd like to compare Hebrew
>> sound chains with those I have in about 300 world languages. Looking forward
>> to hearing from you soon at yutamb at mail.ru. Yours sincerely, Yuri
>> Tambovtsev, Novosibirsk, Russia
> 

