[Corpora-List] Hebrew texts in Latin letters
Wintner Shuly
shuly at cs.haifa.ac.il
Thu Jul 14 04:58:38 UTC 2011
To this I should add that ALL the corpora distributed via MILA are also available in XML, with each token transliterated in ASCII. The transliteration scheme and the XML structure are documented here:
http://www.mila.cs.technion.ac.il/mila/eng/resources_standards.html
and the corpora are here:
http://www.mila.cs.technion.ac.il/mila/eng/resources_corpora.html
This gives you over 150M tokens.
Shuly
On Jul 14, 2011, at 06:40 , corpora-request at uib.no wrote:
> The Hebrew Treebank corpus from the Mila Knowledge Center for Processing
> Hebrew has a transliterated version. It is available here:
> http://www.mila.cs.technion.ac.il/mila/eng/resources_treebank.html
> The transcription that was used is described in
> http://www.cs.technion.ac.il/~winter/Corpus-Project/paper.pdf
>
> Noemie
>
> 2011/7/13 Yuri Tambovtsev <yutamb at mail.ru>
>
>> Dear Corpora colleagues, do you know any websites of Hebrew texts in Latin
>> letters? I cannot read Hebrew letters. However, I'd like to compare Hebrew
>> sound chains with those I have in about 300 world languages. Looking forward
>> to hearing from you soon (yutamb at mail.ru). Yours sincerely, Yuri
>> Tambovtsev, Novosibirsk, Russia
>