[Corpora-List] Request

maxwell maxwell at umiacs.umd.edu
Tue Apr 24 16:03:52 UTC 2012


On Tue, 24 Apr 2012 17:16:15 +0200, Serge Heiden <slh at ens-lyon.fr> wrote:
>  Have a look at TXM 0.6 (free AND open-source):
>  https://sourceforge.net/projects/txm
>  It handles the display of right-to-left writing systems*.
>  You can check in the demo portal:
>  http://txm.risc.cnrs.fr/demo/?locale=en
>  in which 'ONUAR' is a small sample corpus of UNO-based Arabic texts.
>  (Build a lexicon, double-click on a word line, then double-click
>  on a KWIC line to get to the text edition.)
> 
>  Best,
>  Serge
>  ____________________
> 
>  (*) Even if this is far from perfect (let alone the bad Arabic
>  tokenization, etc.).
>  This is done nearly automagically by the technology we use
>  (Java+JavaScript GWT, or Java Eclipse RCP/SWT for the desktop version).

In case anyone else is working on the Dhivehi language, there's a bug in
Java which (as far as we have been able to discover) prevents proper
rendering of the Thaana script used for Dhivehi.  Thaana has long had a
Unicode block, but Java seems not to recognize that Thaana is written
right-to-left.  The bug does not affect Arabic script.  I have not checked
other right-to-left scripts, such as Syriac.
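
For anyone who wants to check their own Java runtime, here is a minimal
sketch.  It only asks what the runtime's Unicode character data and
java.text.Bidi report for one sample letter from each script; it does not
exercise the rendering layer (Swing, SWT, GWT), which may be where the
problem actually lies.  The class name BidiCheck, the helper names and the
choice of sample letters are my own assumptions for illustration.

```java
import java.text.Bidi;

class BidiCheck {
    /** True if the runtime's character data marks the code point as strongly right-to-left. */
    static boolean isRightToLeft(int codePoint) {
        byte d = Character.getDirectionality(codePoint);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
            || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    /** True if java.text.Bidi resolves the whole string as right-to-left text. */
    static boolean resolvesRightToLeft(String text) {
        // Base direction is taken from the first strong character, defaulting to LTR.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.isRightToLeft();
    }

    public static void main(String[] args) {
        // One sample letter per script: Arabic alef, Syriac alaph, Thaana haa.
        String[] names = {"Arabic", "Syriac", "Thaana"};
        int[] samples = {0x0627, 0x0710, 0x0780};
        for (int i = 0; i < samples.length; i++) {
            String letter = new String(Character.toChars(samples[i]));
            System.out.printf("%s U+%04X: character data RTL=%b, Bidi RTL=%b%n",
                    names[i], samples[i],
                    isRightToLeft(samples[i]), resolvesRightToLeft(letter));
        }
    }
}
```

If all three lines report true on your runtime but Thaana text is still laid
out left-to-right on screen, that would point at the display toolkit rather
than at the character data.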

   Mike Maxwell
   University of Maryland



