[Corpora-List] corpus annotation tool

maxwell maxwell at umiacs.umd.edu
Tue Aug 19 17:09:23 UTC 2014


On 2014-08-02 02:54, Hristo Tanev wrote:
> I have got many useful answers to my question, regarding the corpus
> annotation tool. I am thankful to all who gave me an answer. Here is a
> brief summary of the answers I have got:
> ...
> Omnia Zayed has informed me about the existence of a tool called BRAT
> http://brat.nlplab.org/
> ...

This is *way* after this thread ended, but I'm just now getting a round 
tuit.

While BRAT claims Unicode compatibility, we tried to use it for Arabic 
annotation, and it did not work for us.  Specifically, right-to-left 
scripts are not supported in the SVG visualization.  This of course 
applies not only to Arabic, but to any language that uses Perso-Arabic 
script (Persian/Farsi, Dari, Urdu, Pashto, Punjabi as written in 
Pakistan, and other languages), as well as Hebrew, Syriac and Dhivehi 
(and perhaps some other languages/scripts).  The issue has been raised 
before:

    https://github.com/nlplab/brat/issues/774
    https://github.com/nlplab/brat/issues/1057
    https://github.com/nlplab/brat/issues/1018

We would be very interested in hearing about annotation tools that do 
deal well with right-to-left scripts.
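
For anyone who wants to check whether a corpus is affected before 
committing to a tool, here is a minimal sketch in Python.  It only 
detects right-to-left characters using the Unicode bidirectional class 
from the standard unicodedata module; it does nothing to fix rendering 
in BRAT or any other tool.  The function name has_rtl is made up for 
this illustration.

```python
import unicodedata

# Unicode bidirectional classes for strongly right-to-left characters:
# "R" covers Hebrew letters, "AL" covers Arabic, Syriac and Thaana letters.
RTL_CLASSES = {"R", "AL"}


def has_rtl(text: str) -> bool:
    """Return True if any character in text is strongly right-to-left.

    Hypothetical helper for illustration; not part of any annotation tool.
    """
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


if __name__ == "__main__":
    samples = {
        "Latin": "corpus",
        "Hebrew": "\u05e9\u05dc\u05d5\u05dd",
        "Arabic": "\u0645\u0631\u062d\u0628\u0627",
        "Thaana": "\u078b\u07a8\u0788\u07ac\u0780\u07a8",
    }
    for name, text in samples.items():
        print(name, has_rtl(text))
```

Running this over a sample of documents should flag anything written in 
the scripts listed above, which is a sign that the visualization needs 
to be tested on real data before annotation starts.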

With regard to Dhivehi (Thaana script), there is an outstanding bug in 
Java that prevents any Java-based app from correctly displaying anything 
longer than a single line.  I can't find the bug report now, but I know 
it was submitted.  At any rate, this effectively rules out any 
Java-based annotation tool for Dhivehi.  Java works fine for 
Perso-Arabic scripts; I have not tried it with Hebrew or Syriac script.

    Mike Maxwell
    University of Maryland

