[Corpora-List] corpus annotation tool
maxwell
maxwell at umiacs.umd.edu
Tue Aug 19 17:09:23 UTC 2014
On 2014-08-02 02:54, Hristo Tanev wrote:
> I have got many useful answers to my question, regarding the corpus
> annotation tool. I am thankful to all who gave me an answer. Here is a
> brief summary of the answers I have got:
> ...
> Omnia Zayed has informed me about the existence of a tool called BRAT
> http://brat.nlplab.org/
> ...
This is *way* after this thread ended, but I'm just now getting a round
tuit.
While BRAT claims Unicode compatibility, we tried to use it for Arabic
annotation, and it did not work for us. Specifically, right-to-left
scripts are not supported in the SVG visualization. This of course
applies not only to Arabic, but to any language that uses Perso-Arabic
script (Persian/Farsi, Dari, Urdu, Pashto, Punjabi as written in
Pakistan, and other languages), as well as Hebrew, Syriac and Dhivehi
(and perhaps some other languages/scripts). The issue has been raised
before:
https://github.com/nlplab/brat/issues/774
https://github.com/nlplab/brat/issues/1057
https://github.com/nlplab/brat/issues/1018
We would be very interested in hearing about annotation tools that do
deal well with right-to-left scripts.
With regard to Dhivehi (Thaana script), there is an outstanding bug in
Java that prevents any Java-based app from correctly displaying anything
longer than a single line. I can't find the bug report now, but I know
it was submitted. At any rate, this effectively rules out any
Java-based annotation tool for Dhivehi. Java works fine for
Perso-Arabic scripts; I have not tried it with Hebrew or Syriac script.
Mike Maxwell
University of Maryland