Corpora: Annotation tool & Arabic POS Responses

Mohamed Noamany mfn at cs.nmsu.edu
Wed Jul 18 13:43:47 UTC 2001


Dear Colleagues,
	Thanks to the people who responded. The suggestions ranged from
training the Brill tagger to the approach in the following e-mail,
which I am resending at the request of several people.
It seems that I first have to prepare a manually tagged set for
Arabic, which is what I intend to do. So here comes
my next question: are there any annotation tools that can help
in tagging Arabic manually?
Thanks again,
	Mohamed F. Noamany

On Tue, 17 Jul 2001, Oliver Mason wrote:

> Dear Mohamed,
>
> I have a language-independent tagger, QTag, which can be trained using a
> pre-tagged sample text as input.  It is implemented in Java, so it should
> handle Arabic texts all right, though I have never tested it.  However, I'm
> happy to assist you in adapting the tagger to work with Arabic!
>
> What you would need to have is either a (machine-readable) lexicon, or a
> tagged sample text.  This can be used to create a resource file for the tagger,
> which you can then use to tag other (larger) texts with.  If you then correct
> any errors in the tagging you can repeat the process with the new (larger)
> training set, and you will then end up with fewer errors.
>
> In an evaluation with Romanian we used a few tens of thousands of words as
> training data and got a rate of about 98+% correct tag assignments.
>
> Regards,
> Oliver Mason
>
> --
> //\\ lecturer | centre for corpus linguistics | dept. of english | school of
> //\\ humanities | the university of birmingham | edgbaston | birmingham b15
> \\// 2tt | united kingdom | phone +44(0)121-414-6206 | fax +44(0)121-414-  /\
> \\// 5668 | web http://www.clg.bham.ac.uk | email o.mason at bham.ac.uk       \/
>
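The train, tag, correct, retrain cycle described in the quoted message can be sketched in Python. This is a hedged, minimal illustration, not QTag's algorithm or interface: it swaps in a simple most-frequent-tag lookup as the "tagger", and the names train, tag, accuracy and bootstrap, the correct callback (standing in for the manual correction step), and the default tag NOUN for unknown words are all hypothetical. The accuracy function computes the share of correct tag assignments, i.e. the kind of figure quoted for the Romanian evaluation.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# A tagged text is a list of (word, tag) pairs.
Tagged = List[Tuple[str, str]]


def train(tagged: Tagged) -> Dict[str, str]:
    """Build a lexicon mapping each word to its most frequent tag in the training data."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, tag_ in tagged:
        counts[word][tag_] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag(lexicon: Dict[str, str], words: List[str], default: str = "NOUN") -> Tagged:
    """Tag each word by lexicon lookup; unknown words get a (hypothetical) default tag."""
    return [(w, lexicon.get(w, default)) for w in words]


def accuracy(predicted: Tagged, gold: Tagged) -> float:
    """Fraction of tokens whose predicted tag matches the gold-standard tag."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty tag sequences of equal length")
    correct = sum(p_tag == g_tag for (_, p_tag), (_, g_tag) in zip(predicted, gold))
    return correct / len(gold)


def bootstrap(
    seed: Tagged,
    untagged_batches: Iterable[List[str]],
    correct: Callable[[Tagged], Tagged],
) -> Dict[str, str]:
    """Train, tag a new batch, let `correct` fix its errors, add it to the training set, repeat."""
    training: Tagged = list(seed)
    for batch in untagged_batches:
        lexicon = train(training)
        draft = tag(lexicon, batch)
        training.extend(correct(draft))  # stands in for the manual correction step
    return train(training)
```

For example, with a toy seed such as [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB")] and a correct callback that looks words up in a gold standard, one round over ["the", "dog", "ran"] first mistags the unseen word "ran" as NOUN, then learns it as VERB after correction. A real tagger would also use context (neighbouring tags), which this lookup ignores.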


