[Corpora-List] Part of Speech annotation of Persian and Urdu corpora
Ben Allison
B.Allison at dcs.shef.ac.uk
Wed Feb 27 11:44:36 UTC 2008
Bushra,
I'm not sure whether you want human-annotated text from which to induce
a tagger, or are interested in having a working POS tagger itself. If
the latter, then about a year ago we tracked down a 10 million word
corpus of Persian which had been hand-annotated, and induced a tagger
from the 1 million word part that the creators were prepared to give
away for research purposes. The tagset they used (which they created for
the job) could be interpreted on two levels -- there was a coarse tagset
of 14 tags with categories like Noun, Verb, etc. and a much finer one
which I believe ran to about 150 tags. Accuracies were pretty good --
over 98% for coarse tags, and around 92% for the fine ones.
I'm not sure if you're prepared for a DIY approach, but I suspect that
if you are, you could get hold of the corpus we used (I can pass you
contact information) and use one of many trainable taggers to induce
your own. Of course, this might not be what you were thinking of...
Ben
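
As a rough illustration of the DIY route described above (inducing a tagger from hand-annotated text and scoring it on both tagset levels), here is a minimal sketch of a most-frequent-tag baseline. Everything in it is an assumption made for illustration: the tokens are toy placeholders, the tag names (N_SING, N_PLUR, V_PAST) are hypothetical, and so is the convention that the coarse tag is the part of a fine tag before the underscore. It is not the tagger or the tagset used for the Persian corpus, and a real trainable tagger would do considerably better than this baseline.

```python
from collections import Counter, defaultdict

# Toy hand-annotated data: each sentence is a list of (word, fine_tag) pairs.
# Words and tag names are hypothetical placeholders, not the real corpus.
TRAIN = [
    [("a", "N_SING"), ("b", "V_PAST")],
    [("a", "N_SING"), ("c", "N_PLUR")],
]
TEST = [
    [("a", "N_SING"), ("d", "N_PLUR"), ("b", "V_PAST")],
]

def train_unigram_tagger(tagged_sentences):
    """Learn each word's most frequent tag, plus a fallback tag for unseen words."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
            all_tags[tag] += 1
    default_tag = all_tags.most_common(1)[0][0]
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return lexicon, default_tag

def tag(words, lexicon, default_tag):
    """Tag each word with its learned tag, or the fallback if it was never seen."""
    return [(w, lexicon.get(w, default_tag)) for w in words]

def coarse(fine_tag):
    """Map a fine tag to a coarse one (assumed convention: prefix before '_')."""
    return fine_tag.split("_", 1)[0]

def accuracy(gold_sentences, lexicon, default_tag, to_label=lambda t: t):
    """Per-token accuracy; pass to_label=coarse to score at the coarse level."""
    correct = total = 0
    for sentence in gold_sentences:
        words = [w for w, _ in sentence]
        predicted = tag(words, lexicon, default_tag)
        for (_, gold), (_, pred) in zip(sentence, predicted):
            correct += to_label(gold) == to_label(pred)
            total += 1
    return correct / total if total else 0.0

lexicon, default_tag = train_unigram_tagger(TRAIN)
fine_acc = accuracy(TEST, lexicon, default_tag)
coarse_acc = accuracy(TEST, lexicon, default_tag, to_label=coarse)
```

On this toy data the unseen word "d" receives the fallback tag, which is wrong at the fine level but right at the coarse level, so the coarse score comes out higher than the fine one. That is the same direction as the gap between the coarse and fine accuracies reported above, though the toy numbers themselves mean nothing.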
hfaili at ece.ut.ac.ir wrote:
> Dear Bushra,
> I work at an Iranian company, Douran (www.douran.com), which has solid
> experience with, and tools for, POS tagging and other NLP tasks in
> Persian.
> for more information contact me via hfaili at douran.com
> regards
>
> hello
> I was wondering if anybody knows of any companies or individual linguists
> who would do Part of Speech annotation of Persian and Urdu corpora?
>
> Thank you
> Bushra Zawaydeh
>
> ********************************************************************
> Bushra Zawaydeh bushraz at basistech.com
> Senior Linguist
> Basis Technology Tel: (617)386-7130
> One Alewife Center Fax: (617)386-2020
> Cambridge, MA 02140-2327
> USA
> **********************************************************************
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>