[Corpora-List] Part of Speech annotation of Persian and Urdu corpora

Bushra Zawaydeh bzawaydeh at hotmail.com
Wed Feb 27 14:41:25 UTC 2008


Hi Ben,
The question was about locating a company that would do the manual annotation for us, using a set of tags that we determine and following guidelines that we write. Are there companies out there that do that?
Thank you,
Bushra

> Date: Wed, 27 Feb 2008 11:44:36 +0000
> From: B.Allison at dcs.shef.ac.uk
> To: corpora at uib.no
> Subject: Re: [Corpora-List] Part of Speech annotation of Persian and Urdu corpora
> 
> Bushra,
> 
> I'm not sure whether you want human-annotated text from which to induce 
> a tagger, or are interested in having a working POS tagger itself. If 
> the latter, then about a year ago we tracked down a 10 million word 
> corpus of Persian which had been hand-annotated, and induced a tagger 
> from the 1 million word part that the creators were prepared to give 
> away for research purposes. The tagset they used (which they created for 
> the job) could be interpreted on two levels -- there was a coarse tagset 
> of 14 tags with categories like Noun, Verb, etc. and a much finer one 
> which I believe ran to about 150 tags. Accuracies were pretty good -- 
> over 98% for coarse tags, and around 92% for the fine ones.
> 
> I'm not sure if you're prepared for a DIY approach, but I suspect that 
> if you are, you could get hold of the corpus we used (I can pass you 
> contact information) and use one of many trainable taggers to induce 
> your own. Of course, this might not be what you were thinking of...
> 
> Ben
> 
> hfaili at ece.ut.ac.ir wrote:
> > Dear Bushra,
> > I work at an Iranian company, Douran (www.douran.com), which has
> > good experience and tools for POS tagging and other NLP fields
> > in Persian.
> > For more information, contact me at hfaili at douran.com
> > Regards
> >
> > hello
> > I was wondering if anybody knows of any companies or individual linguists
> > who would do Part of Speech annotation of Persian and Urdu corpora?
> >
> > Thank you
> > Bushra Zawaydeh
> >
> > ********************************************************************
> > Bushra Zawaydeh                           bushraz at basistech.com
> > Senior Linguist
> > Basis Technology                           Tel: (617)386-7130
> > One Alewife Center                         Fax: (617)386-2020
> > Cambridge, MA 02140-2327
> > USA
> > **********************************************************************
> >
> >
> > _______________________________________________
> > Corpora mailing list
> > Corpora at uib.no
> > http://mailman.uib.no/listinfo/corpora
> >
> >   
> 
