[Corpora-List] Part of Speech annotation of Persian and Urdu corpora

Ben Allison B.Allison at dcs.shef.ac.uk
Wed Feb 27 14:52:35 UTC 2008


Bushra,

I suspect there are. However, in my experience, handing annotation over to 
someone else to carry out according to your guidelines is risky if the 
annotation scheme you propose is untested. The annotation process should 
ideally be symbiotic, with the categories and scheme refined as preliminary 
annotation is performed. Otherwise, you will either impose a scheme that may 
ultimately prove unsuitable, or lose control of the eventual scheme. Are 
there strong reasons for you not to arrange or perform the annotation 
yourself?

Ben

Bushra Zawaydeh wrote:
> Hi Ben,
> The question was about locating a company that would do the manual 
> annotation for us using a set of tags that we determine, according to 
> guidelines that we write. Are there companies out there that do that?
> Thank you,
> Bushra
>
> > Date: Wed, 27 Feb 2008 11:44:36 +0000
> > From: B.Allison at dcs.shef.ac.uk
> > To: corpora at uib.no
> > Subject: Re: [Corpora-List] Part of Speech annotation of Persian and Urdu corpora
> >
> > Bushra,
> >
> > I'm not sure whether you want human-annotated text from which to induce
> > a tagger, or a working POS tagger itself. If the latter, then about a
> > year ago we tracked down a 10-million-word hand-annotated corpus of
> > Persian and induced a tagger from the 1-million-word portion that the
> > creators were prepared to release for research purposes. The tagset they
> > used (which they created for the job) could be interpreted on two levels:
> > a coarse tagset of 14 tags with categories such as Noun and Verb, and a
> > much finer one which I believe ran to about 150 tags. Accuracies were
> > pretty good: over 98% for the coarse tags and around 92% for the fine ones.
> >
> > I'm not sure if you're prepared for a DIY approach, but I suspect that
> > if you are, you could get hold of the corpus we used (I can pass you
> > contact information) and use one of many trainable taggers to induce
> > your own. Of course, this might not be what you were thinking of...
> >
> > Ben
> >
> > hfaili at ece.ut.ac.ir wrote:
> > > Dear Bushra,
> > > I work for an Iranian company, Douran (www.douran.com), which has
> > > solid experience and tools for POS tagging and other NLP tasks
> > > in Persian.
> > > For more information, contact me at hfaili at douran.com.
> > > Regards
> > >
> > > Hello,
> > > I was wondering if anybody knows of any companies or individual linguists
> > > who would do Part of Speech annotation of Persian and Urdu corpora?
> > >
> > > Thank you
> > > Bushra Zawaydeh
> > >
> > > ********************************************************************
> > > Bushra Zawaydeh bushraz at basistech.com
> > > Senior Linguist
> > > Basis Technology Tel: (617)386-7130
> > > One Alewife Center Fax: (617)386-2020
> > > Cambridge, MA 02140-2327
> > > USA
> > > **********************************************************************
> > >
> > > _______________________________________________
> > > Corpora mailing list
> > > Corpora at uib.no
> > > http://mailman.uib.no/listinfo/corpora
>
