[Corpora-List] Open source HMM POS tagger

Yannick Versley versley at sfs.uni-tuebingen.de
Mon Jan 7 21:17:05 UTC 2013


LingPipe is shared-source, i.e., you can use it freely (as long as you
don't sell the output) and you can look at the source, but you cannot
create derivative works.

hunpos is a free (as in open source) HMM-based tagger
http://code.google.com/p/hunpos/

If you want/need to do something more exotic than hunpos can do (and
don't want to dig into hunpos' OCaml source code), the attached file
implements a bare-bones but useful HMM tagger (with Kneser-Ney smoothing
for the n-gram model and suffix backoff for the word model) in under 300
lines of Python code. This should be easy to adapt to special needs if
you want flexibility rather than execution speed.
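
For readers without the attachment, here is a minimal sketch of the general
idea. It is an illustration only and not the contents of model_tools.py: the
class name TinyHMMTagger and its parameters are made up for this example, it
uses a bigram tag model, and it swaps in simple additive smoothing where the
attached file uses Kneser-Ney. Unknown words fall back to the longest word
suffix seen in training.

```python
import math
from collections import Counter, defaultdict


class TinyHMMTagger:
    """Bigram HMM tagger (hypothetical sketch, NOT the attached model_tools.py)."""

    def __init__(self, max_suffix=3, alpha=0.1):
        self.max_suffix = max_suffix        # longest suffix used for unknown words
        self.alpha = alpha                  # additive smoothing (not Kneser-Ney)
        self.trans = defaultdict(Counter)   # previous tag -> next tag counts
        self.emit = defaultdict(Counter)    # tag -> word counts
        self.suffix = defaultdict(Counter)  # word suffix -> tag counts
        self.tag_count = Counter()
        self.vocab = set()
        self.tags = []

    def train(self, sentences):
        """sentences: iterable of lists of (word, tag) pairs."""
        for sent in sentences:
            prev = "<s>"
            for word, tag in sent:
                self.trans[prev][tag] += 1
                self.emit[tag][word] += 1
                self.tag_count[tag] += 1
                self.vocab.add(word)
                for k in range(1, min(self.max_suffix, len(word)) + 1):
                    self.suffix[word[-k:]][tag] += 1
                prev = tag
            self.trans[prev]["</s>"] += 1
        self.tags = sorted(self.tag_count)

    def _log_trans(self, prev, tag):
        counts = self.trans[prev]
        total = sum(counts.values())
        n_outcomes = len(self.tags) + 1     # all tags plus the end marker
        return math.log((counts[tag] + self.alpha)
                        / (total + self.alpha * n_outcomes))

    def _log_emit(self, tag, word):
        if word in self.vocab:
            return math.log((self.emit[tag][word] + self.alpha)
                            / (self.tag_count[tag] + self.alpha * len(self.vocab)))
        # Unknown word: back off to the longest suffix seen in training and
        # use the scaled likelihood P(tag | suffix) / P(tag).
        for k in range(min(self.max_suffix, len(word)), 0, -1):
            counts = self.suffix.get(word[-k:])
            if counts:
                p_tag_given_suffix = ((counts[tag] + self.alpha)
                                      / (sum(counts.values())
                                         + self.alpha * len(self.tags)))
                p_tag = self.tag_count[tag] / sum(self.tag_count.values())
                return math.log(p_tag_given_suffix / p_tag)
        return 0.0                          # no evidence: same score for every tag

    def tag(self, words):
        """Viterbi decoding; returns one tag per input word."""
        if not words:
            return []
        # Each column maps tag -> (best log score so far, backpointer).
        best = [{t: (self._log_trans("<s>", t) + self._log_emit(t, words[0]), None)
                 for t in self.tags}]
        for word in words[1:]:
            column = {}
            for t in self.tags:
                score, prev = max((best[-1][p][0] + self._log_trans(p, t), p)
                                  for p in self.tags)
                column[t] = (score + self._log_emit(t, word), prev)
            best.append(column)
        last = max(self.tags,
                   key=lambda t: best[-1][t][0] + self._log_trans(t, "</s>"))
        path = [last]
        for column in reversed(best[1:]):
            path.append(column[path[-1]][1])
        return path[::-1]


toy_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("walked", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("jumped", "VERB")],
]
tagger = TinyHMMTagger()
tagger.train(toy_corpus)
print(tagger.tag(["a", "cat", "talked"]))   # "talked" is unseen; suffix "ked" is used
```

Changing the n-gram order or the smoothing would mean replacing _log_trans;
adding word features would mean replacing _log_emit.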

Best wishes,
Yannick Versley

On Mon, Jan 7, 2013 at 9:46 PM, Lushan Han <lushan1 at umbc.edu> wrote:

> I know LingPipe provides an HMM POS tagger which is open source.
>
> Best,
>
> Lushan
>
> On Mon, Jan 7, 2013 at 5:37 AM, Fatemeh Torabi Asr <torabiasr at gmail.com> wrote:
>
>> Dear all,
>>
>> I'm looking for an efficient open source HMM POS tagger to run on
>> something like an artificial language. I would like it to be configurable
>> for different sizes of N-grams, to take a list of possible tags and a
>> dictionary (small tagged corpus), and then to be trainable on a large
>> corpus of un-annotated text.
>> I also wonder if any of the existing *HMM-based* POS taggers consider
>> word features (not only the word content but instead a feature vector of
>> the observable properties of the word in the unlabeled text, e.g., some
>> semantic features attached to the word frame). So, it would be great if a
>> state-of-the-art HMM tagger implementation is already available considering
>> such a representation of the states.
>>
>> Best,
>> Fatemeh
>>
>>
>>
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>>


-- 
Dr. Yannick Versley

Sonderforschungsbereich 833
Universität Tübingen
Nauklerstr. 35
72074 Tübingen

Tel.: +49-7071-29 77155
-------------- next part --------------
A non-text attachment was scrubbed...
Name: model_tools.py
Type: application/octet-stream
Size: 8608 bytes
Desc: not available
URL: <http://listserv.linguistlist.org/pipermail/corpora/attachments/20130107/b9f115b5/attachment-0001.obj>