[Corpora-List] Universal POS Tagset

Daniel Zeman zeman at ufal.mff.cuni.cz
Mon Feb 2 14:12:19 UTC 2009


Hi Adam,

I've been working on similar stuff and presented a poster at last year's 
LREC:
http://ufal.mff.cuni.cz:8080/bib/?section=publication&id=-6437616343801484763&mode=view
(and the framework is here:
https://wiki.ufal.ms.mff.cuni.cz/user:zeman:interset )

However, my universal tagset is a virtual one: it defines the possible 
features and their values rather than a set of tags encoded as strings. 
It is also work in progress, and changes will be needed to achieve 
universality. Anyway, let me know if I can be of any help.
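
To illustrate what a "virtual" tagset of features and values might look 
like, as opposed to tags encoded as strings, here is a minimal sketch. It 
is not the actual Interset code: the feature names, the values, and the 
helpers make_tag and shared_features are all assumptions chosen for 
illustration.

```python
# Hypothetical feature inventory; names and values are illustrative only,
# not the actual Interset definitions.
FEATURES = {
    "pos": {"noun", "verb", "adj"},
    "number": {"sing", "plur"},
    "case": {"nom", "acc"},
}


def make_tag(**values):
    """Build a 'virtual' tag: a feature -> value mapping checked against FEATURES."""
    for feature, value in values.items():
        if feature not in FEATURES:
            raise ValueError(f"unknown feature: {feature}")
        if value not in FEATURES[feature]:
            raise ValueError(f"unknown value for {feature}: {value}")
    return dict(values)


def shared_features(tag_a, tag_b):
    """Return the feature/value pairs on which both tags agree."""
    return {f: v for f, v in tag_a.items() if tag_b.get(f) == v}


# A tagset that marks case and one that does not can still be compared
# on the features they have in common.
tag_with_case = make_tag(pos="noun", number="sing", case="nom")
tag_without_case = make_tag(pos="noun", number="sing")
print(shared_features(tag_with_case, tag_without_case))
```

Because a tag here is just a mapping, a language that lacks a feature 
simply leaves it out, and tags from different tagsets remain comparable 
on whatever features they share.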

Best,
Dan

Adam Teichert wrote:
> Hello all.
>
>
>   I've been looking for a POS tagset that is general enough to
> effectively tag "any" natural language.  (I'm looking at Linguistic
> Typology / Universal Implications so I want to compare POS taggings
> across many [possibly obscure] languages.) Does anyone know of such a
> tagset?
>
>   If anyone is interested in what I've found so far, this paper seems relevant:
>     "Induction of Fine-grained Part-of-speech Taggers via Classifier
> Combination and Crosslingual Projection" (Elliott Franco Drábek,
> David Yarowsky)
>     http://acl.ldc.upenn.edu/W/W05/W05-0807.pdf
>
>   Also, I'm aware of some efforts at Microsoft Research India, to
> perhaps develop a "universal" tagset for Indian Languages:
>     http://research.microsoft.com/en-us/groups/mls/default.aspx
>
>
>   Thanks for any ideas.
>
>   --Adam (R. Teichert)
>
>    MS Student
>    School of Computing
>    University of Utah
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>   

-- 
RNDr. Daniel Zeman, Ph.D.
ÚFAL MFF, Univerzita Karlova, Praha
http://ufal.mff.cuni.cz/~zeman/

