[Corpora-List] Difference in POS tag distribution in different genres
Michal Ptaszynski
michal.ptaszynski at gmail.com
Tue Dec 18 18:11:37 UTC 2012
Hi Karin,
I've recently published a work on developing a Japanese blog corpus. One
part of this work compares POS distributions across corpora of different
sizes (small, medium, large) but similar genre and language, and across
corpora of comparable size but different languages (Japanese, British
English, Italian). *)
Interestingly, the paper was once rejected, with one of the reasons being
that comparison of POS distributions is "meaningless" and a "waste of
time". I'm happy that at least some people think it is not.
Best,
Michal
*)
Michal Ptaszynski, Pawel Dybala, Rafal Rzepka, Kenji Araki and Yoshio
Momouchi, “YACIS: A Five-Billion-Word Corpus of Japanese Blogs Fully
Annotated with Syntactic and Affective Information”, In Proceedings of The
AISB/IACAP World Congress 2012 in Honour of Alan Turing, 2nd Symposium on
Linguistic and Cognitive Approaches To Dialog Agents (LaCATODA 2012), pp.
40-49
-----------------------------
From: Karin Cavallin <karin.cavallin at ling.gu.se>
To: "corpora at uib.no" <corpora at uib.no>
Date: Wed, 12 Dec 2012 10:00:46 +0000
Subject: [Corpora-List] Difference in POS tag distribution in different
genres
Does anyone know of any study of differences in part-of-speech tag
distribution across genres, including an analysis of the reasons for them?
A quick study I made yesterday showed, e.g., that my working hypothesis
that there are more proper nouns in newspaper text than in fiction was
correct, at least on the data I examined.
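A quick check of this kind could look roughly like the sketch below. It
assumes the text is already POS-tagged as (word, tag) pairs. The function
name pos_distribution, the tag labels, and the two tiny samples are all
made up for illustration and do not come from any real corpus or from the
study mentioned above.

```python
from collections import Counter


def pos_distribution(tagged_tokens):
    """Return the relative frequency of each POS tag in (word, tag) pairs."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


# Hypothetical hand-tagged toy samples, for illustration only.
news = [("Obama", "PROPN"), ("visited", "VERB"), ("Berlin", "PROPN"),
        ("on", "ADP"), ("Tuesday", "PROPN")]
fiction = [("She", "PRON"), ("opened", "VERB"), ("the", "DET"),
           ("old", "ADJ"), ("door", "NOUN")]

news_dist = pos_distribution(news)
fiction_dist = pos_distribution(fiction)

# Print the share of each tag side by side for the two genres.
for tag in sorted(set(news_dist) | set(fiction_dist)):
    print(f"{tag:6s} news={news_dist.get(tag, 0.0):.2f} "
          f"fiction={fiction_dist.get(tag, 0.0):.2f}")
```

On real data the samples would be far larger, and the differences would
need a significance test before drawing conclusions about genre.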
Karin Cavallin
PhD Student in Computational Linguistics
University of Gothenburg, Sweden