[Corpora-List] Difference in POS tag distribution in different genres

Michal Ptaszynski michal.ptaszynski at gmail.com
Tue Dec 18 18:11:37 UTC 2012


Hi Karin,

I've recently published a paper on developing a Japanese blog corpus. One  
part of this work compares POS distributions across corpora of different  
sizes (small, medium, large) but similar genre and language, and across  
corpora of comparable size but in different languages (Japanese, British  
English, Italian).*

Interestingly, the paper was once rejected, partly because a reviewer felt  
that comparing POS distributions is "meaningless" and a "waste of time".  
I'm glad that at least some people think otherwise.

Best,

Michal

*)
Michal Ptaszynski, Pawel Dybala, Rafal Rzepka, Kenji Araki and Yoshio  
Momouchi, “YACIS: A Five-Billion-Word Corpus of Japanese Blogs Fully  
Annotated with Syntactic and Affective Information”, In Proceedings of The  
AISB/IACAP World Congress 2012 in Honour of Alan Turing, 2nd Symposium on  
Linguistic and Cognitive Approaches To Dialog Agents (LaCATODA 2012), pp.  
40-49


-----------------------------
From: Karin Cavallin <karin.cavallin at ling.gu.se>
To: "corpora at uib.no" <corpora at uib.no>
Date: Wed, 12 Dec 2012 10:00:46 +0000
Subject: [Corpora-List] Difference in POS tag distribution in different genres

Does anyone know of any study of the differences in part-of-speech tag  
distribution across genres, ideally one that also analyses the reasons for  
them? A quick study I made yesterday confirmed, for example, my working  
hypothesis that newspaper text contains more proper nouns than fiction, at  
least on the data I examined.

Karin Cavallin
PhD Student in Computational Linguistics
University of Gothenburg, Sweden
 


