[Corpora-List] Difference in POS tag distribution in different genres
Adam Kilgarriff
adam at lexmasterclass.com
Mon Dec 17 03:24:08 UTC 2012
Dear Karin
> more proper nouns in newspaper text than in fiction
Certainly true. In general, the more formal/informational a text is, the
more nominal it is, with more nouns, adjectives and determiners; the more
informal/interactional, the more verbs and pronouns. Fiction and newspaper
text are noteworthy for past tenses and 3rd-person pronouns.
Mark Davies and Andrew Hardie have already mentioned Doug Biber's work;
I'll just add what I think of as the key/original reference, his "Variation
across Speech and Writing", CUP 1988.
Sketch Engine supports this kind of research: you can easily find
contrasting POS-tag frequencies between corpora/subcorpora under the
'wordlist' functionality (for any tagged corpus, in any language).
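Outside Sketch Engine, the same kind of contrast can be sketched in a few lines of Python. This is only an illustration, not Sketch Engine's method: it assumes each corpus is already available as a list of (word, tag) pairs, and the function names, the toy data and the per-thousand normalisation are all assumptions made for the example.

```python
from collections import Counter


def pos_distribution(tagged_tokens, per=1000):
    """Relative frequency of each POS tag, normalised per `per` tokens."""
    counts = Counter(tag for _word, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: per * n / total for tag, n in counts.items()}


def contrast_tags(corpus_a, corpus_b, per=1000):
    """Tags ranked by (frequency in A - frequency in B), largest first."""
    dist_a = pos_distribution(corpus_a, per)
    dist_b = pos_distribution(corpus_b, per)
    tags = set(dist_a) | set(dist_b)
    diffs = {t: dist_a.get(t, 0.0) - dist_b.get(t, 0.0) for t in tags}
    return sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)


# Toy, hand-tagged data (hypothetical; tags loosely follow Penn Treebank style).
news = [("Reuters", "NNP"), ("said", "VBD"), ("Sweden", "NNP"), ("grew", "VBD")]
fiction = [("she", "PRP"), ("said", "VBD"), ("he", "PRP"), ("left", "VBD")]

ranking = contrast_tags(news, fiction)
for tag, diff in ranking:
    print(f"{tag}\t{diff:+.1f}")
```

Tags at the top of the ranking are over-represented in the first corpus, tags at the bottom in the second. On real data the corpora differ in size, which is why the counts are normalised before they are compared.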
Another favourite reference of mine: Heylighen and Dewaele
http://pespmc1.vub.ac.be/Papers/Formality.pdf
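That paper condenses the nominal/verbal contrast into a single formality score F, computed from the percentage frequencies of word classes: F = (noun + adjective + preposition + article - pronoun - verb - adverb - interjection + 100) / 2. A minimal sketch of that calculation follows, assuming the formula as stated here. The dictionary keys are illustrative labels, and mapping a real tagset onto these eight categories is left open.

```python
# Word classes that raise the score and those that lower it.
# The category names are labels chosen for this sketch.
FORMAL = ("noun", "adjective", "preposition", "article")
DEICTIC = ("pronoun", "verb", "adverb", "interjection")


def formality_score(counts, total_words):
    """Formality score F from raw category counts and the total word count."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")

    def pct(category):
        # Frequency of a category as a percentage of all words.
        return 100.0 * counts.get(category, 0) / total_words

    formal = sum(pct(c) for c in FORMAL)
    deictic = sum(pct(c) for c in DEICTIC)
    return (formal - deictic + 100) / 2


# Hypothetical counts for a 100-word sample.
sample = {"noun": 30, "adjective": 10, "preposition": 10, "article": 10,
          "pronoun": 10, "verb": 20, "adverb": 5, "interjection": 0}
score = formality_score(sample, 100)
print(score)
```

Under this formula the score ranges from 0 to 100, and a text in which the two groups of word classes balance out scores 50.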
My own recent contribution:
Getting to know your corpus
<http://trac.sketchengine.co.uk/raw-attachment/wiki/AK/Papers/Kilgarriff_TSD2012.pdf?format=raw>
in: *Proc. Text, Speech, Dialogue (TSD 2012)*, Lecture Notes in Computer
Science. Sojka, P., Horak, A., Kopecek, I., Pala, K. (eds). Springer.
Best,
Adam
On 12 December 2012 10:00, Karin Cavallin <karin.cavallin at ling.gu.se> wrote:
> Does anyone know of any study of the difference in part-of-speech tag
> distribution across genres (and an analysis of the reasons)? A quick study
> I made yesterday showed, e.g., that my working hypothesis that there are more
> proper nouns in newspaper text than in fiction was correct, at least on
> the data I examined.
>
> Karin Cavallin
> PhD Student in Computational Linguistics
> University of Gothenburg, Sweden
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
--
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow University of Leeds <http://leeds.ac.uk>
*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
*DANTE: a lexical database for English* <http://www.webdante.com>
========================================