[Corpora-List] Difference in POS tag distribution in different genres

Adam Kilgarriff adam at lexmasterclass.com
Mon Dec 17 03:24:08 UTC 2012


Dear Karin

> more proper nouns in news paper text than in fiction

Certainly true.  In general, the more formal/informational a text is, the
more nominal it is, with more nouns, adjectives and determiners; the more
informal/interactional it is, the more verbs and pronouns it has.  Fiction
and newspaper text are noteworthy for past tenses and 3rd-person pronouns.

Mark Davies and Andrew Hardie have already mentioned Doug Biber's work;
I'll just add what I think of as the key/original reference, his "Variation
across Speech and Writing", CUP 1988.

Sketch Engine has support for all such research: you can easily
find contrasting POS-tag frequencies between corpora/subcorpora under the
'wordlist' functionality (for any tagged corpus, in any language).
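
For anyone wanting to try the same kind of comparison outside Sketch Engine,
here is a minimal sketch in Python. It is only an illustration and not how
Sketch Engine computes its wordlists: the function names, the toy
hand-tagged data and the smoothing constant are all assumptions made for
the example. It turns each corpus into per-thousand POS-tag frequencies and
ranks the tags by a smoothed frequency ratio.

```python
from collections import Counter


def tag_profile(tagged_tokens):
    """Return the frequency per 1,000 tokens of each POS tag.

    `tagged_tokens` is a list of (token, tag) pairs.
    """
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: 1000 * n / total for tag, n in counts.items()}


def contrast(profile_a, profile_b, smoothing=1.0):
    """Rank tags by the smoothed ratio of their frequency in A versus B.

    The smoothing constant (an assumed value) avoids division by zero for
    tags that are missing from one of the corpora.
    """
    tags = set(profile_a) | set(profile_b)
    ratios = {
        tag: (profile_a.get(tag, 0.0) + smoothing)
        / (profile_b.get(tag, 0.0) + smoothing)
        for tag in tags
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Toy hand-tagged samples (hypothetical data, Penn Treebank-style tags).
news = [("Obama", "NNP"), ("visited", "VBD"), ("Berlin", "NNP"), ("yesterday", "NN")]
fiction = [("She", "PRP"), ("walked", "VBD"), ("home", "NN"), ("slowly", "RB")]

news_profile = tag_profile(news)
fiction_profile = tag_profile(fiction)
ranking = contrast(news_profile, fiction_profile)

for tag, ratio in ranking:
    print(f"{tag}\t{ratio:.2f}")
```

Tags at the top of the ranking are over-represented in the first corpus and
tags at the bottom in the second. With real data you would want samples
large enough that the per-thousand figures are stable.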

Another favourite reference of mine: Heylighen and Dewaele
http://pespmc1.vub.ac.be/Papers/Formality.pdf

My own recent contribution:
Getting to know your corpus
<http://trac.sketchengine.co.uk/raw-attachment/wiki/AK/Papers/Kilgarriff_TSD2012.pdf?format=raw>
in: *Proc. Text, Speech, Dialogue (TSD 2012)*, Lecture Notes in Computer
Science. Sojka, P., Horak, A., Kopecek, I., Pala, K. (eds). Springer.

Best,

  Adam


On 12 December 2012 10:00, Karin Cavallin <karin.cavallin at ling.gu.se> wrote:

> Does anyone know of any study of the difference in (and an analysis of the
> reasons) part-of-speech tag distribution in different genres? A quick study
> I made yesterday showed e.g. that my working hypothesis that there are more
> proper nouns in news paper text than in fiction was correct, at least on
> the data I examined.
>
> Karin Cavallin
> PhD Student in Computational Linguistics
> University of Gothenburg, Sweden
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



-- 
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>

*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
*DANTE: a lexical database for English* <http://www.webdante.com>
========================================