[Corpora-List] Frequency of prepositions

Jockers Matthew mjockers at stanford.edu
Wed Dec 1 17:15:06 UTC 2010


Hi Yuri,

In our lab we have been working on a project that classifies 19th-century novels by novelistic genre (e.g. Gothic, Bildungsroman, Industrial, etc.). We will be posting the resulting paper to our site (http://litlab.stanford.edu/) within the next couple of weeks under the title: “Quantitative Formalism: an Experiment in Genre Classification.”

Among other things, we have found, and discuss in the paper, that prepositions in particular are useful for classifying novelistic genres. We suspect that this is because novelistic genres are often defined in relation to "place", and prepositions, especially locative prepositions, are place-oriented. Until the paper is published and posted to our web site, here are a few other sources that you might have a look at:

Grieve, J. (2007). Quantitative Authorship Attribution: An Evaluation of Techniques. Literary and Linguistic Computing: Journal of the Association for Literary and Linguistic Computing 22 (3):251-270.

Garcia, M., and C. Martin. (2007). Function Words in Authorship Attribution Studies. Literary and Linguistic Computing: Journal of the Association for Literary and Linguistic Computing 22 (1):49-66.

Hoover, D. L. (2001). Statistical Stylistics and Authorship Attribution: An Empirical Investigation. Literary and Linguistic Computing: Journal of the Association for Literary and Linguistic Computing 16 (4):421-444.

Yang, Y., and J. Pedersen. (1997). A Comparative Study on Feature Selection in Text Categorization. Proceedings of the 14th International Conference on Machine Learning (ICML ’97), 8–12 July, at Nashville, Tennessee: 412–20.

Zhao, Y., and J. Zobel. (2005). Effective and Scalable Authorship Attribution Using Function Words. In Lecture Notes in Computer Science. Berlin: Springer.
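
For a rough sense of what a preposition-frequency feature looks like in practice, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the method used in the paper or in any of the sources above: the five-word preposition list, the simple regular-expression tokenizer, the function names, and the Manhattan distance are all hypothetical choices made for the example. It also counts word forms only, so it cannot tell a preposition from the same form used as an adverb or particle.

```python
import re
from collections import Counter

# Hypothetical, deliberately short list (taken from the question below);
# a real study would use a fuller inventory of prepositions.
PREPOSITIONS = ("on", "in", "at", "under", "over")


def preposition_profile(text, prepositions=PREPOSITIONS):
    """Return occurrences per 1,000 word tokens for each listed preposition."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {p: 0.0 for p in prepositions}
    counts = Counter(tokens)
    return {p: 1000.0 * counts[p] / total for p in prepositions}


def profile_distance(a, b):
    """Manhattan distance between two profiles built over the same word list."""
    return sum(abs(a[p] - b[p]) for p in a)


# Two toy texts, invented for the example.
text_a = "The cat sat on the mat in the hall at noon."
text_b = "She thought about it for a long time and then she left."

profile_a = preposition_profile(text_a)
profile_b = preposition_profile(text_b)
print(profile_a)
print(profile_b)
print(profile_distance(profile_a, profile_b))
```

Normalizing to a rate per 1,000 tokens keeps texts of different lengths comparable. On real novels one would compute such profiles over much longer samples and pass them to a classifier or a statistical test rather than rely on a single raw distance.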


On Nov 30, 2010, at 1:18 PM, Yuri Tambovtsev wrote:

> Dear colleagues, usually it is possible to find out if two texts are different if certain linguistic units are used in them with different frequencies. Is it possible to differentiate two texts based on the frequency of occurrence of prepositions: on, in, at, under, over, etc.? Have many articles been published on the use of prepositions as features? Looking forward to hearing from you (yutamb at mail.ru). Yours sincerely, Yuri Tambovtsev, Novosibirsk, Russia
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

--
Matthew Jockers
Stanford University
http://www.stanford.edu/~mjockers


