25.3578, Qs: Normalization of Multilexemic Corpus Data

The LINGUIST List linguist at linguistlist.org
Wed Sep 10 21:40:20 UTC 2014


LINGUIST List: Vol-25-3578. Wed Sep 10 2014. ISSN: 1069 - 4875.

Moderators: Damir Cavar, Indiana U <damir at linguistlist.org>
            Malgorzata E. Cavar, Indiana U <gosia at linguistlist.org>

Reviews: reviews at linguistlist.org
Anthony Aristar <aristar at linguistlist.org>
Helen Aristar-Dry <hdry at linguistlist.org>
Sara Couture, Indiana U <sara at linguistlist.org>

Homepage: http://linguistlist.org

Editor for this issue: Anna White <awhite at linguistlist.org>
================================================================  


Date: Wed, 10 Sep 2014 17:40:11
From: Daniela Schroeder [daniela.schroeder at uni-hamburg.de]
Subject: Normalization of multilexemic corpus data

Dear all,

I am currently working on my PhD thesis and have encountered a problem for
which I do not have a solution, so I was wondering whether some of you have
had the same problem in the past and could help me out.

I am conducting a corpus study and have already collected all necessary data.
I was looking for multilexemic constructions, consisting of, for example, a
wh-pronoun followed by a personal pronoun.
I am not interested in collostructional strength or anything like that; I am
simply unsure how to normalize these counts across different corpora.
So far I have normalized per 1 million words, but I feel that this is not
satisfactory, since I am dealing not with single words but with constructions.
Has anyone ever normalized data of this kind?
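
For concreteness, the per-million-word normalization described above can be
sketched as follows. This is only an illustration of the current approach, not
an answer to the question: the corpus names, hit counts, and corpus sizes are
invented, and the function name per_million is hypothetical.

```python
def per_million(raw_count: int, corpus_tokens: int) -> float:
    """Return occurrences per one million word tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return raw_count * 1_000_000 / corpus_tokens


# Hypothetical figures for illustration only: "hits" is the number of
# wh-pronoun + personal-pronoun sequences found, "tokens" the corpus size.
corpora = {
    "corpus_a": {"hits": 150, "tokens": 2_000_000},
    "corpus_b": {"hits": 90, "tokens": 500_000},
}

normalized = {
    name: per_million(c["hits"], c["tokens"]) for name, c in corpora.items()
}

for name, rate in normalized.items():
    print(f"{name}: {rate:.1f} per million words")
```

Note that the denominator here is the number of word tokens; whether that is
the right unit for multi-word constructions is exactly what is being asked.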

I really need those numbers for further statistical analyses and I am at a
loss. I have checked Baayen, Gries and Hilpert, among others, but I did not
find anything there (or did not look closely enough).
So, to cut a long story short: Can anyone tell me how to normalize
multilexemic corpus data?
Thank you very much!

Best,

Daniela Schroeder
 

Linguistic Field(s): Text/Corpus Linguistics





