[Corpora-List] Diachronic frequency change

Angus Grieve-Smith grvsmth at panix.com
Sun May 13 21:47:35 UTC 2012


On 5/11/2012 9:27 AM, Brett Reynolds wrote:
> Since this is a percentage, it shows an increase relative to other words. If you wanted to test for significance, would it make sense to simply use this comparison (string vs. entire corpus), or would it make more sense to compare it to another similar string such as "many of the"? What statistical test would you use? Would it be best to compare the nadir and the peak, or to repeatedly compare consecutive years?
>
> I expect that the answers will be something like "that depends on your purpose." Currently, however, I don't really have a purpose. I'm just poking around, observing, and learning.

     Not quite!  The answer is "You can't test for significance if you 
don't have a representative sample."

     Yes, when a lexical item increases in frequency (per word), it's 
always at the expense of something else.  That means that speakers are 
choosing to use that string for a particular function instead of a 
competing construction.  My guess is that the competitor for "all of 
the" is something more like "all the," but confirming that would take 
some more detailed study.

http://books.google.com/ngrams/graph?content=all+of+the%2C+all+the&year_start=1800&year_end=1950&corpus=0&smoothing=3
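
     One way to look at a string against its competitor, rather than 
against the whole corpus, is to compute its share of the two variants 
combined for each year.  What follows is only a minimal sketch in 
Python: the function name variant_share is hypothetical, and the counts 
are placeholders invented for illustration, not real Ngram data.  It 
describes the numbers you have and does nothing about the sampling 
problem.

```python
from typing import Dict


def variant_share(counts_a: Dict[int, int], counts_b: Dict[int, int]) -> Dict[int, float]:
    """Per-year share of variant A among all tokens of A and B combined."""
    shares = {}
    # Only use years for which both variants have a count.
    for year in sorted(set(counts_a) & set(counts_b)):
        total = counts_a[year] + counts_b[year]
        if total > 0:  # skip years with no tokens of either variant
            shares[year] = counts_a[year] / total
    return shares


# Placeholder counts, invented purely for illustration (NOT real data).
all_of_the = {1800: 10, 1850: 30, 1900: 60}
all_the = {1800: 990, 1850: 970, 1900: 940}

shares = variant_share(all_of_the, all_the)
for year, share in shares.items():
    print(f"{year}: {share:.3f}")
```

     A rising share would suggest that one variant is gaining ground on 
the other for that function, which is a different claim from a rise in 
its frequency per word of running text.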

     There are other frequency effects, described by Joan Bybee and 
others: frequently used strings change their meanings in relatively 
predictable ways, and undergo phonological reduction.  You can't really 
investigate that with Google Ngrams, but it can give you an idea of 
where to start.

http://www.unm.edu/~jbybee/page4.html

-- 
				-Angus B. Grieve-Smith
				Saint John's University
				grvsmth at panix.com

