[Corpora-List] Diachronic frequency change
Angus Grieve-Smith
grvsmth at panix.com
Sun May 13 21:47:35 UTC 2012
On 5/11/2012 9:27 AM, Brett Reynolds wrote:
> Since this is a percentage, it shows an increase relative to other words. If you wanted to test for significance, would it make sense to simply use this comparison (string vs entire corpus) or would it make more sense to compare it to another similar string such as "many of the"? What statistical test would you use? Would it be best to compare the nadir and the peak, or to repeatedly compare consecutive years?
>
> I expect that the answers will be something like "that depends on your purpose." Currently, however, I don't really have a purpose. I'm just poking around, observing, and learning.
Not quite! The answer is "You can't test for significance if you
don't have a representative sample."
Yes, when a lexical item increases in frequency (per word), it's
always at the expense of something else. That means that speakers are
choosing to use that string for a particular function instead of a
competing construction. For "all of the," I'm guessing the competitor
is something like "all the," but confirming that would take more
detailed study.
http://books.google.com/ngrams/graph?content=all+of+the%2C+all+the&year_start=1800&year_end=1950&corpus=0&smoothing=3
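As a rough sketch of what "at the expense of a competing construction" means in numbers, one could track a string's share of the combined occurrences of itself and its presumed competitor, rather than its share of the whole corpus. The function name `variant_share` and the yearly counts below are invented purely for illustration; they are not Google Ngrams data, and treating "all the" as the competitor is only the guess made above.

```python
def variant_share(target_counts, competitor_counts):
    """Return, per year, the target's share of (target + competitor) occurrences."""
    shares = {}
    for year in sorted(target_counts):
        # Only compare years for which both variants have a count.
        if year not in competitor_counts:
            continue
        total = target_counts[year] + competitor_counts[year]
        if total == 0:
            continue  # no occurrences of either variant, so no share to report
        shares[year] = target_counts[year] / total
    return shares


# Hypothetical raw counts, made up for this example (NOT real Ngrams figures).
all_of_the = {1800: 10, 1850: 30, 1900: 60}
all_the = {1800: 990, 1850: 970, 1900: 940}

shares = variant_share(all_of_the, all_the)
for year, share in shares.items():
    print(year, f"{share:.1%}")
```

A rising share here would be consistent with one variant gaining on the other, but it says nothing about significance, since the representativeness problem applies to this comparison too.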
There are other frequency effects, described by Joan Bybee and
others: frequently used strings change their meanings in relatively
predictable ways, and undergo phonological reduction. You can't really
investigate that with Google Ngrams, but the tool can give you an idea
of where to start.
http://www.unm.edu/~jbybee/page4.html
--
-Angus B. Grieve-Smith
Saint John's University
grvsmth at panix.com