[Corpora-List] Questions for Google syntactic N-grams corpus

John F Sowa sowa at bestweb.net
Wed Nov 13 12:33:32 UTC 2013


On 11/13/2013 4:44 AM, Adam Kilgarriff wrote:
> While N-grams is a fascinating resource, it is not full sentences (and
> I'm not sure how much not-text and duplication it includes, this was
> a problem with the first version) so what you can do is constrained...

The N-grams also contain accidental patterns that just happen to have
a high frequency of occurrence on the WWW.

Peter Norvig at Google cited examples of advertising slogans such as
"Life is better with XYZ", where XYZ is a product name.  Under certain
conditions, a phrase that matched the first part of the slogan would
be translated with the product name included, giving XYZ some free
advertising.

John




