[Corpora-List] Moving Lexical Semantics from Alchemy to Science

amsler at cs.utexas.edu amsler at cs.utexas.edu
Fri Jan 21 16:59:14 UTC 2011


The comments re: 'shopping cart' and 'shopping trolley' seem to me to  
reinforce a problem that keeps the field of lexical semantics as  
alchemy rather than as a more scientific pursuit. We just don't have  
enough data about compound nouns to be certain of what they are doing  
in the language overall; to know whether they are manifestations of  
underlying rules or happenstance creations. The OED provides us with  
some historical dates for first occurrences of open compounds and  
large contemporary corpora provide us with statistics on the extant  
forms in use today, but until now we've lacked access to the
statistical (frequency) history of open compounds over time.
Fortunately, the Google Books Ngram data have now filled
that void.

The reason compounds are important is that while we also have access  
to isolated words, those can't easily be automatically disambiguated,  
so knowing their frequencies over time doesn't tell us as much as we  
need to know about what they meant in context. Most (not all) open
compounds are unambiguous (I still get taken in by 'solar system' when
it is used to refer to a bank of solar panels!), so for the most part
we can depend on them to carry a single meaning.

To me, that means the next big advance in lexical semantics could come  
from a large database of statistics by language variant and yearly  
chronology of the frequencies of open compounds. I'd like to be able  
to easily compare the historical frequency record of 'shopping cart'  
and 'shopping trolley' in British and American (and Australian and  
...) English to watch the growth of the terms in frequency  
year-by-year AS WELL AS to be able to easily find a list of all the  
other open compounds formed from 'shopping', 'cart' and 'trolley' over  
the same chronology.
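
As a rough illustration of the two lookups I have in mind, here is a
minimal Python sketch. Everything in it is an assumption made for the
example: the (compound, variant, year) key layout, the variant labels
'en-US' and 'en-GB', the function names, and above all the counts,
which are invented and are not taken from any corpus.

```python
# Hypothetical toy table: (compound, variant, year) -> occurrence count.
# All numbers are made up for illustration; they are not real corpus data.
TOY_COUNTS = {
    ("shopping cart", "en-US", 1960): 12,
    ("shopping cart", "en-US", 1970): 30,
    ("shopping cart", "en-GB", 1970): 2,
    ("shopping trolley", "en-GB", 1960): 5,
    ("shopping trolley", "en-GB", 1970): 14,
    ("tea trolley", "en-GB", 1960): 9,
    ("golf cart", "en-US", 1970): 7,
}


def yearly_series(counts, compound, variant):
    """Return [(year, count), ...] in year order for one compound in one variant."""
    return sorted(
        (year, n)
        for (c, v, year), n in counts.items()
        if c == compound and v == variant
    )


def compounds_with(counts, word):
    """Return, alphabetically, every compound in the table that contains `word`."""
    return sorted({c for (c, _variant, _year) in counts if word in c.split()})


if __name__ == "__main__":
    print(yearly_series(TOY_COUNTS, "shopping trolley", "en-GB"))
    print(compounds_with(TOY_COUNTS, "trolley"))
```

A real database would presumably store relative frequencies (counts
divided by the total tokens for that variant and year) rather than raw
counts, so that years with more printed text do not dominate the
comparison.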

Until such time as we can reliably disambiguate the isolated word  
forms in historical corpora, the open compounds may provide the next
best clue to the discovery of the facts on which a science of lexical  
semantics can be built.

... P.S. Anyone have some other ambiguous open compounds they are  
familiar with, besides 'solar system'?

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


