[Corpora-List] Moving Lexical Semantics from Alchemy to Science
amsler at cs.utexas.edu
Fri Jan 21 16:59:14 UTC 2011
The comments re: 'shopping cart' and 'shopping trolley' seem to me to
reinforce a problem that keeps the field of lexical semantics as
alchemy rather than as a more scientific pursuit. We just don't have
enough data about compound nouns to be certain of what they are doing
in the language overall; to know whether they are manifestations of
underlying rules or happenstance creations. The OED provides us with
some historical dates for first occurrences of open compounds and
large contemporary corpora provide us with statistics on the extant
forms in use today, but until now we've lacked access to the
statistical (frequency) history of open compounds over time.
Fortunately, the Google nGrams data from Google Books has now filled
that void.
The reason compounds are important is that while we also have access
to isolated words, those can't easily be automatically disambiguated,
so knowing their frequencies over time doesn't tell us as much as we
need to know about what they meant in context. Most (not all) open
compounds are unambiguous. I still get taken in by 'solar system' when
it is used to refer to a bank of solar panels, but for the most part we
can depend on open compounds having a single sense.
To me, that means the next big advance in lexical semantics could come
from a large database of statistics by language variant and yearly
chronology of the frequencies of open compounds. I'd like to be able
to easily compare the historical frequency record of 'shopping cart'
and 'shopping trolley' in British and American (and Australian and
...) English to watch the growth of the terms in frequency
year-by-year AS WELL AS to be able to easily find a list of all the
other open compounds formed from 'shopping', 'cart' and 'trolley' over
the same chronology.
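
As a rough illustration of the two queries described above, here is a
minimal Python sketch. It assumes a hypothetical table of rows of the
form (language variant, open compound, year, count); that layout, the
variant labels, the function names, and every count shown are
placeholders invented for the example, not real corpus figures or the
format of any existing dataset.

```python
from collections import defaultdict

# Hypothetical rows: (variant, open compound, year, count).
# All counts are invented placeholders, not real corpus data.
ROWS = [
    ("en-US", "shopping cart", 1990, 40),
    ("en-US", "shopping cart", 2000, 90),
    ("en-GB", "shopping trolley", 1990, 30),
    ("en-GB", "shopping trolley", 2000, 50),
    ("en-GB", "shopping cart", 2000, 10),
    ("en-US", "shopping mall", 2000, 70),
    ("en-GB", "tea trolley", 1990, 5),
]


def yearly_series(rows, compound, variant):
    """Return {year: count} for one compound in one language variant."""
    series = defaultdict(int)
    for var, ngram, year, count in rows:
        if var == variant and ngram == compound:
            series[year] += count
    return dict(sorted(series.items()))


def compounds_with(rows, word):
    """Return {compound: first year seen} for compounds containing `word`."""
    first_seen = {}
    for _var, ngram, year, _count in rows:
        if word in ngram.split():
            first_seen[ngram] = min(year, first_seen.get(ngram, year))
    return dict(sorted(first_seen.items()))


if __name__ == "__main__":
    print(yearly_series(ROWS, "shopping cart", "en-US"))
    print(yearly_series(ROWS, "shopping trolley", "en-GB"))
    print(compounds_with(ROWS, "trolley"))
```

The first function gives the year-by-year record of one compound in one
variant; the second lists every compound built on a given word along
with the earliest year it appears in the table.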
Until such time as we can reliably disambiguate the isolated word
forms in historical corpora, the open compounds may provide the next
best clue to the discovery of the facts on which a science of lexical
semantics can be built.
... P.S. Anyone have some other ambiguous open compounds they are
familiar with, besides 'solar system'?
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora