[Corpora-List] semantics and alchemy
Rich Cooper
rich at englishlogickernel.com
Fri Jan 28 10:11:33 UTC 2011
P.S.,
Hi again Anne,
You wrote:
However, if the quest really is to find primitive semantic traits, shouldn't
there be a multilingual perspective?
Recent discussions on this list convinced me that universal
primitives are too complex and diverse for the tools we have to study in all
languages. My approach is to find an adequate primitive vocabulary that
conveys the meaning within a class defined as extendable and with documented
extension, all in English. There is a better chance of finding an adequate
set of primitives in one precisely used language (i.e. in a recorded patent
document database available to the public).
In a multilingual view compounds are problematic because they vary both
semantically and in their construction type and thus do not constitute a
stable category of comparison ('ice cream' is a case in point).
Agreed, but using the frequent/rare/other classification based on claim
vocabulary histograms within previously annotated classes of a complete
ontology helps focus on any specific class, or any specific patent, to see
the diversity of ways in which the primitives are embedded in meanings
exchanged by a diversity of English speakers. That diversity of source documents, and
the negotiation of claim examiners with claim developers, provides a real
knowledge base of English usage in a fully public database of structured and
unstructured text columns.
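
As a rough illustration of the frequent/rare/other split, here is a minimal Python sketch. It is not my actual tooling: the function name classify_vocabulary, the two threshold parameters, and the four toy claims are all assumptions made up for this example. It counts, for each word, the share of claims in one class that contain it, then labels the word by that share.

```python
from collections import Counter


def classify_vocabulary(claims, frequent_min=0.5, rare_max=0.1):
    """Label each word by the share of claims in the class that contain it.

    The thresholds are illustrative assumptions, not values from the thread.
    """
    doc_freq = Counter()
    for claim in claims:
        # Count each word once per claim, so the histogram measures
        # how widely a word is spread across the claims of the class.
        doc_freq.update(set(claim.lower().split()))

    total = len(claims)
    labels = {}
    for word, count in doc_freq.items():
        share = count / total
        if share >= frequent_min:
            labels[word] = "frequent"
        elif share <= rare_max:
            labels[word] = "rare"
        else:
            labels[word] = "other"
    return labels


# Toy claims standing in for the claims of one annotated class (invented data).
claims = [
    "a frozen dessert comprising cream and sugar",
    "a frozen dessert comprising fruit and sugar",
    "a container for a frozen dessert",
    "a method of cooling a container",
]
labels = classify_vocabulary(claims, frequent_min=0.75, rare_max=0.25)
for word in ("frozen", "sugar", "cream"):
    print(word, labels[word])
```

Where the cut-offs should sit is a judgment call for each class; the sketch only shows the mechanics of building the histogram and reading the three bands off it.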
In my view, trying to find primitives by studying compounds is risky,
because compounds seem to depend strongly on a language's morphology which
should be considered arbitrary.
I disagree on the arbitrariness. Consider a collection of corpora from
which the set of adequate primitives is to be extracted to form the ontology.
Every corpus is in one language, in this example, but diversity of
expression among phrase authors is more than adequate to provide good
evidence of how that one language is used for a variety of purposes.
Isn't it safer to study the collocational environment of lexical units as a
whole instead of restricting efforts to noun compounds? One could, for
example, construct a similar matrix for verbs that take 'ice cream' as
object or adjectives that combine with the word. But I am not sure whether
this is still the initial idea?
Anne-Kathrin Schumann, PhD student, University of Leipzig/University of Vienna
When are units enough to form a "whole"? I don't see a clear demarcation
between too much and not enough, other than every person and every language
being represented in the sample database. That is impractically large for a
primitive discovery project in my experience, though your application area
may differ substantially from mine.
-Rich
Sincerely,
Rich Cooper
EnglishLogicKernel.com
Rich AT EnglishLogicKernel DOT com
9 4 9 \ 5 2 5 - 5 7 1 2
-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Anne-Kathrin Schumann
Sent: Wednesday, January 26, 2011 4:11 AM
To: corpora at uib.no
Subject: [Corpora-List] semantics and alchemy
Hi,

I am really interested in this experiment proposal by Mr Amsler:

"If we take the open compounds from a machine-readable dictionary and split
out of them two lists of first words and second words, and then create a
matrix with the first words as the x-axis and the second words as the y-axis
and the individual cells as a 1 or a 0 dependent upon whether that compound
exists in the dictionary/language or not... What would a factor analysis of
that very sparse binary matrix reveal? Could it indicate the existence of
primitive properties shared by groups of words? (Say scalar traits for
temperature words such as 'hot', 'cool', 'cold')."

However, if the quest really is to find primitive semantic traits, shouldn't
there be a multilingual perspective? In a multilingual view compounds are
problematic because they vary both semantically and in their construction
type and thus do not constitute a stable category of comparison ('ice cream'
is a case in point). In my view, trying to find primitives by studying
compounds is risky, because compounds seem to depend strongly on a language's
morphology which should be considered arbitrary.

Isn't it safer to study the collocational environment of lexical units as a
whole instead of restricting efforts to noun compounds? One could, for
example, construct a similar matrix for verbs that take 'ice cream' as object
or adjectives that combine with the word. But I am not sure whether this is
still the initial idea?

Anne-Kathrin Schumann
PhD student
University of Leipzig/University of Vienna
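
The binary matrix in the quoted proposal can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the compound list is an invented toy sample rather than dictionary data, the names compound_matrix and jaccard are hypothetical, and the sketch stops short of factor analysis. It builds the 0/1 matrix (first words as rows here, second words as columns) and compares the row profiles of first words with Jaccard similarity, a much simpler stand-in that shows which first words combine with the same second words.

```python
def compound_matrix(compounds):
    """Build a 0/1 matrix from two-word open compounds.

    Rows are first words, columns are second words; a cell is 1 when the
    compound is attested in the input list. Assumes exactly two words each.
    """
    attested = {tuple(c.split()) for c in compounds}
    firsts = sorted({first for first, _ in attested})
    seconds = sorted({second for _, second in attested})
    matrix = [
        [1 if (first, second) in attested else 0 for second in seconds]
        for first in firsts
    ]
    return firsts, seconds, matrix


def jaccard(row_a, row_b):
    """Share of second words that two first words have in common."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return both / either if either else 0.0


# Invented toy sample; a real run would read compounds from a dictionary.
compounds = [
    "ice cream", "ice water", "hot water", "cold water",
    "hot plate", "cold cream", "hot dog",
]
firsts, seconds, matrix = compound_matrix(compounds)
rows = dict(zip(firsts, matrix))

print(seconds)
for word in firsts:
    print(word, rows[word])
print("cold vs ice:", jaccard(rows["cold"], rows["ice"]))
print("cold vs hot:", jaccard(rows["cold"], rows["hot"]))
```

The same function would serve for the verb-object or adjective-noun matrices suggested above if the input pairs came from collocation data instead of compounds. Row similarity on a matrix this sparse is easily dominated by a few very productive second words, so any grouping it suggests would need checking against more data.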
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora