[Corpora-List] semantics and alchemy

Wladimir Sidorenko wlsidorenko at gmail.com
Fri Jan 28 09:50:34 UTC 2011


As far as I can see, such a matrix could provide two kinds of information:

- information about the productivity of certain words, either as cores
or as attributes of complex nouns
- information about how certain words combine with each other
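
To make these two kinds of information concrete, here is a minimal
sketch in Python. The compound list is a hypothetical toy example, not
taken from any real dictionary, and the names `as_attribute`, `as_core`
and `attested` are made up for illustration. The sparse binary matrix
is stored as the set of cells that would hold a 1.

```python
from collections import Counter

# Hypothetical toy list of two-word open compounds (illustration only).
compounds = [
    ("ice", "cream"),
    ("ice", "age"),
    ("stone", "wall"),
    ("stone", "age"),
    ("brick", "wall"),
]

# Sparse binary matrix: keep only the cells that would hold a 1.
cells = set(compounds)

# Productivity: number of distinct partners a word has in each position.
as_attribute = Counter(first for first, _ in cells)
as_core = Counter(second for _, second in cells)


def attested(first, second):
    """Return the value of one matrix cell: 1 if the compound is listed, else 0."""
    return int((first, second) in cells)


print("stone as attribute:", as_attribute["stone"])
print("age as core:", as_core["age"])
print("ice wall attested:", attested("ice", "wall"))
```

The counters correspond to the row and column sums of the matrix
(productivity), while `attested` looks up a single cell (combinatorics).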

Apart from semantics, these data would be very useful for syntactic
parsing and the construction of dependency trees, because in DG-based
machine translation systems, for example, it is often difficult to
determine which part of a complex compound an attribute relates to,
e.g.:
in `machine translation systems' the word `machine' relates to
`translation', and the word `translation' relates to `systems', but in
`metal oxide semiconductor field effect transistor' the word
`semiconductor' would rather relate to `transistor' than to `field'.
Or consider German `Panzerabwehrlenkflugkörpersystem' (anti-tank
guided missile system): here it is quite tricky to find out what
relates to what if we don't know the exact meaning of the words
(is it Abwehrsystem, Abwehrlenkflugkörper, Abwehrlenkflug or
Abwehrlenkung?).
Once such a matrix has been collected, it would potentially be
possible:
1. to find a formula which could help analyze dependencies in
complex nouns more precisely
2. to experiment with grouping words which are likely to occur with
each other. I mean, it can be assumed that nouns denoting chemical
entities are more likely to be used along with nouns denoting
drinks or food than with nouns which denote cultural events or
the like. These groups could in turn be taken into account in the
formula (1).
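
As a rough illustration of what a formula of kind (1) could look like
in its simplest form, here is a sketch in Python. It is only an
assumption-laden toy heuristic, not a tested method: the set `ATTESTED`
is hypothetical data, the function `attach` is made up for
illustration, the last word is assumed to be the core of the whole
compound, and every other word is attached to the nearest word on its
right with which it forms an attested two-word compound.

```python
# Hypothetical toy set of attested two-word compounds (cells holding a 1).
ATTESTED = {
    ("machine", "translation"),
    ("translation", "systems"),
    ("metal", "oxide"),
    ("oxide", "semiconductor"),
    ("semiconductor", "transistor"),
    ("field", "effect"),
    ("effect", "transistor"),
}


def attach(words):
    """Return (attribute, core) pairs for every non-final word.

    Assumption: the last word is the core of the whole compound and each
    other word relates to some word on its right.
    """
    links = []
    for i, word in enumerate(words[:-1]):
        candidates = words[i + 1:]
        # Prefer the nearest attested partner; otherwise fall back to
        # the immediately following word.
        core = next((c for c in candidates if (word, c) in ATTESTED),
                    candidates[0])
        links.append((word, core))
    return links


mosfet = "metal oxide semiconductor field effect transistor".split()
for attribute, core in attach(mosfet):
    print(attribute, "->", core)
```

With this toy data `semiconductor' is attached to `transistor' and not
to `field'. The sketch uses only binary cells and does not prevent
crossing links; counts or the word groups from (2) could replace the
simple membership test.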

I would also like to know whether the author of the experiment will
deal with stone walls (compounds) consisting of more than two elements,
and how he/she will handle them.

Best Regards,
Vladimir


> Hi,
>
> I am really interested in this experiment proposal by Mr Amsler:
>
> "If we take the open compounds from a machine-readable dictionary
> and split out of them two lists of first words and second words, and
> then create a matrix with the first words as the x-axis and the
> second words as the y-axis and the individual cells as a 1 or a 0
> dependent upon whether that compound exists in the
> dictionary/language or not... What would a factor analysis of that
> very sparse binary matrix reveal? Could it indicate the existence of
> primitive properties shared by groups of words? (Say scalar traits
> for temperature words such as 'hot', 'cool', 'cold')."
>
> However, if the quest really is to find primitive semantic traits,
> shouldn't there be a multilingual perspective? In a multilingual view
> compounds are problematic because they vary both semantically and in
> their construction type and thus do not constitute a stable category
> of comparison ('ice cream' is a case in point). In my view, trying
> to find primitives by studying compounds is risky, because compounds
> seem to depend strongly on a language's morphology which should be
> considered arbitrary. Isn't it safer to study the collocational
> environment of lexical units as a whole instead of restricting
> efforts to noun compounds? One could, for example, construct a
> similar matrix for verbs that take 'ice cream' as object or
> adjectives that combine with the word. But I am not sure whether
> this is still the initial idea …?
>
> Anne-Kathrin Schumann
> PhD student, University of Leipzig/University of Vienna

> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora



-- 
With kind regards
Wladimir Sidorenko
mailto:wlsidorenko at gmail.com


