[Corpora-List] semantics and alchemy
Wladimir Sidorenko
wlsidorenko at gmail.com
Fri Jan 28 09:50:34 UTC 2011
As far as I can see, such a matrix could provide two kinds of information:
- information about the productivity of certain words, either as cores
(heads) or as attributes (modifiers) of complex nouns
- information about how certain words combine with each other
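To make these two kinds of information concrete, here is a minimal
sketch in Python. The compound list is a toy example invented for
illustration, not real dictionary data, and the names `as_attribute`,
`as_core` and `attested` are hypothetical. Only the cells that would
hold a 1 are stored, since the matrix is very sparse.

```python
from collections import Counter

# Toy list of two-word open compounds (first word, second word).
# Invented for illustration; a real run would read them from a dictionary.
compounds = [
    ("ice", "cream"),
    ("ice", "age"),
    ("ice", "cube"),
    ("stone", "age"),
    ("stone", "wall"),
    ("sour", "cream"),
]

# Sparse binary matrix: keep only the cells whose value would be 1.
cells = set(compounds)

# Productivity: number of distinct partners a word has in each position.
as_attribute = Counter(first for first, _ in cells)  # sums along the first-word axis
as_core = Counter(second for _, second in cells)     # sums along the second-word axis


def attested(first, second):
    """Return 1 if the compound is in the list, else 0 (one matrix cell)."""
    return int((first, second) in cells)
```

The two counters correspond to the productivity information, and
`attested` to the information about which words combine with which.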
Apart from semantics, these data would also be very useful for
syntactic parsing and the construction of dependency trees. In
DG-based machine translation systems, for example, it is often
difficult to determine which part of a complex compound an attribute
relates to, e.g.:
in `machine translation systems', the word `machine' relates to
`translation', and the word `translation' relates to `systems', but in
`metal oxide semiconductor field effect transistor',
`semiconductor' relates to `transistor' rather than to `field'
Or consider German `Panzerabwehrlenkflugkörpersystem' (anti-tank
guided missile system): here it is quite tricky to find out what
relates to what if we don't know the exact meaning of the words
(is it Abwehrsystem, or Abwehrlenkflugkörper, or Abwehrlenkflug, or
Abwehrlenkung?).
Once such a matrix has been collected, it would potentially be
possible:
1. to derive a formula that could help analyze dependencies in
complex nouns more precisely
2. to experiment with grouping words that are likely to occur with
each other. I mean, it is reasonable to assume that nouns denoting
chemical entities are more likely to be used along with nouns
denoting drinks or food than with nouns denoting cultural events or
whatever. These groups could in turn be taken into account in the
formula (1).
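As a rough illustration of what the formula in (1) might look like in
its simplest form, the sketch below decides which word the first
element of a three-word compound attaches to by comparing pair counts.
This is only one possible heuristic, not a proposal for the actual
formula. It assumes the matrix cells hold frequencies rather than 0/1,
the counts are invented, and the names `pair_count` and `attach_first`
are hypothetical.

```python
# Hypothetical pair counts, as they might be read off a compound matrix
# that stores frequencies; the numbers are invented for illustration.
pair_count = {
    ("machine", "translation"): 50,
    ("machine", "systems"): 2,
    ("translation", "systems"): 20,
}


def attach_first(w1, w2, w3, counts):
    """Decide whether w1 modifies w2 or w3 in the compound `w1 w2 w3'.

    Compares how often w1 is attested as an attribute of each candidate.
    Ties (including two unseen pairs) go to the adjacent word w2.
    """
    near = counts.get((w1, w2), 0)
    far = counts.get((w1, w3), 0)
    return w2 if near >= far else w3
```

With these toy counts, attach_first("machine", "translation",
"systems", pair_count) returns "translation". Longer compounds and
the word groups from (2) would need something more elaborate.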
I would also like to know whether the author of the experiment will
deal with stone walls (compounds) consisting of more than two
elements, and how he/she will handle them.
Best Regards,
Vladimir
> Hi, I am really interested in this experiment proposal by Mr
> Amsler: "If we take the open compounds from a machine-readable
> dictionary and split out of them two lists of first words and second
> words, and then create a matrix with the first words as the x-axis
> and the second words as the y-axis and the individual cells as a 1
> or a 0 dependent upon whether that compound exists in the
> dictionary/language or not... What would a factor analysis of that
> very sparse binary matrix reveal? Could it indicate the existence of
> primitive properties shared by groups of words? (Say scalar traits
> for temperature words such as 'hot', 'cool', 'cold')." However, if
> the quest really is to find primitive semantic traits, shouldn't
> there be a multilingual perspective? In a multilingual view
> compounds are problematic because they vary both semantically and in
> their construction type and thus do not constitute a stable category
> of comparison ('ice cream' is a case in point). In my view, trying
> to find primitives by studying compounds is risky, because compounds
> seem to depend strongly on a language's morphology which should be
> considered arbitrary. Isn't it safer to study the collocational
> environment of lexical units as a whole instead of restricting
> efforts to noun compounds? One could, for example, construct a
> similar matrix for verbs that take 'ice cream' as object or
> adjectives that combine with the word. But I am not sure whether
> this is still the initial idea …?
>
> Anne-Kathrin Schumann
> PhD student, University of Leipzig/University of Vienna
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
--
Kind regards
Wladimir Sidorenko
mailto:wlsidorenko at gmail.com