[Corpora-List] corpus transformations info - SUMMARY
PbIKOB_B.B.
rykov at narod.ru
Tue May 20 06:58:57 UTC 2003
Philosophers teach us that there are three sources of knowledge (K): R (reality or nature), M (human mind) and S (sign structures, here texts).
So a corpus of texts (CT) is one of the sources of K.
In semiotic terms, we want to extract and encode semantics from a CT covering a certain knowledge domain (KD). IMO we meet at least three problems here:
1. It is hard to do this using sign transformations alone, without outside human or vocabulary/thesaurus support.
2. It is not a single-stage process. Each stage has its own specifications.
3. The resulting K strongly depends on the pragmatic goals of the user.
I think we should take these points into account when discussing the problem of formalising CT into K.
Here are three papers discussing this problem:
=================
My colleague Scott Cederberg and I have worked pretty extensively on this
problem over the last couple of years. The following 3 papers give a good
overview:
Learning taxonomic information directly from corpora:
http://infomap.stanford.edu/papers/hyponymy.pdf
Building lexical classes from "seed examples":
http://infomap.stanford.edu/papers/lexical-graphs.ps
Enriching an existing taxonomy / lexicon with new terms:
http://infomap.stanford.edu/papers/enrich-taxonomies.pdf
These methods build on earlier work, particularly by Marti Hearst, Hinrich
Schutze, Ellen Riloff and Eugene Charniak.
Best wishes,
Dominic
===================================
> I would be grateful for any source of info (link, paper, etc.) on transforming a corpus that covers a knowledge domain or any specified subject into a knowledge structure such as a thesaurus, ontology, RDF file, etc.
>
>
>--
>
> P bI K O B B.B. MOCKBA
>
>Vladimir Rykov, PhD in Computational Linguistics,
> MOSCOW
>http://rykov.narod.ru/
>Engl. http://www.blkbox.com/~gigawatt/rykov.html
>Tel +7-903-749-19-99
>