[Corpora-List] corpus transformations info - SUMMARY

PbIKOB_B.B. rykov at narod.ru
Tue May 20 06:58:57 UTC 2003


Philosophers teach us that there are three sources of knowledge (K): R (reality or nature), M (the human mind) and S (sign structures, here texts).

So a corpus of texts (CT) is one of the sources of K.

In semiotic terms, we want to extract and encode the semantics of a CT covering a certain knowledge domain (KD). IMO we meet at least three problems here:

1.	It is hard to do this using sign transformations alone, without external support from a human or a vocabulary/thesaurus.
2.	It is not a single-stage process. Each stage has its own specifications.
3.	The resulting K strongly depends on the pragmatic goals of the user.

I think we should take these points into account when discussing the problem of formalising the CT-to-K transformation.

Here are three papers discussing this problem:

=================

My colleague Scott Cederberg and I have worked pretty extensively on this
problem over the last couple of years. The following 3 papers give a good
overview:

Learning taxonomic information directly from corpora:
http://infomap.stanford.edu/papers/hyponymy.pdf

Building lexical classes from "seed examples":
http://infomap.stanford.edu/papers/lexical-graphs.ps

Enriching an existing taxonomy / lexicon with new terms:
http://infomap.stanford.edu/papers/enrich-taxonomies.pdf

These methods build on earlier work, particularly by Marti Hearst, Hinrich
Schütze, Ellen Riloff and Eugene Charniak.

Best wishes,
Dominic

===================================


>   I would be grateful for any source of info (link, paper, etc.) on transforming a corpus that covers a knowledge domain or any specified subject into a knowledge structure such as a thesaurus, ontology, RDF file, etc.
>
>
>--
>
>    P bI K O B  B.B.   MOCKBA
>
>Vladimir Rykov, PhD in Computational Linguistics,
> MOSCOW
>http://rykov.narod.ru/
>Engl. http://www.blkbox.com/~gigawatt/rykov.html
>Tel +7-903-749-19-99
>





