[Corpora-List] corpus transformations info - SUMMARY
    PbIKOB_B.B. 
    rykov at narod.ru
       
    Tue May 20 06:58:57 UTC 2003
    
    
  
Philosophers teach us that there are three sources of knowledge (K): R (reality or nature), M (the human mind) and S (sign structures, here texts).
So a corpus of texts (CT) is one of the sources of K.
In semiotic terms, we want to extract and encode semantics from a CT covering a certain knowledge domain (KD). IMO we meet at least three problems here:
1. It is hard to do this using sign transformations alone, without external human or vocabulary/thesaurus support.
2. It is not a single-stage process. Each stage has its own specifications.
3. The resulting K depends strongly on the pragmatic goals of the user.
I think we should take these into account when discussing the problem of formalising CT into K.
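To make problem 1 concrete, here is a minimal sketch of a purely sign-based extraction step. It assumes a single hand-written "X such as Y" lexical pattern of the kind associated with Marti Hearst's work (mentioned below); the function name extract_hyponyms, the regular expression and the sample sentence are illustrative assumptions, not anything taken from the papers listed in this summary.

```python
import re
from collections import defaultdict

# Assumed pattern: "<hypernym> such as <item>, <item> (and|or) <item>".
# Only single-word terms are handled; this is a deliberate simplification.
PATTERN = re.compile(
    r"(\w+) such as "
    r"(\w+(?:, (?!and\b|or\b)\w+)*(?:,? (?:and|or) \w+)?)"
)

def extract_hyponyms(text):
    """Return {hypernym: set of hyponyms} found by the 'such as' pattern."""
    taxonomy = defaultdict(set)
    for match in PATTERN.finditer(text.lower()):
        hypernym, items = match.group(1), match.group(2)
        # Split the enumeration on commas and on the final "and"/"or".
        for item in re.split(r",? (?:and|or) |, ", items):
            taxonomy[hypernym].add(item)
    return dict(taxonomy)

if __name__ == "__main__":
    corpus = "Structures such as thesauri, ontologies and taxonomies encode knowledge."
    for hypernym, hyponyms in extract_hyponyms(corpus).items():
        print(hypernym, "->", sorted(hyponyms))
```

The sketch only sees surface strings: it cannot handle multi-word terms, and it has no way of telling a useful hypernym from an accidental one. That is where human or thesaurus support (problem 1) and later processing stages (problem 2) come in.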
Here are three papers discussing this problem:
=================
My colleague Scott Cederberg and I have worked pretty extensively on this
problem over the last couple of years. The following 3 papers give a good
overview:
Learning taxonomic information directly from corpora:
http://infomap.stanford.edu/papers/hyponymy.pdf
Building lexical classes from "seed examples":
http://infomap.stanford.edu/papers/lexical-graphs.ps
Enriching an existing taxonomy / lexicon with new terms:
http://infomap.stanford.edu/papers/enrich-taxonomies.pdf
These methods build on earlier work, particularly by Marti Hearst, Hinrich
Schutze, Ellen Riloff and Eugene Charniak.
Best wishes,
Dominic
===================================
>   I would be grateful for any source of info (link, paper, etc.) concerning the transformation of a corpus covering a knowledge domain or any specified subject into a knowledge structure such as a thesaurus, ontology, RDF file, etc.
>
>
>--
>
>    P bI K O B  B.B.   MOCKBA
>
>Vladimir Rykov, PhD in Computational Linguistics,
> MOSCOW
>http://rykov.narod.ru/
>Engl. http://www.blkbox.com/~gigawatt/rykov.html
>Tel +7-903-749-19-99
>