[Corpora-List] corpus ------>>>>> thesaurus

Paul Buitelaar paulb at dfki.de
Tue Nov 16 11:54:00 UTC 2004


Dear Vladimir and all,

The acquisition of a domain thesaurus from a domain-specific corpus (or,
more generally, a text collection) is closely related to current work
on ontology learning/extraction from text.

An ontology (as used currently in the Semantic Web context and based on
previous incarnations in the context of expert systems and similar)
represents 'a set of concepts and relations between these concepts that
are relevant to a particular domain of discourse'. Similarly, a
thesaurus for a particular domain represents a set of terms and a
selected set of relations between these terms (e.g. 'broader term',
'narrower term') -- but notice the difference in 'term' vs. 'concept'.
There is currently much discussion on the status of thesauri in the
Semantic Web context; see, for example, developments on SKOS ('an RDF
vocabulary for describing thesauri, glossaries, taxonomies,
terminologies'): http://www.w3.org/2004/02/skos/
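
To make the thesaurus side of this concrete, here is a minimal sketch in
Python of a set of terms linked by 'broader term' / 'narrower term'
relations. The class name, method names and the financial example terms
are my own illustrative assumptions; they are not taken from SKOS or from
any existing thesaurus. Note that the structure stores term strings only,
whereas an ontology would attach such relations to concepts.

```python
from collections import defaultdict


class Thesaurus:
    """Hypothetical structure: terms linked by broader/narrower relations."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its broader terms
        self.narrower = defaultdict(set)  # term -> its narrower terms

    def add_broader(self, term, broader_term):
        # Record both directions so the two relations stay inverse to each other.
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def ancestors(self, term):
        # Collect all transitively broader terms with an iterative walk.
        seen = set()
        stack = [term]
        while stack:
            for b in self.broader.get(stack.pop(), ()):
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen


# Illustrative entries only (made up for this example).
t = Thesaurus()
t.add_broader("corporate bond", "bond")
t.add_broader("bond", "security")
t.add_broader("share", "security")
```

Calling t.ancestors("corporate bond") walks up the 'broader term' links,
and t.narrower["security"] lists the terms directly below "security".
Acquiring such relations from a corpus is of course the hard part and is
not attempted here.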

As mentioned, there is currently much work related to your question in
the context of ontology learning/extraction from text. For an overview
of some recent papers, see the ECAI 2004 workshop on
"Ontology Learning and Population" at:

http://olp.dfki.de/ecai04/cfp.htm   -- all papers and most presentations
can be downloaded

The workshop description also has some further links to previous,
related workshops.

Hope this helps,


    Paul Buitelaar
    DFKI - Language Technology &
    Competence Center Semantic Web
    Saarbruecken, Germany

    http://www.dfki.de/~paulb/


>    I would be very grateful to anyone for any info concerning
> compiling thesaurus from corpus (esp. from corpus of specific domain
> documents).
>
>    As example - thesaurus of financial terms compiled from financial
> documents corpus.
>
>      Best wishes to all our corpus society !
>
> --
>   Regards Vladimir Rykov
>
> PhD in Computational Linguistics
> Personal web-site: rykov.narod.ru
> mailto: rykov2000 at mail.ru
> Si etiam omnes - ego non
> English version:   www.blkbox.com/~gigawatt/rykov.html


