[Corpora-List] Thesaurus recommendation.

Trevor Jenkins trevor.jenkins at suneidesis.com
Mon Apr 18 08:54:59 UTC 2011


On Mon, 18 Apr 2011, Hong-woo Chun <hongwoo.chun at gmail.com> wrote:

> I'm searching for thesauri.
> I would like to extract IS-A/PART-OF relations from texts using BT-NT
> pairs in a thesaurus. It should not depend upon any domain. Domain-free!
> Currently, UMLS and Compendex have been manually categorized based on the
> corresponding relations.
> I'm trying to find thesauri, but most of them are w.r.t. the Biological or
> Biomedical domains.
> Are there good thesauri w.r.t. other Scientific domains?

There are a couple of thesauri compiled by the British Museum. Their
Object Names and Materials thesauri are online at Collections Trust.
However, they are difficult to locate; CT's intra-site search feature is
borked. Check out http://www.collectionstrust.org.uk/bmobj/Objintro.html
and http://www.collectionstrust.org.uk/bmmat/matintro.html respectively.

The format of these micro-sites is a little bizarre. I once had to convert
the content from the peculiar HTML they use into a format suitable for
inclusion in a text retrieval system as a thesaurus. However, some
Perl/Python/Ruby coding should extract the terms for you. (Can't give you
the code I wrote as it was written for my employer in their time for their
client.)
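
As a rough illustration of the sort of scripting meant here (not the code
I wrote back then), a minimal Python sketch follows. It assumes, purely
for the sake of example, that related terms appear as the text of <a>
links; the SAMPLE markup, the class name LinkTextCollector and the
function extract_terms are all made up, and the real pages will almost
certainly need different rules.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect the visible text of every <a> element (hypothetical rule)."""

    def __init__(self):
        super().__init__()
        self.terms = []
        self._buf = None  # None while we are outside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buf is not None:
            # Collapse line breaks and runs of spaces inside the link text.
            text = " ".join("".join(self._buf).split())
            if text:
                self.terms.append(text)
            self._buf = None


def extract_terms(html):
    """Return the link texts found in an HTML string, in document order."""
    parser = LinkTextCollector()
    parser.feed(html)
    parser.close()
    return parser.terms


# Invented sample markup; it is NOT copied from the Collections Trust pages.
SAMPLE = """
<p><b>AMPHORA</b><br>
BT <a href="#vessel">storage
   vessel</a><br>
NT <a href="#neck">neck amphora</a></p>
"""

print(extract_terms(SAMPLE))
```

Only the standard library is used, so there is nothing to install; once
you have looked at the actual page source, the tag test in
handle_starttag is the part to adapt.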

> Please recommend good thesauri.

Because the Collections Trust web site search feature is broken, you might
wish to do a site-specific search in Google:

"site:www.collectionstrust.org.uk thesaurus"

which could give you upwards of 1,000 further links and thesauri.

There is a Social History and Industrial Classification (SHIC) thesaurus
that was developed in the 1980s by curators from several other major UK
museums. I've only ever seen this in a printed edition, never online. There
was some talk of an update, SHIC-2, but the project may not have been
started.

There is also MeSH (Medical Subject Headings) from NIH. Again, I had to
process this back in the days when it was provided on mag tape in
variable/fixed-length blocks in US/UK MARC format. I believe that it is
now available in XML format. Check out http://www.ncbi.nlm.nih.gov/mesh
for further details.
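
If the XML edition is what you end up with, BT-NT pairs might be derived
from the tree numbers, where a descriptor's broader term is the one whose
tree number is the same with the last dotted segment removed. The Python
sketch below assumes the descriptor file uses DescriptorRecord,
DescriptorName/String and TreeNumber elements; check that against the
current DTD before relying on it. The sample records and the X000n
identifiers are invented for illustration, and bt_nt_pairs is a
hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Invented miniature sample in the assumed layout, not real MeSH data.
SAMPLE = """<DescriptorRecordSet>
  <DescriptorRecord>
    <DescriptorUI>X0001</DescriptorUI>
    <DescriptorName><String>Body Regions</String></DescriptorName>
    <TreeNumberList><TreeNumber>A01</TreeNumber></TreeNumberList>
  </DescriptorRecord>
  <DescriptorRecord>
    <DescriptorUI>X0002</DescriptorUI>
    <DescriptorName><String>Head</String></DescriptorName>
    <TreeNumberList><TreeNumber>A01.456</TreeNumber></TreeNumberList>
  </DescriptorRecord>
  <DescriptorRecord>
    <DescriptorUI>X0003</DescriptorUI>
    <DescriptorName><String>Face</String></DescriptorName>
    <TreeNumberList><TreeNumber>A01.456.505</TreeNumber></TreeNumberList>
  </DescriptorRecord>
</DescriptorRecordSet>"""


def bt_nt_pairs(xml_text):
    """Return sorted (broader term, narrower term) name pairs."""
    root = ET.fromstring(xml_text)

    # Map every tree number to the name of the descriptor that carries it.
    name_by_tree = {}
    for record in root.iter("DescriptorRecord"):
        name = record.findtext("DescriptorName/String")
        for node in record.iter("TreeNumber"):
            name_by_tree[node.text.strip()] = name

    # The parent tree number is the child's with its last segment dropped.
    pairs = set()
    for tree, name in name_by_tree.items():
        if "." in tree:
            parent = tree.rsplit(".", 1)[0]
            if parent in name_by_tree:
                pairs.add((name_by_tree[parent], name))
    return sorted(pairs)


for broader, narrower in bt_nt_pairs(SAMPLE):
    print(f"{broader} > {narrower}")
```

A descriptor can carry several tree numbers, so the same term may turn up
under more than one broader term; the set takes care of exact duplicates.
For the full file you would probably want ET.iterparse rather than
loading everything at once.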

You might also wish to consult, if you have not already done so, ANSI/NISO
Z39.19 Guidelines for the Construction, Format, and Management of
Monolingual Controlled Vocabularies. There used to be free-to-download
copies of this available at the NISO web site but it appears now to be a
"for purchase" item. This standard used to be identical to ISO 2788:1986
and to the equivalent texts of various other national standards-making
bodies. However, Z39.19 looks to have been updated in 2005 so the texts
may have diverged. ISO also has a multilingual standard, ISO 5964:1985.

Regards, Trevor

<>< Re: deemed!

