8.1133, FYI: AltaVista Synonym Search

linguist at linguistlist.org
Tue Aug 5 01:42:56 UTC 1997


LINGUIST List:  Vol-8-1133. Mon Aug 4 1997. ISSN: 1068-4875.

Subject: 8.1133, FYI: AltaVista Synonym Search

Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
            T. Daniel Seely: Eastern Michigan U. <seely at linguistlist.org>

Review Editor:     Andrew Carnie <carnie at linguistlist.org>

Associate Editors: Ljuba Veselinova <ljuba at linguistlist.org>
                   Ann Dizdar <ann at linguistlist.org>
Assistant Editor:  Martin Jacobsen <marty at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Zhiping Zheng <zzheng at online.emich.edu>

Home Page:  http://linguistlist.org/


Editor for this issue: Martin Jacobsen <marty at linguistlist.org>

=================================Directory=================================

1)
Date:  Mon, 4 Aug 97 12:35:55 EST
From:  Jacques Guy <j.guy at trl.telstra.com.au>
Subject:  Computational linguistics: The AltaVista "refine" option

-------------------------------- Message 1 -------------------------------

Date:  Mon, 4 Aug 97 12:35:55 EST
From:  Jacques Guy <j.guy at trl.telstra.com.au>
Subject:  Computational linguistics: The AltaVista "refine" option

A colleague of mine told me about the "refine" option offered by the
AltaVista search engine (http://www.altavista.digital.com/) and how
good it was. In a nutshell, the "refine" option returns a list of
synonyms of, and notions related to, the words in your query. Indeed,
it gave extremely sensible responses to English queries.

Perversely perhaps, I tried it in French, using "vin" (what else!)  as
the keyword. Bingo. This is what it returned:

72% Etait, etre, annees, meme, apres, etaient, derniers
59% Egalement, particulierement, differentes, possibilite
52% Qualite, vins, vin, vignoble, vigne, crus, vignes, vignerons,
vigneron [etc...]

Far from satisfactory. "Eau" and "pain" returned similar nonsense,
featuring "etaient", "etre", "egalement" et alia in prominent
positions. In fact, AltaVista "refine" seems decidedly averse to
foodstuffs in French, "fruit", "poisson" and "sandwich" failing equally
miserably (so did "sable", "mer", "lac").

So I was quite surprised when Italian queries about "wine" returned
sensible synonyms:

60% vino, vini, vigneti, uve
40% quantita, ettari, vitigni
39% sapore, profumo, invecchiamento
[etc.]

"Acqua" and "pane" fared equally well. So I turned my attention to
Spanish. Spanish did just as badly as French. This is puzzling, for
the Spanish data set is quite large.

I don't know what inspired me, but I decided to ask for an Italian
sandwich ("panino"). Bingo again!

60% perche, chissa, guardo, cazzo, sembrava, poiche, merda, riposto
    [yes, unbelievable but true]
54% mangiare, specialta, birra, mangia, roba, piatti, gusti, soldi,
bere
33% avevo, scusa, aveva, stavo, rispose, facevano
[etc.]

My colleague and I scratched our collective heads, experimented some
more, and came to the conclusion that the thesauri are built by a
neural net (she is heavily into neural nets). Still, the excellent
behaviour of the English thesaurus was suspect. But no, experimenting
demonstrated that it could not have been a hand-crafted thesaurus.
There are ways of "salting" a neural net and that is probably what
Digital did for English (and perhaps for Italian).
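
The function words that dominate the French lists ("etait", "etre",
"egalement") are what one would expect from any ranking based on raw
co-occurrence counts with no stop list. The sketch below is a toy
illustration of that effect in Python. It is not a reconstruction of
what AltaVista does, which this message can only guess at. The corpus,
the stop list and the function name related_terms are all invented for
the example.

```python
from collections import Counter

# Invented toy corpus (accents dropped, as in the lists above).
CORPUS = [
    "le vin etait bon et le vignoble etait grand",
    "le pain etait chaud et le vin etait rouge",
    "la vigne donne le vin et le vigneron etait fier",
]

# Hypothetical stop list; a real one would be far longer.
STOP_WORDS = {"le", "la", "et", "etait"}


def related_terms(query, sentences, stop_words=frozenset()):
    """Rank words by how often they occur in sentences containing `query`."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        if query not in words:
            continue
        for word in words:
            if word != query and word not in stop_words:
                counts[word] += 1
    return counts.most_common()


# Without a stop list, function words head the ranking.
raw = related_terms("vin", CORPUS)
# With the stop list, only content words remain.
filtered = related_terms("vin", CORPUS, STOP_WORDS)

print(raw[:3])
print(filtered)
```

On this toy corpus the unfiltered ranking for "vin" is headed by "le",
"etait" and "et", while the filtered one keeps only content words such
as "vignoble" and "vigneron". A missing or inadequate stop list is one
possible explanation, among others, for the French and Spanish
results.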

Do take a break and experiment a bit with the AltaVista "refine" option in
your favourite languages (Polish was as nonsensical as French).  It is
quite amusing. And perhaps useful: next time someone knocks at your
door with a neural net for sale... (I have seen queries for "Kentucky
fried chicken" return "chicken sexers", "waste burners" and "singing
teachers", courtesy of a neural net).

j.guy at trl.telstra.com.au

---------------------------------------------------------------------------
LINGUIST List: Vol-8-1133


