[Corpora-List] Help on SQL SELECT and UPDATE statements ? (fwd)

Listserv Administrator listman at listserv.linguistlist.org
Sat Aug 4 18:06:18 UTC 2007




---------- Forwarded message ----------
Date: Thu, 12 Jul 2007 23:30:01 +0200
From: Antonio del Saz <antonio.saz at otsistemas.com>
To: 'Mark Davies' <Mark_Davies at byu.edu>
Cc: corpora at uib.no
Subject: Re: [Corpora-List] Help on SQL SELECT and UPDATE statements ?

----
Dear Mark,

I think the Mark Greenwood SQL sentence will not work, because
"partOfSpeech2" must be included in the GROUP BY clause. If only works if
sentence is reduced:

SELECT Max(freq) AS MaxFreq, word1, word2 FROM TagTable GROUP BY word1,
word2

Depending on your RMDBS, there will be more answers, but this SQL sentence
must work always:

SELECT T.freq, T.word1, T.word2, T.partOfSpeechWord2
FROM TagTable as T INNER JOIN (SELECT Max(freq) AS MaxFreq, word1, word2
FROM TagTable GROUP BY word1, word2) AS X ON X.MaxFreq=T.freq and
X.word1=T.word1 AND X.word2=T.word2

("T" and "X" are aliases to clarify)

And now, I will try some solutions to your update...

Best wishes,


Antonio del Saz
Socio Director
Móvil: +34 606 380 365

ORGANIZACIÓN Y TECNOLOGÍA DE SISTEMAS
JOSÉ SÁNCHEZ RUBIO, 1 Bl.1 2ºD
28250 - TORRELODONES (MADRID)
www.otsistemas.com



-----Mensaje original-----
De: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] En nombre de Mark
Davies
Enviado el: jueves, 12 de julio de 2007 20:29
Para: corpora at uib.no
Asunto: [Corpora-List] Help on SQL SELECT and UPDATE statements ?

Assume the following table with the frequency from a tagged corpus,
where the columns represent two-word strings (word1 and word2) and the
part of speech of the second word:

[TagTable]

freq	word1	word2	partOfSpeechWord2
----	----	----	----
1	the	plant	verb		(tagged incorrectly)
21	the	plant	noun
6	to	plant	verb
27	to	stop	verb
3	to	stop	noun		(probably tagged incorrectly)
4	the	stop	noun

I want to generate the following list, which shows the most frequent
partOfSpeech for a given word1 / word2 combination, e.g.:

21	the	plant	noun
6	to	plant	verb
27	to	stop	verb
4	the	stop	noun

but would not include the following, which are presumably errors from
the tagger:

1	the	plant	verb
3	to	stop	noun

What's the SELECT command to get the four correct results above?

----------------------------------------

A somewhat more complicated scenario --

Assume the same [TagTable] above, and then another table [NeedsTag] with
NULL partOfSpeech tags:

[NeedsTag]

word1	word2	partOfSpeechWord2
----	----	----
the	plant	NULL
to	plant	NULL
to	stop	NULL
the	stop	NULL

What would be the correct UPDATE command to insert noun, verb, verb,
noun (in that order) into this table, based on the most frequent tag
from the first table (i.e. ignoring the incorrect "the plant = V" and
"to stop = N")?

Thanks in advance for your help.

Mark Davies

============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================


_______________________________________________
corpora mailing list
corpora at uib.no
http://mailman.uib.no/listinfo/corpora


_______________________________________________
corpora mailing list
corpora at uib.no
http://mailman.uib.no/listinfo/corpora


More information about the Corpora mailing list