[Corpora-List] Help on SQL SELECT and UPDATE statements ? (fwd)

Listserv Administrator listman at listserv.linguistlist.org
Sat Aug 4 18:06:38 UTC 2007


---------- Forwarded message ----------
Date: Thu, 12 Jul 2007 17:36:55 -0400
From: Panagiotis G. Ipeirotis <panos at stern.nyu.edu>
To: Mark Greenwood <mark at dcs.shef.ac.uk>
Cc: corpora at uib.no, 'Mark Davies' <Mark_Davies at byu.edu>
Subject: Re: [Corpora-List] Help on SQL SELECT and UPDATE statements ?

Try this:

SELECT word1, word2, partOfSpeechWord2, freq
FROM TagTable AS T, (SELECT MAX(freq) as freq, word1, word2
FROM TagTable
GROUP BY word1, word2) AS A
WHERE A.word1=T.word2 AND A.word2=T.word2 AND T.freq=A.freq

Let me know if it works or not.

Cheers,
Panos

Mark Greenwood wrote:
> Assuming the table structure and data below I've just tried and the
> following query produces the result you want for the first question:
>
> select max(freq) as freq, word1, word2, partOfSpeechWord2 from TagTable
> group by word1, word2;
>
> It even renames the first column back to freq to match the original table.
>
> Not sure about the second question yet,
>
> Mark
>
> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
> Mark Davies
> Sent: 12 July 2007 19:29
> To: corpora at uib.no
> Subject: [Corpora-List] Help on SQL SELECT and UPDATE statements ?
>
> Assume the following table with the frequency from a tagged corpus, where
> the columns represent two-word strings (word1 and word2) and the part of
> speech of the second word:
>
> [TagTable]
>
> freq	word1	word2	partOfSpeechWord2
> ----	----	----	----
> 1	the	plant	verb		(tagged incorrectly)
> 21	the	plant	noun
> 6	to	plant	verb
> 27	to	stop	verb
> 3	to	stop	noun		(probably tagged incorrectly)
> 4	the	stop	noun
>
> I want to generate the following list, which shows the most frequent
> partOfSpeech for a given word1 / word2 combination, e.g.:
>
> 21	the	plant	noun
> 6	to	plant	verb
> 27	to	stop	verb
> 4	the	stop	noun
>
> but would not include the following, which are presumably errors from the
> tagger:
>
> 1	the	plant	verb
> 3	to	stop	noun
>
> What's the SELECT command to get the four correct results above?
>
> ----------------------------------------
>
> A somewhat more complicated scenario --
>
> Assume the same [TagTable] above, and then another table [NeedsTag] with
> NULL partOfSpeech tags:
>
> [NeedsTag]
>
> word1	word2	partOfSpeechWord2
> ----	----	----
> the	plant	NULL
> to	plant	NULL
> to	stop	NULL
> the	stop	NULL
>
> What would be the correct UPDATE command to insert noun, verb, verb, noun
> (in that order) into this table, based on the most frequent tag from the
> first table (i.e. ignoring the incorrect "the plant = V" and "to stop = N")?
>
> Thanks in advance for your help.
>
> Mark Davies
>
> ============================================
> Mark Davies
> Professor of (Corpus) Linguistics
> Brigham Young University
> (phone) 801-422-9168 / (fax) 801-422-0906
> Web: davies-linguistics.byu.edu
>
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================
>
>
> _______________________________________________
> corpora mailing list
> corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
> _______________________________________________
> corpora mailing list
> corpora at uib.no
> http://mailman.uib.no/listinfo/corpora


_______________________________________________
corpora mailing list
corpora at uib.no
http://mailman.uib.no/listinfo/corpora



More information about the Corpora mailing list