[Corpora-List] Help on SQL SELECT and UPDATE statements ? (fwd)

Listserv Administrator listman at listserv.linguistlist.org
Sat Aug 4 18:05:37 UTC 2007


---------- Forwarded message ----------
Date: Thu, 12 Jul 2007 21:35:56 +0200
From: "[windows-1252] Pawe? Nowak" <panowak at willow.iie.uz.zgora.pl>
To: Mark Davies <Mark_Davies at byu.edu>
Cc: corpora at uib.no
Subject: Re: [Corpora-List] Help on SQL SELECT and UPDATE statements ?

Dear Mark,

Here are my answers to your questions. Please be aware that these are
'prompt answers' so the queries may not be the most
sophisticated/optimized ones. I also do not know what database
management system you use. I presume it is MySQL but at present I have
access to PostgreSQL only. I've tested the queries with the DBMS and
they work fine with your examples.

Q1:
Get the most frequent POS for all combinations:
select max(freq), word1, word2, partOfSpeechWord2 from TagTable group by
word1, word2, partOfSpeechWord2 having (((partOfSpeechWord2 = 'verb')
and (word1 = 'to')) or ((partOfSpeechWord2 = 'noun') and (word1 = 'the')));

Get the most frequent POS for a given combination (YYY XXX):
select max(freq), word1, word2, partOfSpeechWord2 from TagTable group by
word1, word2, partOfSpeechWord2 having ((((partOfSpeechWord2 = 'verb')
and (word1 = 'to')) or ((partOfSpeechWord2 = 'noun') and (word1 =
'the'))) and (word1 = 'YYY') and (word2 = 'XXX'));

Q2:
The query is (MySQL 4.1 or newer required):
update NeedsTag set partOfSpeechWord2 = (select partOfSpeechWord2 from
tagtable where word1 = 'YYY' and word2 = 'XXX' order by freq desc limit
1) where word1 = 'YYY' and word2 = 'XXX';

Please note that this query updates only one word combination at a time.
You will need to execute it for each and every row in the NeedsTag
table. I think it would be best to use a stored procedure that would
iterate over all rows in the NeedsTag table. This is, however,
impossible with MySQL 4.x since it does not support procedures.

I hope these answers are of any use for you. Should you have any doubts,
do not hesitate to ask further questions.

Best regards,

Pawel Nowak


> Assume the following table with the frequency from a tagged corpus,
> where the columns represent two-word strings (word1 and word2) and the
> part of speech of the second word:
>
> [TagTable]
>
> freq	word1	word2	partOfSpeechWord2
> ----	----	----	----
> 1	the	plant	verb		(tagged incorrectly)
> 21	the	plant	noun
> 6	to	plant	verb
> 27	to	stop	verb
> 3	to	stop	noun		(probably tagged incorrectly)
> 4	the	stop	noun
>
> I want to generate the following list, which shows the most frequent
> partOfSpeech for a given word1 / word2 combination, e.g.:
>
> 21	the	plant	noun
> 6	to	plant	verb
> 27	to	stop	verb
> 4	the	stop	noun
>
> but would not include the following, which are presumably errors from
> the tagger:
>
> 1	the	plant	verb
> 3	to	stop	noun
>
> What's the SELECT command to get the four correct results above?
>
> ----------------------------------------
>
> A somewhat more complicated scenario --
>
> Assume the same [TagTable] above, and then another table [NeedsTag] with
> NULL partOfSpeech tags:
>
> [NeedsTag]
>
> word1	word2	partOfSpeechWord2
> ----	----	----
> the	plant	NULL
> to	plant	NULL
> to	stop	NULL
> the	stop	NULL
>
> What would be the correct UPDATE command to insert noun, verb, verb,
> noun (in that order) into this table, based on the most frequent tag
> from the first table (i.e. ignoring the incorrect "the plant = V" and
> "to stop = N")?
>
> Thanks in advance for your help.
>
> Mark Davies
>
> ============================================
> Mark Davies
> Professor of (Corpus) Linguistics
> Brigham Young University
> (phone) 801-422-9168 / (fax) 801-422-0906
> Web: davies-linguistics.byu.edu
>
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================
>
>
> _______________________________________________
> corpora mailing list
> corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>


_______________________________________________
corpora mailing list
corpora at uib.no
http://mailman.uib.no/listinfo/corpora



More information about the Corpora mailing list