[Corpora-List] Help on SQL SELECT and UPDATE statements ? (fwd)
Listserv Administrator
listman at listserv.linguistlist.org
Sat Aug 4 18:06:10 UTC 2007
---------- Forwarded message ----------
Date: Thu, 12 Jul 2007 21:12:26 +0100
From: Mark Greenwood <mark at dcs.shef.ac.uk>
To: 'Mark Davies' <Mark_Davies at byu.edu>, corpora at uib.no
Subject: Re: [Corpora-List] Help on SQL SELECT and UPDATE statements ?
Assuming the table structure and data below I've just tried and the
following query produces the result you want for the first question:
select max(freq) as freq, word1, word2, partOfSpeechWord2 from TagTable
group by word1, word2;
It even renames the first column back to freq to match the original table.
Not sure about the second question yet,
Mark
-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Mark Davies
Sent: 12 July 2007 19:29
To: corpora at uib.no
Subject: [Corpora-List] Help on SQL SELECT and UPDATE statements ?
Assume the following table with the frequency from a tagged corpus, where
the columns represent two-word strings (word1 and word2) and the part of
speech of the second word:
[TagTable]
freq word1 word2 partOfSpeechWord2
---- ---- ---- ----
1 the plant verb (tagged incorrectly)
21 the plant noun
6 to plant verb
27 to stop verb
3 to stop noun (probably tagged incorrectly)
4 the stop noun
I want to generate the following list, which shows the most frequent
partOfSpeech for a given word1 / word2 combination, e.g.:
21 the plant noun
6 to plant verb
27 to stop verb
4 the stop noun
but would not include the following, which are presumably errors from the
tagger:
1 the plant verb
3 to stop noun
What's the SELECT command to get the four correct results above?
----------------------------------------
A somewhat more complicated scenario --
Assume the same [TagTable] above, and then another table [NeedsTag] with
NULL partOfSpeech tags:
[NeedsTag]
word1 word2 partOfSpeechWord2
---- ---- ----
the plant NULL
to plant NULL
to stop NULL
the stop NULL
What would be the correct UPDATE command to insert noun, verb, verb, noun
(in that order) into this table, based on the most frequent tag from the
first table (i.e. ignoring the incorrect "the plant = V" and "to stop = N")?
Thanks in advance for your help.
Mark Davies
============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: davies-linguistics.byu.edu
** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================
_______________________________________________
corpora mailing list
corpora at uib.no
http://mailman.uib.no/listinfo/corpora
_______________________________________________
corpora mailing list
corpora at uib.no
http://mailman.uib.no/listinfo/corpora
More information about the Corpora
mailing list