[Corpora-List] Syntax search problem resolved
Sebastian Hoffmann
sebhoff at es.unizh.ch
Fri Jun 16 16:03:48 UTC 2006
At 6:24 AM -0700 6/16/06, Linda Bawcom wrote:
>Dear friends, colleagues, and list members,
>
>Thanks to Knut Hofland, Geoffrey Williams, Chris
>Tribble and Mark Davis, who all very kindly took
>me by the hand, I was able to find the strings I
>needed by using, WITH the BNC:
>
><w NP0>* <w PRF>of , although I was unable to
>get strings with <w NP0>* <w PRF>of <w NP0>* .
>Well, it all seems quite obvious and logical now,
>of course!
>
>And since nouns follow 'of', it's just a
>matter of deleting items such as United States
>of America (no pun intended) or Port of Spain.
>I'm not quite sure whether to include items such
>as Joan of Arc, Lawrence of Arabia or Prince of
>Wales when basically I'm looking for the frequency
>of, e.g., Clinton of Little Rock. I suppose I'll
>check with John Sinclair, the 'of' expert!
>
>Kindest regards,
>Linda
>
Dear Linda,
I just ran a query for "NP0 of NP0" in BNCweb
(CQP edition) and got 7850 hits. The frequency
list feature gives you the following top 50
combinations:
No.  Lexical item(s)         No. of occurrences  Percent
  1  Isle of Man                            346    4.41%
  2  Isle of Wight                          342    4.36%
  3  States of America                      168    2.14%
  4  End of London                           97    1.24%
  5  Donaldson of Lymington                  73    0.93%
  6  Isle of Dogs                            55    0.70%
  7  Bridge of Harwich                       50    0.64%
  8  Riding of Yorkshire                     46    0.59%
  9  Jesus of Nazareth                       44    0.56%
 10  John of Gaunt                           43    0.55%
 11  Mitterrand of France                    38    0.48%
 12  Joan of Arc                             35    0.45%
 13  Goff of Chieveley                       34    0.43%
 14  Keith of Kinkel                         32    0.41%
 15  William of Malmesbury                   29    0.37%
 16  Francis of Assisi                       29    0.37%
 17  HUSSEIN of Jordan                       27    0.34%
 18  Lawrence of Arabia                      27    0.34%
 19  Richard of Gloucester                   26    0.33%
 20  States of Europe                        26    0.33%
 21  Highlands of Scotland                   26    0.33%
 22  Port of Spain                           26    0.33%
 23  Slynn of Hadley                         24    0.31%
 24  Kingdom of Great                        23    0.29%
 25  Isle of Skye                            21    0.27%
 26  Isle of Lewis                           20    0.25%
 27  John of Salisbury                       19    0.24%
 28  Joseph of Arimathea                     18    0.23%
 29  Edward of England                       18    0.23%
 30  Michael of Kent                         18    0.23%
 31  Hassan of Morocco                       18    0.23%
 32  Julian of Norwich                       18    0.23%
 33  HUGH OF LINCOLN                         18    0.23%
 34  Florence of Worcester                   18    0.23%
 35  Philip of Spain                         15    0.19%
 36  Isle of Sheppey                         15    0.19%
 37  Eleanor of Aquitaine                    14    0.18%
 38  Fahd of Saudi                           14    0.18%
 39  Mubarak of Egypt                        13    0.17%
 40  John of God                             13    0.17%
 41  Philip of France                        12    0.15%
 42  Teresa of Avila                         12    0.15%
 43  Hugh of Lyons                           12    0.15%
 44  Hook of Holland                         12    0.15%
 45  Fraser of Carmyllie                     12    0.15%
 46  William of Jumièges                     11    0.14%
 47  Henry of Lancaster                      11    0.14%
 48  Brandon of Oakbrook                     11    0.14%
 49  Morris of Borth-y-Gest                  11    0.14%
 50  Isle of Innisfree                       11    0.14%
I can send you the complete list if you want. It
may also be useful to add a few optional elements
to your retrieval pattern. For example, you could
allow sequences of items tagged as NP0 as well as
instances of NN1 and NN2 that immediately follow
the second NP0 to get instances like the
following:
<w NP0>Superintendent <w NP0>Trobridge <w PRF>of
<w NP0>Ealing <w NN2>Police <w NN1>Station
<w NP0>St <w NP0>Francis <w PRF>of <w NP0>Assisi
<w NP0>Archbishop <w NP0>MacNamara <w PRF>of <w NP0>Dublin
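For anyone who would rather work on exported text than inside BNCweb, the extended pattern could be sketched roughly as follows in Python. This is only an illustration, not how BNCweb or CQP does it: it assumes each fragment is a single string in the "<w TAG>word" format shown above, and the names TOKEN, tokenize and find_matches are made up for this example. It looks for one or more NP0 tokens, then "of" tagged PRF, then one or more NP0 tokens, optionally followed by NN1/NN2 tokens, and counts the results.

```python
import re
from collections import Counter

# One token in the "<w TAG>word" markup quoted above (assumed input format).
TOKEN = re.compile(r"<w (\w+)>\s*([^<\s]+)")

def tokenize(fragment):
    """Turn one tagged fragment into a list of (tag, word) pairs."""
    return TOKEN.findall(fragment)

def find_matches(tokens):
    """Return word sequences matching NP0+ PRF('of') NP0+ (NN1|NN2)*."""
    hits = []
    i, n = 0, len(tokens)
    while i < n:
        # Consume the first run of proper nouns.
        j = i
        while j < n and tokens[j][0] == "NP0":
            j += 1
        is_of = j < n and tokens[j][0] == "PRF" and tokens[j][1].lower() == "of"
        if j == i or not is_of:
            i = max(j, i + 1)  # no match can start inside this run
            continue
        # Consume the second run of proper nouns after "of".
        k = j + 1
        while k < n and tokens[k][0] == "NP0":
            k += 1
        if k == j + 1:
            i = j + 1  # "of" was not followed by a proper noun
            continue
        # Optional trailing common nouns (singular or plural).
        while k < n and tokens[k][0] in ("NN1", "NN2"):
            k += 1
        hits.append(" ".join(word for _, word in tokens[i:k]))
        i = k
    return hits

# The three fragments quoted above, one string per fragment.
fragments = [
    "<w NP0>Superintendent <w NP0>Trobridge <w PRF>of "
    "<w NP0>Ealing <w NN2>Police <w NN1>Station",
    "<w NP0>St <w NP0>Francis <w PRF>of <w NP0>Assisi",
    "<w NP0>Archbishop <w NP0>MacNamara <w PRF>of <w NP0>Dublin",
]

counts = Counter()
for fragment in fragments:
    counts.update(find_matches(tokenize(fragment)))

for item, freq in counts.most_common():
    print(item, freq)
```

Each fragment is handled separately so that a match cannot run across a sentence boundary; with real corpus output you would want to split on the sentence markers first.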
Best,
Sebastian
--
Dr. Sebastian Hoffmann
Englisches Seminar der Univ. Zürich
Plattenstrasse 47
CH-8032 Zürich
Tel: +41-44-634 3551
Fax: +41-44-634 4908
http://www-es.unizh.ch