Corpora: re: future expressions in the BNC
Sebastian Hoffmann
sebhoff at es.unizh.ch
Tue Feb 22 12:57:16 UTC 2000
At 19:03 +0100 on 21.2.2000, Ute Römer wrote:
>Hi Corpus Linguists!
>
>I'm wondering whether one of you could possibly help me with a
>research project on future expressions in English. I'm looking for
>several structures in the spoken part of the British National Corpus
>and I am having trouble finding patterns such as "VERBing", "will be
>VERBing" and so on.
>Is there a way to find all present progressive forms without
>running a separate query for every single verb, i.e. is it possible to
>insert some kind of "place marker" indicating "base form of lexical
>verb"?
>
>Thanks a lot for your help!
>
>Many Greetings from Cologne,
>Ute Römer
Hello Ute,
If you are using the Windows client for the BNC, this information
will be hard to get. For me, a Perl script run over a flat-text version
of the spoken part of the BNC would be the most flexible approach, but
I'm sure there are more accessible ways of doing this. In any case, a
word of warning about the word-class tags for -ing forms in the BNC.
Here is a list of the tags containing V(B|D|H|V)G, with their frequencies:
+---------+-----------+
| tag     | frequency |
+---------+-----------+
| VHG     |      4687 |
| VBG     |      6567 |
| AJ0_VVG |      8181 |
| VDG     |      9584 |
| NN1_VVG |     16377 |
| VVG     |    141230 |
+---------+-----------+
Of the 186,626 occurrences in total, 24,558 (about 13.2 per cent)
carry portmanteau tags, i.e. ambiguity tags such as NN1_VVG where the
tagger could not decide between two readings. That is about 2.5 times
the average proportion of portmanteau tags in the whole BNC. In
addition, a search on these tags will miss some instances that are
tagged only as nouns or adjectives.
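
A minimal Python sketch of this arithmetic, assuming only the
frequencies in the table above. The names TAG_FREQ, ING_TAG and
is_portmanteau are made up for illustration.

```python
import re

# Frequencies copied from the table above.
TAG_FREQ = {
    "VHG": 4687,
    "VBG": 6567,
    "AJ0_VVG": 8181,
    "VDG": 9584,
    "NN1_VVG": 16377,
    "VVG": 141230,
}

# Matches VBG, VDG, VHG or VVG, either as the whole tag or as one
# half of a portmanteau tag such as NN1_VVG.
ING_TAG = re.compile(r"(?:^|_)V[BDHV]G(?:_|$)")


def is_portmanteau(tag):
    """Portmanteau tags join two candidate tags with an underscore."""
    return "_" in tag


# Sanity check: every tag in the table is matched by the pattern.
assert all(ING_TAG.search(tag) for tag in TAG_FREQ)

total = sum(TAG_FREQ.values())
portmanteau = sum(f for tag, f in TAG_FREQ.items() if is_portmanteau(tag))
share = portmanteau / total

print(f"{portmanteau} of {total} occurrences ({share:.1%}) are portmanteau tags")
```

The same pattern can be reused to decide whether a token counts as a
verbal -ing form when scanning a tagged text.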
One way to approach the problem with good recall and acceptable
precision would be to look only at, say, the 100 most frequent verbs
occurring in the -ing form. I'm appending a few frequency lists that
might be of help, and I can provide further frequency information if
you need it.
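
The restriction to the most frequent -ing verbs could be sketched as
follows in Python. The input format, an iterable of (word, tag) pairs
extracted from a flat-text version of the corpus, is an assumption, as
are the names top_ing_forms and ING_TAG. This is not the script used to
produce the lists below.

```python
import re
from collections import Counter

# VBG, VDG, VHG or VVG, alone or as part of a portmanteau tag.
ING_TAG = re.compile(r"(?:^|_)V[BDHV]G(?:_|$)")


def top_ing_forms(tokens, n=100):
    """Return the n most frequent -ing word forms with a verbal -ing tag.

    tokens: iterable of (word, tag) pairs (hypothetical input format).
    Counts are merged across tags, so 'coming' tagged VVG and 'coming'
    tagged AJ0_VVG are counted together.
    """
    counts = Counter(
        word.lower()
        for word, tag in tokens
        if word.lower().endswith("ing") and ING_TAG.search(tag)
    )
    return counts.most_common(n)


def as_whitelist(ranked):
    """Turn the ranked list into a set of forms to query one by one."""
    return {word for word, _ in ranked}
```

The resulting whitelist could then drive a series of purely lexical
queries, which is the only kind of search some clients support.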
This year, we are planning to release BNCweb, a web-based interface
to the BNC. Its feature set goes beyond that of the Windows client in
that it allows searches on tags as well as lexical items (the initial
search, however, must still be lexical only). In other words, you
will be able to look for all instances of "will be" in the spoken
part and then restrict the result to only those sentences where "will
be" is followed by a word tagged as VBG, VDG, VHG or VVG (plus
portmanteau tags), even with an optional intervening adverb if you
want.
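
To illustrate the logic of such a two-step query (lexical search
first, tag restriction second), here is a rough Python sketch. It is
not BNCweb's implementation or query syntax. The (word, tag) token
format and the name find_will_be_ing are assumptions, and only the
general adverb tag AV0 is treated as an intervening adverb.

```python
import re

# VBG, VDG, VHG or VVG, alone or as part of a portmanteau tag.
ING_TAG = re.compile(r"(?:^|_)V[BDHV]G(?:_|$)")


def find_will_be_ing(tokens, allow_adverb=True):
    """Find 'will be (ADV) V-ing' sequences in a list of (word, tag) pairs.

    Returns a list of (start_index, matched_words) tuples.
    """
    hits = []
    for i in range(len(tokens) - 2):
        # Step 1: purely lexical match on "will be".
        if tokens[i][0].lower() != "will" or tokens[i + 1][0].lower() != "be":
            continue
        j = i + 2
        # Optionally skip one intervening adverb (tag AV0).
        if allow_adverb and tokens[j][1] == "AV0" and j + 1 < len(tokens):
            j += 1
        # Step 2: restrict by the tag of the following word.
        if ING_TAG.search(tokens[j][1]):
            hits.append((i, [word for word, _ in tokens[i:j + 1]]))
    return hits


# Tiny made-up example, not corpus data.
sample = [
    ("I", "PNP"), ("will", "VM0"), ("be", "VBI"), ("going", "VVG"),
    ("and", "CJC"), ("she", "PNP"), ("will", "VM0"), ("be", "VBI"),
    ("still", "AV0"), ("working", "VVG"),
]
hits = find_will_be_ing(sample)
```

A flat-text script would additionally have to handle contracted
forms such as "'ll be" and negation, which this sketch ignores.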
Hope this helps...
Best,
Sebastian
-----------------------------------
The following figures are based on the whole spoken part of the BNC.
The 100 most frequent words ending in -ing (regardless of word-class tag):
+---------------+-----------+
| word | frequency |
+---------------+-----------+
| going | 22617 |
| something | 13380 |
| thing | 11791 |
| doing | 9885 |
| being | 6779 |
| anything | 6522 |
| saying | 6070 |
| getting | 5590 |
| coming | 5548 |
| morning | 4761 |
| having | 4688 |
| talking | 4547 |
| looking | 4451 |
| nothing | 4176 |
| everything | 3724 |
| working | 3500 |
| trying | 3232 |
| making | 2387 |
| taking | 2337 |
| bring | 2246 |
| meeting | 2223 |
| fucking | 2187 |
| thinking | 2148 |
| training | 1744 |
| using | 1625 |
| putting | 1590 |
| during | 1581 |
| ring | 1560 |
| interesting | 1513 |
| building | 1467 |
| running | 1407 |
| sitting | 1314 |
| living | 1270 |
| planning | 1267 |
| giving | 1181 |
| asking | 1164 |
| playing | 1147 |
| telling | 1140 |
| evening | 1117 |
| moving | 1043 |
| beginning | 1027 |
| paying | 1007 |
| happening | 988 |
| writing | 988 |
| standing | 976 |
| feeling | 953 |
| housing | 925 |
| waiting | 886 |
| reading | 847 |
| walking | 820 |
| driving | 817 |
| listening | 749 |
| speaking | 739 |
| starting | 705 |
| following | 697 |
| seeing | 664 |
| watching | 647 |
| buying | 640 |
| shopping | 619 |
| selling | 615 |
| darling | 601 |
| recording | 582 |
| washing | 580 |
| changing | 550 |
| showing | 548 |
| dealing | 538 |
| existing | 538 |
| spending | 536 |
| keeping | 528 |
| teaching | 520 |
| eating | 510 |
| sing | 509 |
| including | 492 |
| bringing | 472 |
| leaving | 462 |
| providing | 441 |
| learning | 438 |
| advertising | 436 |
| cutting | 434 |
| king | 432 |
| growing | 425 |
| wedding | 406 |
| hoping | 404 |
| turning | 404 |
| boring | 400 |
| wearing | 392 |
| finding | 391 |
| understanding | 381 |
| opening | 379 |
| funding | 376 |
| helping | 375 |
| bearing | 362 |
| swimming | 362 |
| staying | 361 |
| knowing | 360 |
| carrying | 357 |
| setting | 357 |
| holding | 354 |
| picking | 351 |
| wanting | 343 |
+---------------+-----------+
The 100 most frequent word-tag combinations for words ending in -ing:
+-------------+---------+-----------+
| word | tag | frequency |
+-------------+---------+-----------+
| going | VVG | 22319 |
| something | PNI | 13380 |
| thing | NN1 | 11789 |
| doing | VDG | 9579 |
| being | VBG | 6567 |
| anything | PNI | 6522 |
| getting | VVG | 5590 |
| saying | VVG | 5365 |
| coming | VVG | 5086 |
| morning | NN1 | 4760 |
| having | VHG | 4687 |
| looking | VVG | 4301 |
| nothing | PNI | 4176 |
| talking | VVG | 4135 |
| everything | PNI | 3724 |
| trying | VVG | 3143 |
| working | VVG | 2785 |
| taking | VVG | 2283 |
| making | VVG | 2125 |
| thinking | VVG | 1899 |
| using | VVG | 1607 |
| during | PRP | 1581 |
| putting | VVG | 1571 |
| bring | VVI | 1532 |
| interesting | AJ0 | 1476 |
| meeting | NN1 | 1374 |
| giving | VVG | 1148 |
| sitting | VVG | 1127 |
| asking | VVG | 1090 |
| playing | VVG | 1084 |
| evening | NN1 | 1080 |
| fucking | AV0 | 1080 |
| telling | VVG | 1080 |
| building | NN1 | 1034 |
| running | VVG | 924 |
| paying | VVG | 883 |
| training | NN1 | 783 |
| training | NN1_VVG | 777 |
| walking | VVG | 731 |
| waiting | VVG | 720 |
| fucking | AJ0 | 717 |
| bring | VVB | 714 |
| beginning | NN1 | 694 |
| moving | VVG | 686 |
| listening | VVG | 666 |
| living | VVG | 659 |
| ring | VVI | 655 |
| saying | NN1_VVG | 651 |
| watching | VVG | 628 |
| speaking | VVG | 618 |
| meeting | NN1_VVG | 617 |
| starting | VVG | 613 |
| happening | VVG | 608 |
| planning | NN1 | 598 |
| darling | NN1 | 585 |
| seeing | VVG | 564 |
| selling | VVG | 551 |
| ring | NN1 | 543 |
| buying | VVG | 526 |
| writing | VVG | 503 |
| dealing | VVG | 501 |
| existing | AJ0 | 499 |
| showing | VVG | 486 |
| keeping | VVG | 481 |
| planning | NN1_VVG | 478 |
| housing | NN1 | 473 |
| bringing | VVG | 469 |
| feeling | NN1 | 442 |
| standing | VVG | 436 |
| including | PRP | 430 |
| leaving | VVG | 422 |
| housing | NN1_VVG | 410 |
| hoping | VVG | 402 |
| wedding | NN1 | 401 |
| working | AJ0 | 400 |
| reading | VVG | 397 |
| eating | VVG | 388 |
| standing | NN1_VVG | 376 |
| changing | VVG | 368 |
| writing | NN1_VVG | 354 |
| finding | VVG | 351 |
| carrying | VVG | 346 |
| knowing | VVG | 338 |
| wearing | VVG | 338 |
| happening | NN1_VVG | 336 |
| staying | VVG | 336 |
| driving | VVG | 335 |
| following | VVG | 335 |
| fucking | AJ0_AV0 | 333 |
| feeling | VVG | 330 |
| turning | VVG | 329 |
| wanting | VVG | 318 |
| spending | VVG | 317 |
| picking | VVG | 310 |
| cutting | VVG | 307 |
| doing | NN1 | 306 |
| boring | AJ0 | 305 |
| sing | VVI | 305 |
| coming | AJ0_VVG | 304 |
| recording | VVG | 301 |
+-------------+---------+-----------+
The 100 most frequent word-tag combinations for -ing forms tagged as
gerund, i.e. VBG, VDG, VHG or VVG (including portmanteau tags):
+------------+---------+-----------+
| word | tag | frequency |
+------------+---------+-----------+
| going | VVG | 22319 |
| doing | VDG | 9579 |
| being | VBG | 6567 |
| getting | VVG | 5590 |
| saying | VVG | 5365 |
| coming | VVG | 5086 |
| having | VHG | 4687 |
| looking | VVG | 4301 |
| talking | VVG | 4135 |
| trying | VVG | 3143 |
| working | VVG | 2785 |
| taking | VVG | 2283 |
| making | VVG | 2125 |
| thinking | VVG | 1899 |
| using | VVG | 1607 |
| putting | VVG | 1571 |
| giving | VVG | 1148 |
| sitting | VVG | 1127 |
| asking | VVG | 1090 |
| playing | VVG | 1084 |
| telling | VVG | 1080 |
| running | VVG | 924 |
| paying | VVG | 883 |
| training | NN1_VVG | 777 |
| walking | VVG | 731 |
| waiting | VVG | 720 |
| moving | VVG | 686 |
| listening | VVG | 666 |
| living | VVG | 659 |
| saying | NN1_VVG | 651 |
| watching | VVG | 628 |
| speaking | VVG | 618 |
| meeting | NN1_VVG | 617 |
| starting | VVG | 613 |
| happening | VVG | 608 |
| seeing | VVG | 564 |
| selling | VVG | 551 |
| buying | VVG | 526 |
| writing | VVG | 503 |
| dealing | VVG | 501 |
| showing | VVG | 486 |
| keeping | VVG | 481 |
| planning | NN1_VVG | 478 |
| bringing | VVG | 469 |
| standing | VVG | 436 |
| leaving | VVG | 422 |
| housing | NN1_VVG | 410 |
| hoping | VVG | 402 |
| reading | VVG | 397 |
| eating | VVG | 388 |
| standing | NN1_VVG | 376 |
| changing | VVG | 368 |
| writing | NN1_VVG | 354 |
| finding | VVG | 351 |
| carrying | VVG | 346 |
| knowing | VVG | 338 |
| wearing | VVG | 338 |
| happening | NN1_VVG | 336 |
| staying | VVG | 336 |
| driving | VVG | 335 |
| following | VVG | 335 |
| feeling | VVG | 330 |
| turning | VVG | 329 |
| wanting | VVG | 318 |
| spending | VVG | 317 |
| picking | VVG | 310 |
| cutting | VVG | 307 |
| coming | AJ0_VVG | 304 |
| recording | VVG | 301 |
| learning | VVG | 298 |
| wondering | VVG | 296 |
| running | AJ0_VVG | 285 |
| building | NN1_VVG | 277 |
| moving | AJ0_VVG | 269 |
| providing | VVG | 268 |
| lying | VVG | 267 |
| increasing | VVG | 264 |
| holding | VVG | 258 |
| suggesting | VVG | 254 |
| laughing | VVG | 248 |
| sending | VVG | 244 |
| talking | NN1_VVG | 242 |
| falling | VVG | 234 |
| singing | VVG | 234 |
| becoming | VVG | 233 |
| calling | VVG | 232 |
| making | NN1_VVG | 232 |
| meeting | VVG | 232 |
| beginning | VVG | 228 |
| helping | VVG | 228 |
| pulling | VVG | 227 |
| working | AJ0_VVG | 225 |
| teaching | NN1_VVG | 224 |
| setting | VVG | 223 |
| losing | VVG | 219 |
| growing | VVG | 218 |
| expecting | VVG | 216 |
| pushing | VVG | 214 |
| reading | NN1_VVG | 214 |
| hanging | VVG | 207 |
+------------+---------+-----------+
The 100 most frequent words ending in -ing which are *not* tagged as gerund:
+---------------+---------+-----------+
| word | tag | frequency |
+---------------+---------+-----------+
| something | PNI | 13380 |
| thing | NN1 | 11789 |
| anything | PNI | 6522 |
| morning | NN1 | 4760 |
| nothing | PNI | 4176 |
| everything | PNI | 3724 |
| during | PRP | 1581 |
| bring | VVI | 1532 |
| interesting | AJ0 | 1476 |
| meeting | NN1 | 1374 |
| evening | NN1 | 1080 |
| fucking | AV0 | 1080 |
| building | NN1 | 1034 |
| training | NN1 | 783 |
| fucking | AJ0 | 717 |
| bring | VVB | 714 |
| beginning | NN1 | 694 |
| ring | VVI | 655 |
| planning | NN1 | 598 |
| darling | NN1 | 585 |
| ring | NN1 | 543 |
| existing | AJ0 | 499 |
| housing | NN1 | 473 |
| feeling | NN1 | 442 |
| including | PRP | 430 |
| wedding | NN1 | 401 |
| working | AJ0 | 400 |
| fucking | AJ0_AV0 | 333 |
| doing | NN1 | 306 |
| boring | AJ0 | 305 |
| sing | VVI | 305 |
| amazing | AJ0 | 290 |
| following | AJ0 | 287 |
| willing | AJ0 | 281 |
| shopping | NN1 | 277 |
| exciting | AJ0 | 265 |
| advertising | NN1 | 264 |
| king | NN1 | 256 |
| washing | NN1 | 241 |
| disgusting | AJ0 | 240 |
| engineering | NN1 | 226 |
| heating | NN1 | 216 |
| spring | NN1 | 213 |
| pudding | NN1 | 212 |
| understanding | NN1 | 211 |
| being | NN1 | 210 |
| sing | VVB | 203 |
| ring | VVB | 200 |
| swimming | NN1 | 190 |
| reading | NN1 | 178 |
| providing | CJS | 172 |
| marketing | NN1 | 166 |
| ring | NN1_VVB | 160 |
| hunting | NN1 | 158 |
| living | AJ0 | 158 |
| fishing | NN1 | 157 |
| ceiling | NN1 | 148 |
| surprising | AJ0 | 148 |
| offspring | NN0 | 146 |
| teaching | NN1 | 145 |
| outstanding | AJ0 | 137 |
| wing | NN1 | 127 |
| writing | NN1 | 127 |
| string | NN1 | 125 |
| dining | NN1 | 124 |
| driving | AJ0 | 124 |
| leading | AJ0 | 124 |
| regarding | PRP | 123 |
| wording | NN1 | 122 |
| bearing | NN1 | 119 |
| meaning | NN1 | 119 |
| freezing | AJ0 | 110 |
| opening | NN1 | 110 |
| living | NN1 | 107 |
| living | AJ0_NN1 | 106 |
| blooming | AJ0 | 105 |
| embarrassing | AJ0 | 105 |
| growing | AJ0 | 105 |
| king | NP0 | 105 |
| painting | NN1 | 104 |
| warning | NN1 | 104 |
| boxing | NN1 | 103 |
| encouraging | AJ0 | 102 |
| flipping | AJ0 | 98 |
| drawing | NN1 | 96 |
| running | AJ0 | 96 |
| parking | NN1 | 95 |
| standing | NN1 | 95 |
| lighting | NN1 | 93 |
| moving | AJ0 | 87 |
| annoying | AJ0 | 86 |
| recording | NN1 | 84 |
| appalling | AJ0 | 83 |
| heading | NN1 | 82 |
| fascinating | AJ0 | 81 |
| shilling | NN1 | 81 |
| spelling | NN1 | 80 |
| bleeding | AJ0 | 78 |
| cooking | NN1 | 78 |
| clothing | NN1 | 77 |
+---------------+---------+-----------+
-------------------------------------------------------------
/ Sebastian Hoffmann | University of Zurich \
| e-Mail: sebhoff at es.unizh.ch | English Department |
| Plattenstrasse 47 | CH-8032 Zurich/Switzerland |
\ Phone: (41 1) 634 35 51 | Fax: (41 1) 634 49 08 /