Corpora: re: future expressions in the BNC

Sebastian Hoffmann sebhoff at es.unizh.ch
Tue Feb 22 12:57:16 UTC 2000


At 19:03 +0100 on 21.2.2000, Ute Römer wrote:
>Hi Corpus Linguists!
>
>I'm wondering whether one of you could help me with a 
>research project on future expressions in English. I'm looking for 
>several structures in the spoken part of the British National Corpus 
>and I'm having trouble finding types like "VERBing", "will be 
>VERBing" and so on.
>Is it possible to find all present progressive forms without 
>running a separate query for every single verb, i.e. can I 
>insert some kind of "placeholder" meaning "base form of lexical 
>verb"?
>
>Thanks a lot for your help!
>
>Many Greetings from Cologne,
>Ute Römer

Hello Ute,
if you are using the Windows client for the BNC, this information 
will be hard to get. For me, a Perl script run over a flat-text version 
of the spoken part of the BNC would be the most flexible approach, 
but I'm sure there are more accessible ways of doing this. In 
any case, a word of warning about the word-class tags for -ing forms 
in the BNC. Here's a list of the tags matching V(B|D|H|V)G, with their 
frequencies:

+---------+-----------+
| tag     | frequency |
+---------+-----------+
| VHG     |      4687 |
| VBG     |      6567 |
| AJ0_VVG |      8181 |
| VDG     |      9584 |
| NN1_VVG |     16377 |
| VVG     |    141230 |
+---------+-----------+

Of the 186,626 occurrences in total, 24,558 (about 13.2 per cent) 
carry portmanteau tags (which is about 2.5 times more than the 
average proportion of portmanteau tags in the whole BNC). In 
addition, a search on these tags will miss some -ing forms that are 
tagged as nouns/adjectives only.
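
As a quick check on the figures above, the following Python sketch 
recomputes the total and the portmanteau share from the table. The 
dictionary simply copies the table; the function name is made up for 
illustration.

```python
import re

# Tag frequencies copied from the table above.
TAG_FREQ = {
    "VHG": 4687,
    "VBG": 6567,
    "AJ0_VVG": 8181,
    "VDG": 9584,
    "NN1_VVG": 16377,
    "VVG": 141230,
}

# The pattern from the message: any tag containing V(B|D|H|V)G.
ING_TAG = re.compile(r"V[BDHV]G")


def portmanteau_share(freqs):
    """Return (total, portmanteau count, share) for tags matching ING_TAG."""
    relevant = {tag: n for tag, n in freqs.items() if ING_TAG.search(tag)}
    total = sum(relevant.values())
    # A portmanteau tag joins two candidate tags with an underscore.
    ambiguous = sum(n for tag, n in relevant.items() if "_" in tag)
    return total, ambiguous, ambiguous / total


total, ambiguous, share = portmanteau_share(TAG_FREQ)
print(total, ambiguous, round(share * 100, 1))
```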

One way to approach the problem with good recall and acceptable 
precision would be to look only at, say, the 100 most frequent 
verbs occurring in the -ing form. I'm appending a few frequency 
lists that might help, and I can provide further frequency 
information if you need it.
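
A frequency list of this kind could be produced along the following 
lines. This is only a sketch in Python, not the script used for the 
appended lists; it assumes the corpus has already been reduced to 
(word, tag) pairs, and the function name top_ing_verbs is hypothetical.

```python
from collections import Counter

# Verbal -ing tags named in this message.
VERB_ING_TAGS = {"VBG", "VDG", "VHG", "VVG"}


def top_ing_verbs(tokens, n=100):
    """Return the n most frequent (word, tag) pairs for verbal -ing forms.

    tokens is an iterable of (word, tag) pairs (an assumed input format).
    """
    counts = Counter()
    for word, tag in tokens:
        # A portmanteau tag such as NN1_VVG counts if either half is verbal.
        is_verbal = bool(VERB_ING_TAGS & set(tag.split("_")))
        if word.lower().endswith("ing") and is_verbal:
            counts[(word.lower(), tag)] += 1
    return counts.most_common(n)


# Tiny made-up sample, for illustration only.
sample = [
    ("going", "VVG"), ("thing", "NN1"), ("Going", "VVG"),
    ("training", "NN1_VVG"), ("bring", "VVI"), ("being", "VBG"),
]
result = top_ing_verbs(sample, n=3)
print(result)
```

Lower-casing the word form merges sentence-initial and sentence-internal 
spellings; whether that is wanted depends on the research question.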

This year, we are planning to release BNCweb, a web-based interface 
to the BNC. Its feature set goes beyond that of the Windows client in 
that it allows searches on tags as well as lexical items (the initial 
search, however, must still be lexical only). In other words, you 
will be able to look for all instances of "will be" in the spoken 
part and then restrict the result to only those sentences where "will 
be" is followed by a word tagged as VBG, VDG, VHG or VVG (plus 
portmanteau tags), even with an optional intervening adverb if you 
want.
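
For anyone working from a flat text file rather than BNCweb, the same 
restriction can be approximated with a regular expression. The sketch 
below is in Python and assumes a hypothetical one-sentence-per-line 
format with tokens written as word/TAG; the real BNC markup differs, so 
the pattern would need adapting. It does not cover the contracted 
form 'll.

```python
import re

# One token whose tag is VBG, VDG, VHG or VVG, alone or as either half
# of a portmanteau tag (e.g. NN1_VVG, AJ0_VVG).
ING = r"\S+/(?:[A-Z0-9]+_)?V[BDHV]G(?:_[A-Z0-9]+)?(?!\S)"

PATTERN = re.compile(
    r"(?<!\S)[Ww]ill/\S+\s+be/\S+\s+"   # "will be", whatever their tags
    r"(?:\S+/AV0\s+)?"                  # optional intervening adverb (AV0)
    r"(" + ING + r")"                   # the -ing form we want to capture
)


def find_will_be_ing(lines):
    """Return the word/TAG tokens that follow "will be" in each line."""
    hits = []
    for line in lines:
        for match in PATTERN.finditer(line):
            hits.append(match.group(1))
    return hits


# Made-up sentences in the assumed word/TAG format.
sample = [
    "I/PNP will/VM0 be/VBI going/VVG home/AV0",
    "she/PNP will/VM0 be/VBI still/AV0 working/VVG",
    "it/PNP will/VM0 be/VBI interesting/AJ0",
    "there/EX0 will/VM0 be/VBI training/NN1_VVG",
]
hits = find_will_be_ing(sample)
print(hits)
```

Hits with a portmanteau tag, like the last one, will still need manual 
checking.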

Hope this helps...
Best,
Sebastian

-----------------------------------
The following figures are based on the whole spoken part of the BNC.

The 100 most frequent words ending in -ing (regardless of word-class tag):
+---------------+-----------+
| word          | frequency |
+---------------+-----------+
| going         |     22617 |
| something     |     13380 |
| thing         |     11791 |
| doing         |      9885 |
| being         |      6779 |
| anything      |      6522 |
| saying        |      6070 |
| getting       |      5590 |
| coming        |      5548 |
| morning       |      4761 |
| having        |      4688 |
| talking       |      4547 |
| looking       |      4451 |
| nothing       |      4176 |
| everything    |      3724 |
| working       |      3500 |
| trying        |      3232 |
| making        |      2387 |
| taking        |      2337 |
| bring         |      2246 |
| meeting       |      2223 |
| fucking       |      2187 |
| thinking      |      2148 |
| training      |      1744 |
| using         |      1625 |
| putting       |      1590 |
| during        |      1581 |
| ring          |      1560 |
| interesting   |      1513 |
| building      |      1467 |
| running       |      1407 |
| sitting       |      1314 |
| living        |      1270 |
| planning      |      1267 |
| giving        |      1181 |
| asking        |      1164 |
| playing       |      1147 |
| telling       |      1140 |
| evening       |      1117 |
| moving        |      1043 |
| beginning     |      1027 |
| paying        |      1007 |
| happening     |       988 |
| writing       |       988 |
| standing      |       976 |
| feeling       |       953 |
| housing       |       925 |
| waiting       |       886 |
| reading       |       847 |
| walking       |       820 |
| driving       |       817 |
| listening     |       749 |
| speaking      |       739 |
| starting      |       705 |
| following     |       697 |
| seeing        |       664 |
| watching      |       647 |
| buying        |       640 |
| shopping      |       619 |
| selling       |       615 |
| darling       |       601 |
| recording     |       582 |
| washing       |       580 |
| changing      |       550 |
| showing       |       548 |
| dealing       |       538 |
| existing      |       538 |
| spending      |       536 |
| keeping       |       528 |
| teaching      |       520 |
| eating        |       510 |
| sing          |       509 |
| including     |       492 |
| bringing      |       472 |
| leaving       |       462 |
| providing     |       441 |
| learning      |       438 |
| advertising   |       436 |
| cutting       |       434 |
| king          |       432 |
| growing       |       425 |
| wedding       |       406 |
| hoping        |       404 |
| turning       |       404 |
| boring        |       400 |
| wearing       |       392 |
| finding       |       391 |
| understanding |       381 |
| opening       |       379 |
| funding       |       376 |
| helping       |       375 |
| bearing       |       362 |
| swimming      |       362 |
| staying       |       361 |
| knowing       |       360 |
| carrying      |       357 |
| setting       |       357 |
| holding       |       354 |
| picking       |       351 |
| wanting       |       343 |
+---------------+-----------+

The 100 most frequent word-tag combinations for words ending in -ing:
+-------------+---------+-----------+
| word        | tag     | frequency |
+-------------+---------+-----------+
| going       | VVG     |     22319 |
| something   | PNI     |     13380 |
| thing       | NN1     |     11789 |
| doing       | VDG     |      9579 |
| being       | VBG     |      6567 |
| anything    | PNI     |      6522 |
| getting     | VVG     |      5590 |
| saying      | VVG     |      5365 |
| coming      | VVG     |      5086 |
| morning     | NN1     |      4760 |
| having      | VHG     |      4687 |
| looking     | VVG     |      4301 |
| nothing     | PNI     |      4176 |
| talking     | VVG     |      4135 |
| everything  | PNI     |      3724 |
| trying      | VVG     |      3143 |
| working     | VVG     |      2785 |
| taking      | VVG     |      2283 |
| making      | VVG     |      2125 |
| thinking    | VVG     |      1899 |
| using       | VVG     |      1607 |
| during      | PRP     |      1581 |
| putting     | VVG     |      1571 |
| bring       | VVI     |      1532 |
| interesting | AJ0     |      1476 |
| meeting     | NN1     |      1374 |
| giving      | VVG     |      1148 |
| sitting     | VVG     |      1127 |
| asking      | VVG     |      1090 |
| playing     | VVG     |      1084 |
| evening     | NN1     |      1080 |
| fucking     | AV0     |      1080 |
| telling     | VVG     |      1080 |
| building    | NN1     |      1034 |
| running     | VVG     |       924 |
| paying      | VVG     |       883 |
| training    | NN1     |       783 |
| training    | NN1_VVG |       777 |
| walking     | VVG     |       731 |
| waiting     | VVG     |       720 |
| fucking     | AJ0     |       717 |
| bring       | VVB     |       714 |
| beginning   | NN1     |       694 |
| moving      | VVG     |       686 |
| listening   | VVG     |       666 |
| living      | VVG     |       659 |
| ring        | VVI     |       655 |
| saying      | NN1_VVG |       651 |
| watching    | VVG     |       628 |
| speaking    | VVG     |       618 |
| meeting     | NN1_VVG |       617 |
| starting    | VVG     |       613 |
| happening   | VVG     |       608 |
| planning    | NN1     |       598 |
| darling     | NN1     |       585 |
| seeing      | VVG     |       564 |
| selling     | VVG     |       551 |
| ring        | NN1     |       543 |
| buying      | VVG     |       526 |
| writing     | VVG     |       503 |
| dealing     | VVG     |       501 |
| existing    | AJ0     |       499 |
| showing     | VVG     |       486 |
| keeping     | VVG     |       481 |
| planning    | NN1_VVG |       478 |
| housing     | NN1     |       473 |
| bringing    | VVG     |       469 |
| feeling     | NN1     |       442 |
| standing    | VVG     |       436 |
| including   | PRP     |       430 |
| leaving     | VVG     |       422 |
| housing     | NN1_VVG |       410 |
| hoping      | VVG     |       402 |
| wedding     | NN1     |       401 |
| working     | AJ0     |       400 |
| reading     | VVG     |       397 |
| eating      | VVG     |       388 |
| standing    | NN1_VVG |       376 |
| changing    | VVG     |       368 |
| writing     | NN1_VVG |       354 |
| finding     | VVG     |       351 |
| carrying    | VVG     |       346 |
| knowing     | VVG     |       338 |
| wearing     | VVG     |       338 |
| happening   | NN1_VVG |       336 |
| staying     | VVG     |       336 |
| driving     | VVG     |       335 |
| following   | VVG     |       335 |
| fucking     | AJ0_AV0 |       333 |
| feeling     | VVG     |       330 |
| turning     | VVG     |       329 |
| wanting     | VVG     |       318 |
| spending    | VVG     |       317 |
| picking     | VVG     |       310 |
| cutting     | VVG     |       307 |
| doing       | NN1     |       306 |
| boring      | AJ0     |       305 |
| sing        | VVI     |       305 |
| coming      | AJ0_VVG |       304 |
| recording   | VVG     |       301 |
+-------------+---------+-----------+

The 100 most frequent verbs ending in -ing tagged as gerund 
(including portmanteau tags):
+------------+---------+-----------+
| word       | tag     | frequency |
+------------+---------+-----------+
| going      | VVG     |     22319 |
| doing      | VDG     |      9579 |
| being      | VBG     |      6567 |
| getting    | VVG     |      5590 |
| saying     | VVG     |      5365 |
| coming     | VVG     |      5086 |
| having     | VHG     |      4687 |
| looking    | VVG     |      4301 |
| talking    | VVG     |      4135 |
| trying     | VVG     |      3143 |
| working    | VVG     |      2785 |
| taking     | VVG     |      2283 |
| making     | VVG     |      2125 |
| thinking   | VVG     |      1899 |
| using      | VVG     |      1607 |
| putting    | VVG     |      1571 |
| giving     | VVG     |      1148 |
| sitting    | VVG     |      1127 |
| asking     | VVG     |      1090 |
| playing    | VVG     |      1084 |
| telling    | VVG     |      1080 |
| running    | VVG     |       924 |
| paying     | VVG     |       883 |
| training   | NN1_VVG |       777 |
| walking    | VVG     |       731 |
| waiting    | VVG     |       720 |
| moving     | VVG     |       686 |
| listening  | VVG     |       666 |
| living     | VVG     |       659 |
| saying     | NN1_VVG |       651 |
| watching   | VVG     |       628 |
| speaking   | VVG     |       618 |
| meeting    | NN1_VVG |       617 |
| starting   | VVG     |       613 |
| happening  | VVG     |       608 |
| seeing     | VVG     |       564 |
| selling    | VVG     |       551 |
| buying     | VVG     |       526 |
| writing    | VVG     |       503 |
| dealing    | VVG     |       501 |
| showing    | VVG     |       486 |
| keeping    | VVG     |       481 |
| planning   | NN1_VVG |       478 |
| bringing   | VVG     |       469 |
| standing   | VVG     |       436 |
| leaving    | VVG     |       422 |
| housing    | NN1_VVG |       410 |
| hoping     | VVG     |       402 |
| reading    | VVG     |       397 |
| eating     | VVG     |       388 |
| standing   | NN1_VVG |       376 |
| changing   | VVG     |       368 |
| writing    | NN1_VVG |       354 |
| finding    | VVG     |       351 |
| carrying   | VVG     |       346 |
| knowing    | VVG     |       338 |
| wearing    | VVG     |       338 |
| happening  | NN1_VVG |       336 |
| staying    | VVG     |       336 |
| driving    | VVG     |       335 |
| following  | VVG     |       335 |
| feeling    | VVG     |       330 |
| turning    | VVG     |       329 |
| wanting    | VVG     |       318 |
| spending   | VVG     |       317 |
| picking    | VVG     |       310 |
| cutting    | VVG     |       307 |
| coming     | AJ0_VVG |       304 |
| recording  | VVG     |       301 |
| learning   | VVG     |       298 |
| wondering  | VVG     |       296 |
| running    | AJ0_VVG |       285 |
| building   | NN1_VVG |       277 |
| moving     | AJ0_VVG |       269 |
| providing  | VVG     |       268 |
| lying      | VVG     |       267 |
| increasing | VVG     |       264 |
| holding    | VVG     |       258 |
| suggesting | VVG     |       254 |
| laughing   | VVG     |       248 |
| sending    | VVG     |       244 |
| talking    | NN1_VVG |       242 |
| falling    | VVG     |       234 |
| singing    | VVG     |       234 |
| becoming   | VVG     |       233 |
| calling    | VVG     |       232 |
| making     | NN1_VVG |       232 |
| meeting    | VVG     |       232 |
| beginning  | VVG     |       228 |
| helping    | VVG     |       228 |
| pulling    | VVG     |       227 |
| working    | AJ0_VVG |       225 |
| teaching   | NN1_VVG |       224 |
| setting    | VVG     |       223 |
| losing     | VVG     |       219 |
| growing    | VVG     |       218 |
| expecting  | VVG     |       216 |
| pushing    | VVG     |       214 |
| reading    | NN1_VVG |       214 |
| hanging    | VVG     |       207 |
+------------+---------+-----------+

The 100 most frequent words ending in -ing which are *not* tagged as gerund:
+---------------+---------+-----------+
| word          | tag     | frequency |
+---------------+---------+-----------+
| something     | PNI     |     13380 |
| thing         | NN1     |     11789 |
| anything      | PNI     |      6522 |
| morning       | NN1     |      4760 |
| nothing       | PNI     |      4176 |
| everything    | PNI     |      3724 |
| during        | PRP     |      1581 |
| bring         | VVI     |      1532 |
| interesting   | AJ0     |      1476 |
| meeting       | NN1     |      1374 |
| evening       | NN1     |      1080 |
| fucking       | AV0     |      1080 |
| building      | NN1     |      1034 |
| training      | NN1     |       783 |
| fucking       | AJ0     |       717 |
| bring         | VVB     |       714 |
| beginning     | NN1     |       694 |
| ring          | VVI     |       655 |
| planning      | NN1     |       598 |
| darling       | NN1     |       585 |
| ring          | NN1     |       543 |
| existing      | AJ0     |       499 |
| housing       | NN1     |       473 |
| feeling       | NN1     |       442 |
| including     | PRP     |       430 |
| wedding       | NN1     |       401 |
| working       | AJ0     |       400 |
| fucking       | AJ0_AV0 |       333 |
| doing         | NN1     |       306 |
| boring        | AJ0     |       305 |
| sing          | VVI     |       305 |
| amazing       | AJ0     |       290 |
| following     | AJ0     |       287 |
| willing       | AJ0     |       281 |
| shopping      | NN1     |       277 |
| exciting      | AJ0     |       265 |
| advertising   | NN1     |       264 |
| king          | NN1     |       256 |
| washing       | NN1     |       241 |
| disgusting    | AJ0     |       240 |
| engineering   | NN1     |       226 |
| heating       | NN1     |       216 |
| spring        | NN1     |       213 |
| pudding       | NN1     |       212 |
| understanding | NN1     |       211 |
| being         | NN1     |       210 |
| sing          | VVB     |       203 |
| ring          | VVB     |       200 |
| swimming      | NN1     |       190 |
| reading       | NN1     |       178 |
| providing     | CJS     |       172 |
| marketing     | NN1     |       166 |
| ring          | NN1_VVB |       160 |
| hunting       | NN1     |       158 |
| living        | AJ0     |       158 |
| fishing       | NN1     |       157 |
| ceiling       | NN1     |       148 |
| surprising    | AJ0     |       148 |
| offspring     | NN0     |       146 |
| teaching      | NN1     |       145 |
| outstanding   | AJ0     |       137 |
| wing          | NN1     |       127 |
| writing       | NN1     |       127 |
| string        | NN1     |       125 |
| dining        | NN1     |       124 |
| driving       | AJ0     |       124 |
| leading       | AJ0     |       124 |
| regarding     | PRP     |       123 |
| wording       | NN1     |       122 |
| bearing       | NN1     |       119 |
| meaning       | NN1     |       119 |
| freezing      | AJ0     |       110 |
| opening       | NN1     |       110 |
| living        | NN1     |       107 |
| living        | AJ0_NN1 |       106 |
| blooming      | AJ0     |       105 |
| embarrassing  | AJ0     |       105 |
| growing       | AJ0     |       105 |
| king          | NP0     |       105 |
| painting      | NN1     |       104 |
| warning       | NN1     |       104 |
| boxing        | NN1     |       103 |
| encouraging   | AJ0     |       102 |
| flipping      | AJ0     |        98 |
| drawing       | NN1     |        96 |
| running       | AJ0     |        96 |
| parking       | NN1     |        95 |
| standing      | NN1     |        95 |
| lighting      | NN1     |        93 |
| moving        | AJ0     |        87 |
| annoying      | AJ0     |        86 |
| recording     | NN1     |        84 |
| appalling     | AJ0     |        83 |
| heading       | NN1     |        82 |
| fascinating   | AJ0     |        81 |
| shilling      | NN1     |        81 |
| spelling      | NN1     |        80 |
| bleeding      | AJ0     |        78 |
| cooking       | NN1     |        78 |
| clothing      | NN1     |        77 |
+---------------+---------+-----------+


-------------------------------------------------------------
/  Sebastian Hoffmann          |  University of Zurich       \
| e-Mail: sebhoff at es.unizh.ch  |  English Department          |
| Plattenstrasse 47            |  CH-8032 Zurich/Switzerland  |
\  Phone: (41 1) 634 35 51     |  Fax: (41 1) 634 49 08      /