FW: 14.3465, FYI: New Website: Phrases in English

Benjamin Barrett bjb5 at U.WASHINGTON.EDU
Sat Dec 13 04:59:26 UTC 2003


I thought this might be of interest to some...

Benjamin Barrett

-----Original Message-----
From: The LINGUIST Discussion List
[mailto:LINGUIST at LISTSERV.LINGUISTLIST.ORG] On Behalf Of LINGUIST List
Sent: Friday, 12 December 2003 4:58 PM

LINGUIST List:  Vol-14-3465. Fri Dec 12 2003. ISSN: 1068-4875.

Subject: 14.3465, FYI: New Website: Phrases in English

Moderators: Anthony Aristar, Wayne State U.<aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org):
        Sheila Collberg, U. of Arizona
        Terence Langendoen, U. of Arizona

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne State
University, and donations from subscribers and publishers.

Editor for this issue: Anne Clarke <anne at linguistlist.org>
 ==========================================================================

Date:  Thu, 11 Dec 2003 11:45:10 -0500 (EST)
From:  William Fletcher <fletcher at usna.edu>
Subject:  New Website: Phrases in English


A new website, ''Phrases in English'' (PIE), has been launched at
http://pie.usna.edu. While still under development, PIE already offers much
to both linguists and students, and additional features will extend its
scope in the future.

PIE incorporates a database of all 1- to 6-grams (phrases one to six
''words'' long), together with their part-of-speech (POS) codes, that occur
three or more times in the 100-million-word British National Corpus (BNC).
One can explore English phraseology either through lists of forms and their
frequencies or by searching for specific forms or collocations, e.g. 2-grams
of the pattern ''ADJ work'', to find the most frequent adjectives describing
work.
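For readers curious about the mechanics, here is a minimal Python sketch of
how such a database might be built and queried at toy scale. It is not PIE's
implementation: the (word, tag) token format, the simplified tags ''ADJ'' and
''NOUN'', and the helper names count_ngrams and adj_before are all
hypothetical. It counts every 1- to 6-gram, keeps those seen three or more
times, and answers an ''ADJ work'' query from the surviving 2-grams.

```python
from collections import Counter

# A token is a (word, pos) pair. The simplified tags ("ADJ", "NOUN", ...)
# are hypothetical stand-ins for the BNC's own tagset.
Token = tuple[str, str]


def count_ngrams(tokens: list[Token], max_n: int = 6,
                 min_freq: int = 3) -> dict[tuple[Token, ...], int]:
    """Count every 1..max_n-gram and keep only those seen at least min_freq times."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_freq}


def adj_before(ngrams: dict[tuple[Token, ...], int], word: str) -> dict[str, int]:
    """Answer an 'ADJ <word>' query: frequencies of adjectives directly preceding word."""
    result: Counter = Counter()
    for gram, c in ngrams.items():
        if len(gram) == 2 and gram[0][1] == "ADJ" and gram[1][0].lower() == word.lower():
            result[gram[0][0].lower()] += c
    return dict(result)


# Toy input: "hard work" occurs three times, "good work" only once.
toks = [("hard", "ADJ"), ("work", "NOUN"), (".", "PUNC")] * 3 + [("good", "ADJ"), ("work", "NOUN")]
print(adj_before(count_ngrams(toks), "work"))  # {'hard': 3}
```

Note that ''good work'' disappears because it falls below the three-occurrence
threshold. Counting every 6-gram of a 100-million-word corpus in memory this
way would be impractical; the sketch only shows the logic.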

PIE also offers a phrase-pattern discovery tool, ''phrase-frames'': sets of
variants of an n-gram that are identical except for one word (marked by the
wildcard symbol *). The most frequent and productive 4-frame is
''the * of the'', with variants such as ''the end of the'', ''the rest of the'',
''the top of the'' and ''the nature of the''.
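The phrase-frame idea can likewise be sketched in a few lines of Python,
assuming word-level n-gram counts are already available. This is an
illustrative reconstruction, not PIE's actual phrase-frame generator: the
function names, the min_variants cutoff, the ranking rule (number of distinct
variants as a stand-in for productivity, then total frequency) and the counts
in the example are all invented.

```python
from collections import defaultdict


def phrase_frames(ngram_freqs: dict[tuple[str, ...], int],
                  min_variants: int = 2) -> dict[tuple[str, ...], dict[str, int]]:
    """Group n-grams that are identical except at one position into frames.

    Each frame has '*' at the varying position and maps every filler word
    to the frequency of the corresponding n-gram.
    """
    frames: defaultdict = defaultdict(dict)
    for gram, freq in ngram_freqs.items():
        for i in range(len(gram)):
            frame = gram[:i] + ("*",) + gram[i + 1:]
            # frame + filler identify the n-gram uniquely, so no counts collide
            frames[frame][gram[i]] = freq
    return {f: v for f, v in frames.items() if len(v) >= min_variants}


def rank_frames(frames: dict[tuple[str, ...], dict[str, int]]):
    """Order frames by number of variants (one notion of productivity), then total frequency."""
    return sorted(frames.items(),
                  key=lambda kv: (len(kv[1]), sum(kv[1].values())),
                  reverse=True)


# Made-up 4-gram counts for illustration only (not BNC figures).
counts = {
    ("the", "end", "of", "the"): 50,
    ("the", "rest", "of", "the"): 30,
    ("the", "top", "of", "the"): 20,
    ("at", "the", "end", "of"): 40,
}
for frame, variants in rank_frames(phrase_frames(counts)):
    print(" ".join(frame), variants)
# prints: the * of the {'end': 50, 'rest': 30, 'top': 20}
```

Frames with only one filler (e.g. ''at the * of'' here) are dropped, since a
frame is only interesting when several words can occupy the slot.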

Over the next year PIE will add:

-- Concordances from the BNC, displayed by clicking on an n-gram in the
query results

-- POS-grams and POS-frames for studying the relative productivity of phrase
structures

-- Filtering by text type (domain, genre, target audience) for contrastive
studies

-- Query by regular expression (currently only wildcards are
supported)
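To illustrate the difference between the current wildcard queries and the
planned regular-expression queries, here is a hypothetical Python sketch. It
assumes that * matches exactly one word, and it searches a plain list of
strings rather than PIE's database; wildcard_to_regex and search are
illustrative names, not part of PIE.

```python
import re


def wildcard_to_regex(query: str) -> re.Pattern:
    """Translate a wildcard query such as 'the * of the' into a compiled regex.

    Assumption: '*' stands for exactly one word; every other token is literal.
    """
    parts = [r"\S+" if tok == "*" else re.escape(tok) for tok in query.split()]
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


def search(phrases: list[str], pattern: re.Pattern) -> list[str]:
    """Return the phrases that the pattern matches in full."""
    return [p for p in phrases if pattern.fullmatch(p)]


phrases = ["the end of the", "the rest of the", "at the end of", "the end of a"]

# Wildcard: any single word may fill the slot.
print(search(phrases, wildcard_to_regex("the * of the")))
# ['the end of the', 'the rest of the']

# Regular expression: alternation lets one query constrain two slots at once.
print(search(phrases, re.compile(r"the (end|top) of (the|a)")))
# ['the end of the', 'the end of a']
```

A wildcard can only say ''any word here'', whereas a regular expression can
restrict a slot to a chosen set of words or vary more than one position.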

In addition, when POS-tagging of the Michigan Corpus of Academic Spoken
English (MICASE) http://www.hti.umich.edu/micase/ is complete, a similar
database will be created with those data.  Finally, when a substantial
portion of the American National Corpus (ANC)
http://americannationalcorpus.org has been released, a third parallel
database will be built.  Together these databases will permit comparative
studies of phraseology in the principal variants of English.

Please note:

- ''Unfiltered'' queries which match very large datasets can take several
minutes to complete.  Please be patient; read the tutorials and FAQ to focus
your queries.

- Users who cannot access the above site may use http://kwicfinder.com/BNC/
instead; please let me know, so we can investigate the problem.


Acknowledgements

Above all, I am grateful to Michael Stubbs of the University of Trier for
detailed suggestions and ongoing discussions that led to the creation and
refinement of this site; even the acronym, ''easy as pie'' to remember, goes
back to him. His research assistants contributed as well: Isabel Barth
implemented the original phrase-frame generator, and Katrin Ungeheuer offered
valuable comments on the organization and user interface for query by text
type. Finally, Lou Burnard of the BNC Consortium and David Lee of MICASE
granted essential permissions and provided useful feedback on the site.

All user feedback will be received enthusiastically!

Bill Fletcher

fletcher AT usna.edu
fletcher AT kwicfinder.com

http://pie.usna.edu
http://kwicfinder.com
---------------------------------------------------------------------------