16.2960, FYI: New features at BNC/VIEW interface

LINGUIST List linguist at LINGUISTLIST.ORG
Thu Oct 13 14:17:52 UTC 2005


LINGUIST List: Vol-16-2960. Thu Oct 13 2005. ISSN: 1068 - 4875.

Subject: 16.2960, FYI: New features at BNC/VIEW interface

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews (reviews at linguistlist.org) 
        Sheila Dooley, U of Arizona  
        Terry Langendoen, U of Arizona  

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Svetlana Aksenova <svetlana at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.


===========================Directory==============================  

1)
Date: 12-Oct-2005
From: Mark Davies < mark_davies at byu.edu >
Subject: New features at BNC/VIEW interface 

	
-------------------------Message 1 ---------------------------------- 
Date: Thu, 13 Oct 2005 10:08:07
From: Mark Davies < mark_davies at byu.edu >
Subject: New features at BNC/VIEW interface 
 

BNC/VIEW is a new architecture and interface for the 100 million word
British National Corpus.  It is freely available on the web at
http://view.byu.edu

A number of new features have recently been added to the corpus. They
include the following:

** 1) CHARTS
You can now see (in graphical form) the frequency of a word, phrase, or
grammatical construction in the six major registers (e.g. spoken, academic)
and in each of the sub-registers (sermons, poetry, medical, etc.).

** 2) IMPROVED SEARCHES FOR COLLOCATES
The search frame is now much wider when you search for collocates /
surrounding words -- up to ten words on the left and on the right.
Examples: the most common nouns near [kitchen], comparing the nouns near
[uncover] with those near [reveal], and comparing nouns near [chair] in
fiction and academic registers.
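A collocate search with a window of up to ten words on each side can be sketched as follows. This is a minimal illustration, not BNC/VIEW's actual implementation. It assumes the text is available as a Python list of (word, tag) pairs with CLAWS-style tags (NN1, NN2, etc.). The function name `collocates` and its parameters are hypothetical.

```python
from collections import Counter


def collocates(tagged, node, window=10, tag_prefix=None):
    """Count words within `window` positions left and right of each `node`.

    `tagged` is a list of (word, tag) pairs; if `tag_prefix` is given
    (e.g. "NN" for nouns), only collocates whose tag starts with it are counted.
    """
    counts = Counter()
    for i, (word, _tag) in enumerate(tagged):
        if word != node:
            continue
        # Clip the window at the start and end of the text.
        lo = max(0, i - window)
        hi = min(len(tagged), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue  # skip the node word itself
            w, t = tagged[j]
            if tag_prefix is None or t.startswith(tag_prefix):
                counts[w] += 1
    return counts


sentence = [("the", "AT0"), ("new", "AJ0"), ("kitchen", "NN1"),
            ("table", "NN1"), ("and", "CJC"), ("chairs", "NN2")]
print(collocates(sentence, "kitchen", window=3, tag_prefix="NN"))
```

Comparing the nouns near [uncover] and [reveal], or near [chair] in two registers, then amounts to running the same count over different inputs and comparing the resulting counters.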

** 3) CUSTOMIZED, USER-DEFINED LISTS
You can create an unlimited number of customized lists, containing words
that are related in any way you like. These word lists are stored via the
web interface, and you can re-use them at any point in the future.

** 4) SORTING BY RELEVANCE (MODIFIED Z-SCORE)
You can now sort by relevance, which shows much more clearly which words
are most strongly associated with each other. This type of query, which
is similar to a z-score calculation, takes into account the overall
frequency of collocates and filters out high-frequency "noise" words.
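The announcement does not give the exact formula behind the modified z-score. As a rough illustration only, the sketch below uses a standard binomial z-score for collocations rather than BNC/VIEW's own measure. If a collocate has corpus frequency f_c in a corpus of N words, the chance of it filling any one slot is p = f_c / N. Over n = node_freq * span slots around the node, we therefore expect n * p occurrences, and z = (O - n * p) / sqrt(n * p * (1 - p)). The function names and the counts in the example are invented.

```python
import math


def collocation_z(observed, node_freq, coll_freq, corpus_size, span=20):
    """Binomial z-score for a collocate (a standard formula, not BNC/VIEW's exact one).

    span is the number of slots examined per node occurrence (20 for +/-10 words).
    """
    n = node_freq * span            # total slots examined around the node
    p = coll_freq / corpus_size     # chance the collocate fills a random slot
    expected = n * p
    return (observed - expected) / math.sqrt(expected * (1 - p))


def rank_by_relevance(candidates, node_freq, corpus_size, span=20):
    """candidates maps word -> (observed co-occurrences, corpus frequency)."""
    scores = {
        word: collocation_z(obs, node_freq, freq, corpus_size, span)
        for word, (obs, freq) in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Invented counts: a very frequent word vs. a rarer but tightly bound one.
candidates = {"the": (120, 50_000), "sink": (10, 200)}
print(rank_by_relevance(candidates, node_freq=100, corpus_size=1_000_000))  # ['sink', 'the']
```

Ranked by raw co-occurrence, "the" (120) would come first. The z-score puts "sink" first because its 10 co-occurrences are far above the 0.4 expected by chance, while "the" is only modestly above its expected 100.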


Features that were already available include the following:


** 5) BASIC QUERIES BY SUBSTRING, WORD, PHRASE, AND PART OF SPEECH
For example, the frequency of a given word, set of words, phrase, substring
(e.g. *heart*), part of speech (e.g. [av*] [aj*]: very clear), or
combinations of these (e.g. [vv*] it/them [avp]: took them away, give it up).
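The way such mixed word/tag queries match can be sketched with a small pattern matcher over (word, tag) pairs. This is a simplified model with several stated assumptions. Bracketed slots are treated only as part-of-speech tags with `*` wildcards; in the real interface, brackets are also used for words and lemmas, e.g. [kitchen]. `a/b` gives alternative words. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def slot_matches(slot, word, tag):
    """Match one query slot against one (word, tag) token, ignoring case."""
    slot = slot.lower()
    if slot.startswith("[") and slot.endswith("]"):
        # Assumed convention for this sketch: [vv*] = any tag beginning with VV.
        return fnmatchcase(tag.lower(), slot[1:-1])
    # Plain slot: one or more word alternatives, each possibly with * wildcards.
    return any(fnmatchcase(word.lower(), alt) for alt in slot.split("/"))


def find_phrase(tagged, query):
    """Return every stretch of `tagged` that matches the space-separated query."""
    slots = query.split()
    hits = []
    for i in range(len(tagged) - len(slots) + 1):
        window = tagged[i:i + len(slots)]
        if all(slot_matches(s, w, t) for s, (w, t) in zip(slots, window)):
            hits.append(" ".join(w for w, _ in window))
    return hits


text = [("She", "PNP"), ("took", "VVD"), ("them", "PNP"), ("away", "AVP"),
        ("and", "CJC"), ("it", "PNP"), ("was", "VBD"), ("very", "AV0"),
        ("clear", "AJ0")]
print(find_phrase(text, "[vv*] it/them [avp]"))
print(find_phrase(text, "[av*] [aj*]"))
```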

** 6) REGISTER-BASED QUERIES
You can find the frequency of words and phrases in any combination of
registers that you define -- on the fly -- e.g. spoken, academic, poetry,
or medical.  In addition, you can compare registers -- for example,
verbs that are more common in legal or medical texts, phrases like [I *
that] that are more common in conversation than in non-fiction texts, nouns
near "break" (v) that are found primarily in academic writing, etc.
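Comparing registers depends on normalized frequencies, since the registers differ greatly in size. The following is a minimal sketch. It assumes that raw counts per register and register sizes in words are already available. The names, the threshold, and the add-one smoothing for words missing from the second register are illustrative choices, not details of BNC/VIEW. The counts in the example are invented.

```python
def per_million(count, register_size):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / register_size


def more_common_in(freq_a, size_a, freq_b, size_b, min_ratio=2.0):
    """Words at least `min_ratio` times more frequent (per million) in A than in B.

    freq_a and freq_b map word -> raw count; size_a and size_b are register sizes in words.
    """
    results = []
    for word, count_a in freq_a.items():
        # Add one to B's count so words absent from B do not cause division by zero.
        ratio = per_million(count_a, size_a) / per_million(freq_b.get(word, 0) + 1, size_b)
        if ratio >= min_ratio:
            results.append((word, round(ratio, 2)))
    return sorted(results, key=lambda item: item[1], reverse=True)


legal = {"allege": 40, "say": 100}   # invented counts, 1M-word register
other = {"allege": 3, "say": 400}    # invented counts, 2M-word register
print(more_common_in(legal, 1_000_000, other, 2_000_000))
```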

** 7) FREQUENCY IN ALL 70 REGISTERS
You can click on a word or phrase in any of the result sets to see its
frequency in all 70 registers. The results are sorted initially by
normalized frequency, and you can re-sort them by register name, number of
tokens, etc.
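The all-registers view amounts to one row per register with a raw token count and a normalized frequency, sorted by whichever column the user picks. The sketch below is minimal and makes no claim about how BNC/VIEW itself does this. The register sizes and counts are invented, and the function name is hypothetical.

```python
SORT_KEYS = {
    "register": lambda row: row[0],
    "tokens": lambda row: row[1],
    "per_million": lambda row: row[2],
}


def register_profile(counts, sizes, sort_by="per_million"):
    """Return (register, tokens, per-million) rows for every register in `sizes`."""
    rows = []
    for register, size in sizes.items():
        tokens = counts.get(register, 0)  # registers with no hits still get a row
        rows.append((register, tokens, tokens * 1_000_000 / size))
    # Names sort A-Z; numeric columns sort highest first.
    return sorted(rows, key=SORT_KEYS[sort_by], reverse=(sort_by != "register"))


sizes = {"spoken": 10_000_000, "academic": 15_000_000, "poetry": 200_000}
counts = {"spoken": 50, "academic": 30, "poetry": 4}
for row in register_profile(counts, sizes):
    print(row)
```

Normalizing matters here. The word is rarest in poetry by raw count (4), but it is most frequent there per million words (20.0), because the poetry register is so small.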
 
** 8) COMPARING COLLOCATES WITH RELATED WORDS
For example, nouns that occur after [utter] but not after [sheer] or
[total], adjectives within ten words of [man] that do not occur near
[woman], etc.  All of this is done via one simple query from the web
interface.  This may be quite useful for language learners, who can
compare the uses of competing synonyms.
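Once collocate counts exist for each word (for example, from a window search like the one described under item 2), this comparison is essentially a set difference. The sketch below is minimal and uses invented counts. The function name is hypothetical, and the real interface may use frequency thresholds rather than strict absence.

```python
from collections import Counter


def exclusive_collocates(target, *others):
    """Keep collocates of `target` that never appear among the collocates of `others`."""
    excluded = set()
    for other in others:
        excluded.update(other)  # iterating a Counter yields its keys
    return Counter({word: n for word, n in target.items() if word not in excluded})


# Invented collocate counts for the nouns after each adjective.
utter = Counter({"disbelief": 5, "nonsense": 4, "silence": 3})
sheer = Counter({"nonsense": 2, "size": 6})
total = Counter({"silence": 1, "control": 9})
print(exclusive_collocates(utter, sheer, total))
```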

** 9) INTEGRATION WITH WORDNET (SEMANTICALLY-BASED QUERIES)
For example:
  [=bad] [nn*]: any synonym of [bad] followed by any noun, e.g. wicked
witch, foul play, terrible storm, etc.
  my/your [@body]: [my] or [your] followed by a part of the body, e.g. my leg,
your shoulder, etc.
  [<eat] the [<food]: a more specific word for [eat] followed by a more
specific word for [food], e.g. devour the hamburger, munch the cookies.
Again, all of this is done via one simple query from the search form.
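A query like [=bad] [nn*] can be thought of as expanding [=bad] into a set of words and then matching each one against the text. The sketch below uses a tiny hand-made synonym table as a stand-in for WordNet. It is not WordNet data, and the names are hypothetical. A real system would look the set up in WordNet instead.

```python
# Hypothetical stand-in for a WordNet lookup; not actual WordNet data.
SYNONYMS = {"bad": {"bad", "wicked", "foul", "terrible"}}


def synonym_then_tag(tagged, lemma, tag_prefix="NN"):
    """Find 'synonym-of-lemma + word whose tag starts with tag_prefix' bigrams."""
    words = SYNONYMS.get(lemma, {lemma})
    hits = []
    for (w1, _t1), (w2, t2) in zip(tagged, tagged[1:]):
        if w1.lower() in words and t2.startswith(tag_prefix):
            hits.append(f"{w1} {w2}")
    return hits


text = [("a", "AT0"), ("wicked", "AJ0"), ("witch", "NN1"), ("and", "CJC"),
        ("foul", "AJ0"), ("play", "NN1"), ("in", "PRP"), ("a", "AT0"),
        ("terrible", "AJ0"), ("storm", "NN1")]
print(synonym_then_tag(text, "bad"))
```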

** 10) COMPLETE RESULTS AND FAST QUERIES
Unlike some other interfaces for the BNC, this one allows you to find all
of the matching strings -- not just those that occur three times or more. 
In addition, queries of the 100 million word corpus are quite fast: most
searches take no more than one or two seconds.

Again, the corpus is freely available at http://view.byu.edu. Please feel
free to email me with any questions or comments.

Mark Davies
Brigham Young University 



Linguistic Field(s): Computational Linguistics
                     General Linguistics
                     Lexicography
                     Semantics
                     Text/Corpus Linguistics




