16.1436, FYI: 100 Million Corpus: registers, WordNet, synonyms

Thu May 5 15:32:13 UTC 2005

LINGUIST List: Vol-16-1436. Thu May 05 2005. ISSN: 1068-4875.

Subject: 16.1436, FYI: 100 Million Corpus: registers, WordNet, synonyms

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org)
        Sheila Dooley, U of Arizona
        Terry Langendoen, U of Arizona

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Ann Sawyer <sawyer at linguistlist.org>
================================================================

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

===========================Directory==============================

1)
Date: 04-May-2005
From: Mark Davies < mark_davies at byu.edu >
Subject: 100 Million Corpus: registers, WordNet, synonyms

-------------------------Message 1 ----------------------------------
Date: Thu, 05 May 2005 11:26:00
From: Mark Davies < mark_davies at byu.edu >
Subject: 100 Million Corpus: registers, WordNet, synonyms

The following free resource may be of interest:
"Variation in English Words and Phrases", available at:

http://view.byu.edu

This is a new interface to the 100-million-word British National Corpus,
probably the best-known corpus of English. It supports the following
types of searches, most of which are not possible with any other
interface:

1. Quickly find the frequency of words and phrases in any user-defined
combination of more than 70 registers (spoken, academic, poetry, medical,
tabloids, email, etc.), e.g.:
-- the most common nouns in natural sciences texts, adjectives in
engineering texts, or verbs in medical texts
-- which collocates (co-occurring words) occur more in one register than
another; e.g. the collocates of [chair] in fiction vs. academic texts
-- variation in grammatical constructions across registers; e.g. the
relative frequency of the passive in academic vs. spoken texts, or the
relative frequency of [whom] across all of the registers.
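As a rough illustration of what a register-restricted frequency search computes, the Python sketch below normalizes a word's count to a rate per million words over whatever combination of registers is selected. It is not the interface's own code: TOY_CORPUS, its three "registers", and the function freq_per_million are invented for the example, and the counts say nothing about the BNC.

```python
from collections import Counter

# Invented toy data, NOT BNC text: each register maps to a tokenized sample.
TOY_CORPUS = {
    "fiction": "she sat in the chair and the chair creaked".split(),
    "academic": "the chair of the committee was appointed by the board".split(),
    "spoken": "pull up a chair and sit down".split(),
}


def freq_per_million(word, registers, corpus=TOY_CORPUS):
    """Frequency of `word` per million tokens in the combined `registers`."""
    counts = Counter()
    total = 0
    for name in registers:
        tokens = corpus[name]
        counts.update(tokens)  # pool the counts of every selected register
        total += len(tokens)
    if total == 0:
        return 0.0
    # Normalizing by size lets registers of different sizes be compared.
    return counts[word] / total * 1_000_000


# Compare one register with a user-defined combination of two others.
for combo in (["fiction"], ["academic", "spoken"]):
    print(combo, round(freq_per_million("chair", combo)))
```

Normalizing matters because registers contain different numbers of words, so raw counts from a large register and a small one are not directly comparable.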

2. Compare synonyms and other semantically related words. One simple
search, for example, shows the most frequent nouns that appear with only
one of [sheer], [complete], or [utter] (sheer nonsense, complete account,
utter dismay) and not with the other two. Another finds adjectives that
occur with [woman] but not with [man] or [child].
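One way to picture the "with X but not with Y or Z" search is as a set difference over collocate counts. The sketch below assumes the corpus has already been reduced to (node word, collocate) pairs, e.g. an adjective plus the following noun; PAIRS is invented data and exclusive_collocates is a hypothetical helper, not part of the interface.

```python
from collections import Counter, defaultdict

# Invented (node, collocate) pairs standing in for corpus bigrams; NOT BNC data.
PAIRS = [
    ("sheer", "nonsense"), ("sheer", "nonsense"), ("sheer", "luck"), ("sheer", "size"),
    ("complete", "account"), ("complete", "list"), ("complete", "disaster"),
    ("utter", "dismay"), ("utter", "dismay"), ("utter", "disaster"),
]


def exclusive_collocates(nodes, pairs, top=10):
    """For each node, rank the collocates that occur with it but with no other node."""
    counts = defaultdict(Counter)
    for node, colloc in pairs:
        if node in nodes:
            counts[node][colloc] += 1
    result = {}
    for node in nodes:
        # Anything seen with any of the *other* nodes is excluded.
        others = set()
        for other in nodes:
            if other != node:
                others.update(counts[other])
        only_here = Counter(
            {w: c for w, c in counts[node].items() if w not in others}
        )
        result[node] = [w for w, _ in only_here.most_common(top)]
    return result


print(exclusive_collocates(["sheer", "complete", "utter"], PAIRS))
```

For the [woman] / [man] / [child] search, the same function would simply be given (noun, adjective) pairs and the nodes ["woman", "man", "child"].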

3. You can also input information from WordNet (a semantically organized
lexicon of English) directly into the search form.  This allows you to find
the frequency and distribution of words with similar, more general, or more
specific meanings (e.g. the frequency of synonyms of [world], or the
frequency of more specific words for [jump]).
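The WordNet option can be thought of as query expansion: look up the words related to a query term, then count each of them. The sketch below deliberately avoids a WordNet dependency (NLTK's WordNet reader is one real way to get such relations); TOY_LEXICON is a small hand-made stand-in, and its entries and relation names are illustrative assumptions, not WordNet output.

```python
from collections import Counter

# Hypothetical, hand-made stand-in for WordNet relations; NOT read from WordNet.
TOY_LEXICON = {
    "world": {"similar": ["earth", "globe"]},
    "jump": {"more_specific": ["hop", "vault", "skip"]},
}


def related_frequencies(word, relation, tokens, lexicon=TOY_LEXICON):
    """Expand `word` via `relation` in the lexicon, then count each related word."""
    counts = Counter(tokens)
    related = lexicon.get(word, {}).get(relation, [])
    return {w: counts[w] for w in related}


tokens = "we saw him hop then vault then hop again".split()
print(related_frequencies("jump", "more_specific", tokens))
```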

4. Search for words and phrases by exact form, by wildcard, by part of
speech, or by any combination of these (e.g. *ly good/bad [n*]: really
good time, extremely bad idea).
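A query such as *ly good/bad [n*] mixes three kinds of slot: a word wildcard, a list of alternatives, and a part-of-speech tag in brackets. The sketch below matches such a query against a POS-tagged token list using Python's fnmatch module. The slot syntax is inferred from the example above, and the lower-case tags are only loosely modeled on the BNC's CLAWS tags; both are assumptions, not a description of how the interface actually parses queries.

```python
from fnmatch import fnmatchcase

# Toy POS-tagged text; lower-case tags loosely modeled on CLAWS (an assumption).
TAGGED = [
    ("a", "at0"), ("really", "av0"), ("good", "aj0"), ("time", "nn1"),
    ("and", "cjc"), ("an", "at0"), ("extremely", "av0"), ("bad", "aj0"),
    ("idea", "nn1"), ("but", "cjc"), ("very", "av0"), ("good", "aj0"),
    ("news", "nn1"),
]


def slot_matches(slot, word, tag):
    """'[pattern]' tests the POS tag; otherwise test the word form.
    '/' separates alternatives and '*' is a wildcard in either case."""
    if slot.startswith("[") and slot.endswith("]"):
        return any(fnmatchcase(tag, alt) for alt in slot[1:-1].split("/"))
    return any(fnmatchcase(word, alt) for alt in slot.split("/"))


def find_matches(query, tagged):
    """Return the text of every token sequence in `tagged` matching `query`."""
    slots = query.split()
    hits = []
    for i in range(len(tagged) - len(slots) + 1):
        window = tagged[i:i + len(slots)]
        if all(slot_matches(s, w, t) for s, (w, t) in zip(slots, window)):
            hits.append(" ".join(w for w, _ in window))
    return hits


print(find_matches("*ly good/bad [n*]", TAGGED))
```

Because the match is purely formal, a slot like *ly would also pick up words such as only or family, so results from a wildcard slot may need checking by hand.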

5. Use anchors and targets for fuzzy matches (e.g. all nouns somewhere near
[paper], all adjectives near [woman], or all nouns near [spin]).
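Anchor-and-target searches can be approximated by scanning a window of tokens on either side of each occurrence of the anchor word and keeping those whose tag matches the target. In the sketch below, the default window of 4 tokens on each side is an arbitrary assumption, as are the function name near_collocates and the toy tagged sentence; the interface's real window size is not stated here.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Toy tagged sentence; tags loosely modeled on CLAWS (an assumption).
TAGGED = [
    ("she", "pnp"), ("read", "vvd"), ("the", "at0"), ("paper", "nn1"),
    ("at", "prp"), ("her", "dps"), ("desk", "nn1"), ("in", "prp"),
    ("the", "at0"), ("quiet", "aj0"), ("office", "nn1"),
]


def near_collocates(anchor, target_tag, tagged, span=4):
    """Count words tagged `target_tag` within `span` tokens either side of `anchor`."""
    counts = Counter()
    for i, (word, _) in enumerate(tagged):
        if word != anchor:
            continue
        lo, hi = max(0, i - span), min(len(tagged), i + span + 1)
        for j in range(lo, hi):
            w, t = tagged[j]
            if j != i and fnmatchcase(t, target_tag):
                counts[w] += 1
    return counts


print(near_collocates("paper", "n*", TAGGED))          # 4-token window
print(near_collocates("paper", "n*", TAGGED, span=7))  # wider window
```

With the default window only desk is found near paper; widening the window to 7 tokens also reaches office, which shows how strongly the window size shapes a "fuzzy" match.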

Please feel free to email me with any questions that you might have.

Mark Davies
Dept. Linguistics, Brigham Young University
http://davies-linguistics.byu.edu

Linguistic Field(s): Computational Linguistics
                     Discourse Analysis
                     Lexicography
                     Text/Corpus Linguistics

-----------------------------------------------------------
LINGUIST List: Vol-16-1436