6.519 Sum: English compound noun corpora

The Linguist List linguist at tam2000.tamu.edu
Thu Apr 6 23:31:25 UTC 1995


----------------------------------------------------------------------
LINGUIST List:  Vol-6-519. Thu 06 Apr 1995. ISSN: 1068-4875. Lines: 96
 
Subject: 6.519 Sum: English compound noun corpora
 
Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at tam2000.tamu.edu>
            Helen Dry: Eastern Michigan U. <hdry at emunix.emich.edu>
 
Asst. Editors: Ron Reck <rreck at emunix.emich.edu>
               Ann Dizdar <dizdar at tam2000.tamu.edu>
               Ljuba Veselinova <lveselin at emunix.emich.edu>
               Annemarie Valdez <avaldez at emunix.emich.edu>
 
-------------------------Directory-------------------------------------
 
1)
Date: Tue, 4 Apr 1995 16:21:15 +0200
From: Cecile Fabre (Cecile.Fabre at irisa.fr)
Subject: Sum : English Compound Noun Corpora
 
-------------------------Messages--------------------------------------
1)
Date: Tue, 4 Apr 1995 16:21:15 +0200
From: Cecile Fabre (Cecile.Fabre at irisa.fr)
Subject: Sum : English Compound Noun Corpora
 
 
A month ago I posted a query asking for English noun compound corpora.
These are the two largest lists I received:
 
1. a 1-MB word list of compounds from a spellchecker for the NeXT
computer, sent by George Fowler.
 
2. a list of 9000 binary nominals with judgments on accent placement,
sent by Richard Sproat. It is described in:
 
Richard Sproat, "English Noun-Phrase Accent Prediction for
 Text-to-Speech." Computer Speech and Language, 1994, 8,
 79-94.
 
Both files are available by anonymous FTP from the following site:
 
ftp.irisa.fr under the directory /local/corpus
 
Other responses mainly advised me to build my own list from tagged
corpora (Brown Corpus, Penn Treebank, etc.) or by statistical methods
(see Johansson, C., 1994, Catching the Cheshire Cat, Proc. COLING,
Kyoto; http://www.ling.lu.se).
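
As a rough illustration of the first suggestion, here is a minimal
Python sketch of pulling binary noun sequences out of
part-of-speech-tagged text. It assumes Penn Treebank-style common-noun
tags (NN, NNS) and sentences already given as (word, tag) pairs. The
function names and the sample sentence are invented for illustration
and do not reproduce any respondent's method.

```python
from collections import Counter

# Assumption: Penn Treebank-style tags for common nouns.
NOUN_TAGS = {"NN", "NNS"}


def extract_noun_pairs(tagged_sentence):
    """Return runs of exactly two adjacent nouns as (modifier, head) tuples.

    Longer noun runs (e.g. three nouns in a row) are skipped, since
    their internal bracketing is ambiguous.
    """
    pairs = []
    run = []
    # A sentinel token with an empty tag flushes a run at sentence end.
    for word, tag in list(tagged_sentence) + [("", "")]:
        if tag in NOUN_TAGS:
            run.append(word.lower())
        else:
            if len(run) == 2:
                pairs.append(tuple(run))
            run = []
    return pairs


def count_compounds(tagged_sentences):
    """Count candidate binary compounds over a collection of sentences."""
    counts = Counter()
    for sentence in tagged_sentences:
        counts.update(extract_noun_pairs(sentence))
    return counts


# Hypothetical sample sentence, tagged by hand.
sample = [
    ("The", "DT"), ("computer", "NN"), ("science", "NN"),
    ("department", "NN"), ("bought", "VBD"), ("a", "DT"),
    ("spell", "NN"), ("checker", "NN"), (".", "."),
]

if __name__ == "__main__":
    for pair, freq in count_compounds([sample]).most_common():
        print(" ".join(pair), freq)
```

Adjacent nouns found this way are only candidates: tagging errors and
noun sequences that cross a phrase boundary will show up too, so the
output still needs filtering by frequency or by hand.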
 
I also received some bibliographical references on the treatment of
complex nominal sequences, which I reproduce below.
 
Thanks to: Eric Steven Atwell, Paul Bennett, Pier Marco Bertinetto,
Beatrice Daille, George Fowler, Christer Johansson, Bernie Jones, Mark
Lauer, Judith N. Levi, Philip Resnik, Richard Sproat, Achim Stein,
Wilco Ter Stal, Evelyne Tzoukermann, Nick Youd.
 
Bibliographical references:
 
Paul Bennett, A Multilingual Translation-oriented Typology of Compound
Nouns, TAL (Traitement Automatique des Langues), 1993, vol. 34.
 
Church and Hanks, article in Computational Linguistics 16
 
Bernie Jones "Predicting Nominal Compounds", MPhil Dissertation,
University of Cambridge Engineering Department
 
Lauer, Mark (1994) "Conceptual Association for Compound Noun Analysis"
Proceedings of the Student Session of the 32nd Annual Meeting of the
Association for Computational Linguistics, June, Las Cruces, New Mexico
 
Lauer, Mark and Dras, Mark (1994) "A Probabilistic Model of Compound
Nouns" Proceedings of the 7th Australian Joint Conference on Artificial
Intelligence, November, Armidale, Australia
 
Levi, Judith N.  1978.  THE SYNTAX AND SEMANTICS OF COMPLEX NOMINALS.
NY:  Academic Press.
    Includes an appendix of compound forms.
 
Leonard, Rosemary.  1984.  THE INTERPRETATION OF ENGLISH NOUN SEQUENCES
ON THE COMPUTER.  Amsterdam:  North-Holland
        This study used 2000 noun sequences taken from a corpus of
        300,000 words of English fiction from 1700 to now.
 
Ryder, Mary Ellen.  1994.  ORDERED CHAOS:  THE INTERPRETATION OF
ENGLISH NOUN-NOUN COMPOUNDS.  Berkeley/Los Angeles/London:  University
of California Press.
     Focuses esp. on interpretation of novel pairings.
 
Rivista di Linguistica, 4, 1, 1992
 
Wilco G. ter Stal & Paul E. van der Vet, Two-level semantic analysis of
compounds
 
--------------------------------------------------------------------------
LINGUIST List: Vol-6-519.


