Corpora: Book: Word frequency distributions

Jean Veronis Jean.Veronis at newsup.univ-mrs.fr
Tue Sep 11 12:15:26 UTC 2001


**** NEW BOOK *** NEW BOOK *** NEW BOOK *** NEW BOOK *** NEW BOOK ****



                        KLUWER ACADEMIC PUBLISHERS
                   TEXT, SPEECH AND LANGUAGE TECHNOLOGY
                               Volume 18
              Series editors: Nancy Ide and Jean Véronis




                      WORD FREQUENCY DISTRIBUTIONS
                                  by
                         R. Harald Baayen
                 University of Nijmegen, The Netherlands



This book is a comprehensive introduction to the statistical analysis of 
word frequency distributions, intended for computational linguists, corpus 
linguists, psycholinguists, and researchers in the field of quantitative 
stylistics. Word frequency distributions are characterized by very large 
numbers of rare words. This property leads to strange phenomena such as 
mean frequencies that systematically change as the number of observations 
is increased, relative frequencies that even in large samples are not fully 
reliable estimators of population probabilities, and model parameters that 
vary with text or corpus size. Special statistical techniques for the 
analysis of distributions with large numbers of rare events can be found in 
various technical journals. The aim of this book is to make these 
techniques more accessible to non-specialists, both theoretically, by 
means of a careful introduction to the underlying probabilistic and 
statistical concepts, and practically, by providing a program library 
implementing the main models for word frequency distributions (CD-ROM 
included).




Kluwer Academic Publishers, Dordrecht
Hardbound, ISBN 0-7923-7017-1
June 2001, 356 pp.
EUR 117.00 / USD 108.00 / GBP 74.00



---------------------------------------------------------------------




CONTENTS

1. Word Frequencies.

2. Non-parametric models.

3. Parametric models.

4. Mixture distributions.

5. The Randomness Assumption.

6. Examples of Applications.

A. List of Symbols.

B. Solutions of the exercises.

C. Software.

D. Data sets.

Bibliography.

Index.


CD-ROM Included

---------------------------------------------------------------------

                            PREVIOUS VOLUMES


    Volume 1:  Recent Advances in Parsing Technology
               Harry Bunt, Masaru Tomita (Eds.)
               Hardbound, ISBN 0-7923-4152-X, 1996

    Volume 2:  Corpus-Based Methods in Language and Speech Processing
               Steve Young, Gerrit Bloothooft (Eds.)
               Hardbound, ISBN 0-7923-4463-4, 1997

    Volume 3:  An introduction to text-to-speech synthesis
               Thierry Dutoit
               Hardbound, ISBN 0-7923-4498-7, 1997

    Volume 4:  Exploring textual data
               Ludovic Lebart, André Salem and Lisette Berry
               Hardbound, ISBN 0-7923-4840-0, December 1997

    Volume 5:  Time Map Phonology:
               Finite State Models and Event Logics in Speech
               Recognition
               Julie Carson-Berndsen
               Hardbound, ISBN 0-7923-4883-4, 1997

    Volume 6:  Predicative Forms in Natural Language and in
               Lexical Knowledge Bases
               Patrick Saint-Dizier (Ed.)
               Hardbound, ISBN 0-7923-5499-0, December 1998

    Volume 7:  Natural Language Information Retrieval
               Tomek Strzalkowski (Ed.)
               Hardbound, ISBN 0-7923-5685-3, April 1999

    Volume 8:  Techniques in Speech Acoustics
               Jonathan Harrington, Steve Cassidy
               Hardbound, ISBN 0-7923-5731-0, July 1999

    Volume 9:  Syntactic Wordclass Tagging
               Hans van Halteren (Ed.)
               Hardbound, ISBN 0-7923-5896-1, August 1999

    Volume 10: Breadth and Depth of Semantic Lexicons
               Viegas, E. (Ed.)
               Hardbound, ISBN 0-7923-6039-7, November 1999

    Volume 11: Natural Language Processing Using Very Large Corpora
               Armstrong, S., Church, K.W., Isabelle, P.,
               Manzi, S., Tzoukermann, E., Yarowsky, D. (Eds.)
               Hardbound, ISBN 0-7923-6055-9, November 1999

    Volume 12: Lexicon Development for Speech and Language Processing
               Frank van Eynde & Dafydd Gibbon (Eds.)
               Hardbound, ISBN 0-7923-6368-X, April 2000

    Volume 13: Parallel text processing:
               Alignment and use of translation corpora
               Jean Véronis (Ed.)
               Hardbound, ISBN 0-7923-6546-1, August 2000

    Volume 14: Prosody: theory and experiment
               Studies Presented to Gösta Bruce
               Merle Horne (Ed.)
               Hardbound, ISBN 0-7923-6579-8, August 2000

    Volume 15: Intonation: Analysis, Modelling and Technology
               Antonis Botinis (Ed.)
               Hardbound, ISBN 0-7923-6605-0, October 2000
               Paperback, ISBN 0-7923-6723-5, October 2000

    Volume 16: Advances in probabilistic and other parsing technologies
               Harry Bunt, Anton Nijholt (Eds.)
               Hardbound, ISBN 0-7923-6616-6, October 2000

    Volume 17: Robustness in language and speech technology
               Jean-Claude Junqua, Gertjan van Noord (Eds.)
               Hardbound, ISBN 0-7923-6790-1, February 2001



Check the series Web page for order information:

    http://www.wkap.nl/series.htm/TLTB


