Corpora: Book: Word frequency distributions
Jean Veronis
Jean.Veronis at newsup.univ-mrs.fr
Tue Sep 11 12:15:26 UTC 2001
**** NEW BOOK *** NEW BOOK *** NEW BOOK *** NEW BOOK *** NEW BOOK ****
KLUWER ACADEMIC PUBLISHERS
TEXT, SPEECH AND LANGUAGE TECHNOLOGY
Volume 18
Series editors: Nancy Ide and Jean Véronis
WORD FREQUENCY DISTRIBUTIONS
by
R. Harald Baayen
University of Nijmegen, The Netherlands
This book is a comprehensive introduction to the statistical analysis of
word frequency distributions, intended for computational linguists, corpus
linguists, psycholinguists, and researchers in the field of quantitative
stylistics. Word frequency distributions are characterized by very large
numbers of rare words. This property leads to strange phenomena such as
mean frequencies that systematically change as the number of observations
is increased, relative frequencies that even in large samples are not fully
reliable estimators of population probabilities, and model parameters that
vary with text or corpus size. Special statistical techniques for the
analysis of distributions with large numbers of rare events can be found in
various technical journals. The aim of this book is to make these
techniques more accessible for non-specialists, both theoretically, by
means of a careful introduction to the underlying probabilistic and
statistical concepts, and practically, by providing a program library
implementing the main models for word frequency distributions (CD-ROM
included).
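
The first of these phenomena, a mean frequency that grows with sample size, can be illustrated with a short simulation. The sketch below is not taken from the book or from its program library. It assumes a hypothetical vocabulary of 50,000 types with Zipfian probabilities (proportional to 1/rank), draws tokens at random, and reports the mean frequency N/V(N) for growing samples, where N is the number of tokens and V(N) the number of distinct types observed. The function name and all parameter values are illustrative assumptions.

```python
import random
from itertools import accumulate


def mean_frequency_by_sample_size(sizes, vocab_size=50_000, seed=0):
    """Return (N, V(N), N / V(N)) for nested samples of increasing size.

    Assumption: type probabilities are proportional to 1/rank (Zipfian).
    This is a toy population, not a model fitted to real text.
    """
    rng = random.Random(seed)
    # Unnormalised Zipfian weights; random.choices accepts cumulative weights.
    cum_weights = list(accumulate(1.0 / rank for rank in range(1, vocab_size + 1)))
    tokens = rng.choices(range(vocab_size), cum_weights=cum_weights, k=max(sizes))

    results = []
    seen = set()
    start = 0
    for n in sorted(sizes):
        # Extend the sample to its first n tokens and count distinct types.
        seen.update(tokens[start:n])
        start = n
        results.append((n, len(seen), n / len(seen)))
    return results


if __name__ == "__main__":
    for n, v, mean in mean_frequency_by_sample_size([1_000, 10_000, 100_000]):
        print(f"N = {n:>7}  V(N) = {v:>6}  mean frequency = {mean:.2f}")
```

Because new rare types keep appearing as the sample grows, V(N) increases more slowly than N, so the mean frequency does not settle on a fixed value in this toy setting.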
Kluwer Academic Publishers, Dordrecht
Hardbound, ISBN 0-7923-7017-1
June 2001, 356 pp.
EUR 117.00 / USD 108.00 / GBP 74.00
---------------------------------------------------------------------
CONTENTS
1. Word Frequencies.
2. Non-parametric models.
3. Parametric models.
4. Mixture distributions.
5. The Randomness Assumption.
6. Examples of Applications.
A. List of Symbols.
B. Solutions of the exercises.
C. Software.
D. Data sets.
Bibliography.
Index.
CD-ROM Included
---------------------------------------------------------------------
PREVIOUS VOLUMES
Volume 1: Recent Advances in Parsing Technology
Harry Bunt, Masaru Tomita (Eds.)
Hardbound, ISBN 0-7923-4152-X, 1996
Volume 2: Corpus-Based Methods in Language and Speech Processing
Steve Young, Gerrit Bloothooft (Eds.)
Hardbound, ISBN 0-7923-4463-4, 1997
Volume 3: An introduction to text-to-speech synthesis
Thierry Dutoit
Hardbound, ISBN 0-7923-4498-7, 1997
Volume 4: Exploring textual data
Ludovic Lebart, André Salem and Lisette Berry
Hardbound, ISBN 0-7923-4840-0, December 1997
Volume 5: Time Map Phonology:
Finite State Models and Event Logics in Speech
Recognition
Julie Carson-Berndsen
Hardbound, ISBN 0-7923-4883-4, 1997
Volume 6: Predicative Forms in Natural Language and in
Lexical Knowledge Bases
Patrick Saint-Dizier (Ed.)
Hardbound, ISBN 0-7923-5499-0, December 1998
Volume 7: Natural Language Information Retrieval
Tomek Strzalkowski (Ed.)
Hardbound, ISBN 0-7923-5685-3, April 1999
Volume 8: Techniques in Speech Acoustics
Jonathan Harrington, Steve Cassidy
Hardbound, ISBN 0-7923-5731-0, July 1999
Volume 9: Syntactic Wordclass Tagging
Hans van Halteren (Ed.)
Hardbound, ISBN 0-7923-5896-1, August 1999
Volume 10: Breadth and Depth of Semantic Lexicons
Viegas, E. (Ed.)
Hardbound, ISBN 0-7923-6039-7, November 1999
Volume 11: Natural Language Processing Using Very Large Corpora
Armstrong, S., Church, K.W., Isabelle, P.,
Manzi, S., Tzoukermann, E., Yarowsky, D. (Eds.)
Hardbound, ISBN 0-7923-6055-9, November 1999
Volume 12: Lexicon Development for Speech and Language Processing
Frank van Eynde & Dafydd Gibbon (Eds.)
Hardbound, ISBN 0-7923-6368-X, April 2000.
Volume 13: Parallel text processing:
Alignment and use of translation corpora
Jean Véronis (Ed.)
Hardbound, ISBN 0-7923-6546-1, August 2000.
Volume 14: Prosody: theory and experiment
Studies Presented to Gösta Bruce
Merle Horne (Ed.)
Hardbound, ISBN 0-7923-6579-8, August 2000.
Volume 15: Intonation: Analysis, Modelling and Technology
Antonis Botinis (Ed.)
Hardbound, ISBN 0-7923-6605-0, October 2000.
Paperback, ISBN 0-7923-6723-5, October 2000.
Volume 16: Advances in probabilistic and other parsing technologies
Harry Bunt, Anton Nijholt (Eds.)
Hardbound, ISBN 0-7923-6616-6, October 2000.
Volume 17: Robustness in language and speech technology
Jean-Claude Junqua, Gertjan van Noord (Eds.)
Hardbound, ISBN 0-7923-6790-1, February 2001
Check the series Web page for order information:
http://www.wkap.nl/series.htm/TLTB