Corpora: Automatic Word Categorisation - again

Klas klas.prytz at ling.uu.se
Mon Nov 27 11:09:09 UTC 2000


Dear list members

Some time ago I posted a request for references to work in the field of
automatic word categorisation. Thank you to everyone who replied. The
references I received are collected below.

Yours sincerely

/Klas Prytz




From Jose Maria Gomez Hidalgo

* David Lewis (http://www.research.att.com/~lewis/), in his dissertation
(1992), organizes text classification tasks into two tracks: document
classification and term (word) classification. In the first chapter, he
describes some term classification tasks oriented to Information
Retrieval, including term clustering (thesaurus construction) and
others. This organization and the references could be a good start.


* Manning and Schuetze wrote a book, Foundations of Statistical Natural
Language Processing, that includes descriptions of some word
classification tasks: word sense disambiguation and part-of-speech
tagging. The main value of this book is its good introduction to the
techniques in the field.
Companion website: http://www-nlp.Stanford.EDU/fsnlp/



From E Tjong Kim Sang

Jakub Zavrel and Jorn Veenstra, "Continuous Task-Specific Categories
for Disambiguation: Putting Lexical Constraints (back) in the Lexicon",
Conference on Architectures and Mechanisms for Language Processing
(AMLaP-96). Torino, Italy, 1996.

Jakub Zavrel, "Lexical Space: Learning and Using Continuous Linguistic
Representations", Master's thesis, Cognitive Artificial Intelligence,
Department of Philosophy, Utrecht University, 1996.

URL: http://pcger39.uia.ac.be/~zavrel/

From Eric Atwell

Elliott J and Atwell E. 2000. Is anybody out there?: the detection of
intelligent and generic language-like features. In Journal of the British
Interplanetary Society, volume 53 no.1/2 pages 13-22, British
Interplanetary Society, London. ISSN: 0007-084X.

Elliott J, Atwell E and Whyte B. 2000. Language identification in unknown
signals. In Proceedings of COLING 2000, 18th International Conference on
Computational Linguistics, pages 1021-1026, Association for Computational
Linguistics (ACL) and Morgan Kaufmann Publishers, San Francisco. ISBN:
1-55860-717-X (2 volumes).

Elliott J, Atwell E and Whyte B. 2000. Increasing our ignorance of
language: identifying language structure in an unknown signal. In
Daelemans W (ed) Proceedings of CoNLL-2000: International Conference on
Computational Natural Language Learning, Lisbon, Portugal.

Elliott J and Atwell E. 1999. Language in signals: the detection of
generic species-independent intelligent language features in symbolic and
oral communications. In Proceedings of the 50th International
Astronautical Congress, paper IAA-99-IAA.9.1.08, Amsterdam. International
Astronautical Federation, Paris.


From Markus Schulze

At the following URL you will find the key data of DMM - a system for
morphological analysis (lemmatisation, categorisation, segmentation)
of the German language:

http://www.linguistik.uni-erlangen.de/~orlorenz/DMM/DMM.en.html

This page also contains a link to an interactive demo of the system
and a link to the full documentation (German only).


From Alexander Clark

@INPROCEEDINGS{chater-finch1,
  AUTHOR =	 {Finch, S. and Chater, N.},
  TITLE =	 {Bootstrapping syntactic categories},
  YEAR =	 {1992},
  BOOKTITLE =	 {Proceedings of the 14th Annual Meeting of the
                  Cognitive Science Society},
  PAGES =	 {820--825},
}

@INPROCEEDINGS{chater-finch2,
  AUTHOR =	 {Finch, S. and Chater, N.},
  TITLE =	 {Bootstrapping syntactic categories using statistical
                  methods},
  YEAR =	 {1992},
  BOOKTITLE =	 {Background and Experiments in Machine Learning of
                  Natural Language},
  PAGES =	 {229--235},
  EDITOR =	 {Daelemans, W. and Powers, D.},
  PUBLISHER =	 {Tilburg University: Institute for Language
                  Technology and AI}
}

@INCOLLECTION{chater-finch3,
  AUTHOR =	 {Finch, S. and Chater, N. and Redington, M.},
  TITLE =	 {Acquiring syntactic information from distributional
                  statistics},
  YEAR =	 {1995},
  EDITOR =	 {Levy, Joseph P. and Bairaktaris, Dimitrios and
                  Bullinaria, John A. and Cairns, Paul},
  BOOKTITLE =	 {Connectionist Models of Memory and Language},
  PUBLISHER =	 {UCL Press}
}



@ARTICLE{brown-92,
  AUTHOR =	 {Brown, Peter F. and Della Pietra, Vincent J. and de
                  Souza, Peter V. and Lai, Jenifer C. and Mercer,
                  Robert},
  TITLE =	 {Class-based n-gram models of natural language},
  YEAR =	 {1992},
  VOLUME =	 {18},
  PAGES =	 {467--479},
  JOURNAL =	 {Computational Linguistics}
}

@Article{ney-essen-kneser,
  author =	 {Ney, Hermann and Essen, Ute and Kneser, Reinhard},
  title =	 {On Structuring Probabilistic dependencies in
                  stochastic language modelling},
  journal =	 {Computer Speech and Language},
  year =	 {1994},
  volume =	 {8},
  pages =	 {1-28}
}

@INPROCEEDINGS{pereira-cluster,
  AUTHOR =	 {Pereira, Fernando and Tishby, Naftali and Lee,
                  Lillian},
  TITLE =	 "Distributional Clustering of {English} words",
  YEAR =	 {1993},
  BOOKTITLE =	 "Proceedings of the 31st annual meeting of the
                  {Association for Computational Linguistics}"
}

@InProceedings{clark-00,
  author =	 {Clark, Alexander},
  title =	 {Inducing Syntactic Categories by Context
                  Distribution Clustering},
  pages =	 {91--94},
  year =	 {2000},
  booktitle =	 {Proceedings of CoNLL-2000 and LLL-2000},
  address =	 {Lisbon, Portugal}
}



Klas Prytz
Institutionen för lingvistik
Uppsala universitet
018-471 1174
Home address:
Nygården
747 94, Alunda
0174/133 01


