Corpora: POS tagging of spoken corpora (summary)
Jean Veronis
Jean.Veronis at newsup.univ-mrs.fr
Tue Sep 19 13:40:16 UTC 2000
A few people responded to my query (thanks to Lars Borin, Mats Eeg-Olofsson,
Andrew Harley, Joakim Nivre, Paul Rayson, Geoffrey Sampson, Maria
Wolters and Jakub Zavrel), but, as I suspected, there seem to be only a
handful of publications on this topic:
Eeg-Olofsson, M. (1991). Word-Class Tagging: Some Computational Tools.
Doctoral dissertation. University of Göteborg: Department of Computational
Linguistics.
Garside, R. (1995). Grammatical tagging of the spoken part of the British
National Corpus: a progress report. In G. Leech, G. Myers & J. Thomas
(eds.), Spoken English on Computer: Transcription, Mark-up and
Application (pp. 161-167). London: Longman.
Leech, G. N., Myers, G. & Thomas, J. (eds.) (1995). Spoken English on
Computer: Transcription, Mark-up and Application. London: Longman.
Nivre, J., Grönqvist, L., Gustafsson, M., Lager, T. & Sofkova, S. (1996).
Tagging Spoken Language Using Written Language Statistics. In Proceedings
of the 16th International Conference on Computational Linguistics
(COLING-96). Copenhagen: Center for Language Technology. [Available at:
<http://www.ling.gu.se/~nivre/papers/16coling.ps>]
Nivre, J. & Grönqvist, L. (in press). Tagging a Corpus of Spoken Swedish. To
appear in International Journal of Corpus Linguistics. [Available at:
<http://www.ling.gu.se/~nivre/papers/ijcl.ps>]
Rahman, A. & Sampson, G. R. (1999). Extending grammar annotation standards
to spontaneous speech. In J. M. Kirk (ed.), Corpora Galore: Analyses and
Techniques in Describing English (pp. 295-311). Amsterdam: Rodopi.
Sampson, G. R. (1995). English for the Computer: The SUSANNE Corpus and
Analytic Scheme (see ch. 6). Oxford: Clarendon Press.
Smith, N. (1997). Improving a tagger. In R. Garside, G. Leech & A. McEnery
(eds.), Corpus Annotation: Linguistic Information from Computer Text
Corpora (pp. 137-150). London: Longman.
Van Eynde, F., Zavrel, J. & Daelemans, W. (2000). Part of Speech Tagging and
Lemmatisation for the Spoken Dutch Corpus. In M. Gavrilidou, G.
Carayannis, S. Markantonatou, S. Piperidis & G. Stainhaouer (eds.),
Proceedings of the Second International Conference on Language Resources
and Evaluation (pp. 1427-1433). Paris: European Language Resources
Association.
See also:
CANCODE project: <http://www.cambridge.org/elt/reference/cancode.htm>
CHRISTINE Corpus: <http://www.cogs.susx.ac.uk/users/geoffs/ChrisDoc.html>
Stefan Rapp's thesis: <http://www.ims.uni-stuttgart.de/>