Corpora: POS tagging of spoken corpora (summary)

Jean Veronis Jean.Veronis at newsup.univ-mrs.fr
Tue Sep 19 13:40:16 UTC 2000


A few people responded to my query (thanks to Lars Borin, Mats Eeg-Olofsson,
Andrew Harley, Joakim Nivre, Paul Rayson, Geoffrey Sampson, Maria Wolters
and Jakub Zavrel), but, as I suspected, only a handful of publications seem
to exist on this topic:

Eeg-Olofsson, M. (1991). Word-Class Tagging: Some Computational Tools. 
Doctoral dissertation. University of Göteborg: Department of Computational 
Linguistics.

Garside, R. (1995) Grammatical tagging of the spoken part of the British 
National Corpus: a progress report. In Leech, G., Myers, G. and Thomas, J. 
(eds), Spoken English on Computer: Transcription, Mark-up and 
Application. pp. 161-167.

Sampson, G. R. (1995) English for the Computer, Clarendon Press (ch. 6).

Leech, G. N., Myers, G., & Thomas, J. (1995). Spoken English on Computer: 
Transcription, mark-up and application. London: Longman.

Nivre, J., Grönqvist, L., Gustafsson, M., Lager, T. & Sofkova, S. (1996) 
Tagging Spoken Language Using Written Language Statistics. In Proceedings 
of the 16th International Conference on Computational Linguistics 
(COLING-96). Copenhagen: Center for Language Technology. [Available at: 
<http://www.ling.gu.se/~nivre/papers/16coling.ps>]

Nivre, J. & Grönqvist, L. (in press) Tagging a Corpus of Spoken Swedish. To 
appear in International Journal of Corpus Linguistics. [Available at: 
<http://www.ling.gu.se/~nivre/papers/ijcl.ps>]

Rahman, A. & Sampson, G. R. (1999) Extending grammar annotation standards to 
spontaneous speech. In J. M. Kirk (ed.), Corpora Galore: Analyses and 
Techniques in Describing English. Amsterdam: Rodopi, pp. 295-311.

Sampson, G. R. (1995). English for the Computer: The SUSANNE Corpus and 
Analytic Scheme. Oxford: Clarendon Press.

Smith, N. (1997) Improving a tagger, in Garside, R., Leech, G., and 
McEnery, A. (eds.) Corpus Annotation: Linguistic Information from Computer 
Text Corpora. Longman, London, pp. 137-150.

Van Eynde, F., Zavrel, J. & Daelemans, W. (2000). Part of Speech Tagging and 
Lemmatisation for the Spoken Dutch Corpus. In: M. Gavrilidou, G. 
Carayannis, S. Markantonatou, S. Piperidis & G. Stainhaouer (eds.), 
Proceedings of the Second International Conference on Language Resources 
and Evaluation. European Language Resources Association, Paris, pp. 1427-1433.

Also:
CANCODE project: <http://www.cambridge.org/elt/reference/cancode.htm>
CHRISTINE Corpus: <http://www.cogs.susx.ac.uk/users/geoffs/ChrisDoc.html>
Stefan Rapp's thesis: <http://www.ims.uni-stuttgart.de/>
