Corpora: POS tagging of spoken corpora (summary)
    Jean Veronis 
    Jean.Veronis at newsup.univ-mrs.fr
       
    Tue Sep 19 13:40:16 UTC 2000
    
    
  
A few people responded to my query (thanks to Lars Borin, Mats Eeg-Olofsson, 
Andrew Harley, Joakim Nivre, Paul Rayson, Geoffrey Sampson, Maria Wolters 
and Jakub Zavrel), but as I suspected, there seem to be only a handful of 
publications on this topic:
Eeg-Olofsson, M. (1991). Word-Class Tagging: Some Computational Tools. 
Doctoral dissertation. University of Göteborg: Department of Computational 
Linguistics.
Garside, R. (1995) Grammatical tagging of the spoken part of the British 
National Corpus: a progress report. In Leech, G., Myers, G. and Thomas, J. 
(eds) (1995), Spoken English on Computer: Transcription, Mark-up and 
Application. pp. 161-7.
Sampson, G. R. (1995) English for the Computer, Clarendon Press (ch. 6).
Leech, G. N., Myers, G., & Thomas, J. (1995). Spoken English on Computer: 
Transcription, mark-up and application. London: Longman.
Nivre, J., Grönqvist, L., Gustafsson, M., Lager, T. & Sofkova, S. (1996) 
Tagging Spoken Language Using Written Language Statistics. In Proceedings 
of the 16th International Conference of Computational Linguistics 
(COLING-96). Copenhagen: Center for Language Technology. [Available at: 
<http://www.ling.gu.se/~nivre/papers/16coling.ps>]
Nivre, J. & Grönqvist, L. (in press) Tagging a Corpus of Spoken Swedish. To 
appear in International Journal of Corpus Linguistics. [Available at: 
<http://www.ling.gu.se/~nivre/papers/ijcl.ps>]
Rahman, A. & Sampson, G.R.  "Extending grammar annotation standards to 
spontaneous speech", in J.M. Kirk, ed., Corpora Galore: Analyses and 
Techniques in Describing English, Rodopi (Amsterdam), 1999, pp. 295-311.
Sampson, G. R. (1995). English for the Computer: The SUSANNE Corpus and 
Analytic Scheme. Oxford: Clarendon Press.
Smith, N. (1997) Improving a tagger, in Garside, R., Leech, G., and 
McEnery, A. (eds.) Corpus Annotation: Linguistic Information from Computer 
Text Corpora. Longman, London, pp. 137-150.
Van Eynde, F., Zavrel, J. & Daelemans, W. (2000). Part of Speech Tagging and 
Lemmatisation for the Spoken Dutch Corpus. In: M. Gavrilidou, G. 
Carayannis, S. Markantonatou, S. Piperidis & G. Stainhaouer (eds.), 
Proceedings of the Second International Conference on Language Resources 
and Evaluation. European Language Resources Association, Paris, 1427-1433.
Also
CANCODE project: <http://www.cambridge.org/elt/reference/cancode.htm>
CHRISTINE Corpus: <http://www.cogs.susx.ac.uk/users/geoffs/ChrisDoc.html>
Stefan Rapp's thesis: <http://www.ims.uni-stuttgart.de/>