[Corpora-List] Addendum to audiovisual corpora: Summary of responses

Paul Thompson p.a.thompson at reading.ac.uk
Mon Dec 15 16:39:02 UTC 2008


I missed out one response to my request for information on audiovisual 
corpora in the summary that I sent out a couple of hours ago, and I am 
pasting the e-mail message below. My apologies to Janne Bondi 
Johannessen; this was an oversight on my part.

Paul


From: Janne Bondi Johannessen <jannebj at gmail.com>
Date: 2008/12/6
Subject: Re: [Corpora-List] Wanted: info on audio-visual corpora

At the University of Oslo, we created a general audio-visual corpus of 
the Oslo dialect of Norwegian in 2005; it contains 900,000 words. It is 
transcribed, POS tagged and has links to audio and video. Now, in 
2007-2009, we are developing a Nordic (Scandinavian) dialect corpus 
covering five Nordic languages (Norwegian, Swedish, Danish, Icelandic and 
Faroese), with two sets of transcriptions: orthographic and phonetic. 
Here, too, the transcriptions are linked to audio and video. We also have 
other, smaller corpora.

Both corpora are searchable via a user-friendly web interface to CQP 
(called Glossa), in which any string or substring (words, affixes, 
combinations) can be combined with any grammatical category. Importantly, 
searches can also be restricted by speaker variables such as gender, age, 
education and place of residence. The Nordic Dialect Corpus also lets 
the user choose orthographic or phonetic search strings, and likewise 
display the search results in either transcription style. Finally, the 
user can search the whole corpus or restrict the results by place or by 
country.

We have developed a number of tools:
- the Glossa search interface
- a semi-automatic dialect transliterator (from phonetic transcription to 
  standard orthography) for the various dialects
- taggers for the spoken Nordic languages, using written-language taggers 
  as a starting point

You are welcome to contact me for more information, to try the Oslo 
Corpus, or to read about it:

- Mini-demo of spoken Oslo Norwegian Corpus: 
http://omilia.uio.no/glossa/html/index_dev.php?corpus=demo
(username and password: demo) (Interface only in Norwegian)
- Demo of Nordic Dialect Corpus: 
http://omilia.uio.no/glossa/html/index_dev.php?corpus=scandiasyn
(Please contact me in a separate e-mail for username and password) 
(Interface in English)

The Nordic Dialect Corpus: 
http://www.tekstlab.uio.no/nota/scandiasyn/english.html
Norwegian Speech Corpora: 
http://www.tekstlab.uio.no/nota/english/index.html#nota
Home page of the Text Laboratory: 
http://www.hf.uio.no/tekstlab/English/index.html

Papers:
Johannessen, Janne Bondi; Nygaard, Lars; Priestley, Joel; Nøklestad, 
Anders. 2008.
Glossa: a Multilingual, Multimodal, Configurable User Interface. In: 
Proceedings of the Sixth International Conference on Language Resources 
and Evaluation (LREC'08). Paris: European Language Resources Association 
(ELRA). ISBN 2-9517408-4-0.
http://www.lrec-conf.org/proceedings/lrec2008/

Johannessen, Janne Bondi; Hagen, Kristin; Priestley, Joel; Nygaard, 
Lars. 2007.
An Advanced Speech Corpus for Norwegian. In: NODALIDA 2007 Proceedings. 
Tartu: University of Tartu. ISBN 978-9985-4-0513-0. pp. 29-3
http://dspace.utlib.ee/dspace/handle/10062/2559

Johannessen, Janne Bondi; Hagen, Kristin; Priestley, Joel; Nygaard, 
Lars. 2006.
A Speech Corpus with Emotions. In: Workshop Proceedings: W09 Corpora for 
Research on Emotions and Affect. LREC-2006. Pisa and Genova: Istituto di 
Linguistica Computazionale del Consiglio Nazionale delle Ricerche 
(ILC-CNR). pp. 80-84
http://www.sdjt.si/bib/lrec06/ (click on the workshop Corpora for 
Research on Emotions and Affect)

Book:
Research from the Norwegian corpus, in Norwegian, Swedish and Danish:
Johannessen, Janne Bondi; Hagen, Kristin. 2008.
Om NoTa-korpuset og artiklene i denne boka [On the NoTa corpus and the 
articles in this book]. In: Språk i Oslo. Ny forskning omkring talespråk 
[Language in Oslo: new research on spoken language]. Novus Forlag. ISBN 
978-82-7099-4

You are welcome to include this in your summary. Please don't hesitate 
to ask for clarifications!

Janne Bondi Johannessen

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


