[Corpora-List] Addendum to audiovisual corpora: Summary of responses
Paul Thompson
p.a.thompson at reading.ac.uk
Mon Dec 15 16:39:02 UTC 2008
I missed out one response to my request for information on audiovisual
corpora in the summary that I sent out a couple of hours ago, and I am
pasting the e-mail message in below. My apologies to Janne Bondi
Johannessen; this was an oversight on my part.
Paul
From: Janne Bondi Johannessen <jannebj at gmail.com>
Date: 2008/12/6
Subject: Re: [Corpora-List] Wanted: info on audio-visual corpora
At the University of Oslo, we created a general audio-visual corpus for
the Oslo dialect of Norwegian in 2005 (900 000 words). It is transcribed,
POS tagged and linked to audio and video. Now, in 2007-2009, we are
developing a Nordic (Scandinavian) dialect corpus of five Nordic
languages (Norwegian, Swedish, Danish, Icelandic and Faroese), with two
transcription sets: orthographic and phonetic. Here, too, the
transcriptions are linked to audio and video. We also have other,
smaller corpora.
Both corpora are searchable via a user-friendly web interface (called
Glossa) on top of CQP, and all kinds of strings and substrings (words,
affixes, combinations) can be combined with all kinds of grammatical
categories. Importantly, searches can also be restricted by speaker
variables such as gender, age, education and place of residence. The
Nordic dialect corpus also lets the user choose between orthographic
and phonetic search strings, and likewise the search results can be
shown in either transcription style. Finally, the user can search
through the whole corpus or restrict the results by place or country.
We have developed a number of tools:
- Glossa search interface
- Semi-automatic dialect transliterator (from phonetic to standard
  orthography) for the various dialects
- Taggers for the spoken Nordic languages, using written-language
  taggers as a starting point
You are welcome to contact me for more information, or to try the Oslo
Corpus, or to read about it:
- Mini-demo of spoken Oslo Norwegian Corpus:
http://omilia.uio.no/glossa/html/index_dev.php?corpus=demo
(username and password: demo) (Interface only in Norwegian)
- Demo of Nordic Dialect Corpus:
http://omilia.uio.no/glossa/html/index_dev.php?corpus=scandiasyn
(Please contact me in a separate e-mail for username and password)
(Interface in English)
The Nordic Dialect Corpus:
http://www.tekstlab.uio.no/nota/scandiasyn/english.html
Norwegian Speech Corpora:
http://www.tekstlab.uio.no/nota/english/index.html#nota
Home page of the Text Laboratory:
http://www.hf.uio.no/tekstlab/English/index.html
Papers:
Johannessen, Janne Bondi; Nygaard, Lars; Priestley, Joel; Nøklestad,
Anders. 2008.
Glossa: a Multilingual, Multimodal, Configurable User Interface. In:
Proceedings of the Sixth International Conference on Language Resources
and Evaluation (LREC'08). Paris: European Language Resources Association
(ELRA). ISBN 2-9517408-4-0.
http://www.lrec-conf.org/proceedings/lrec2008/
Johannessen, Janne Bondi; Hagen, Kristin; Priestley, Joel; Nygaard,
Lars. 2007.
An Advanced Speech Corpus for Norwegian. In: NODALIDA 2007 Proceedings.
Tartu: University of Tartu. ISBN 978-9985-4-0513-0. pp. 29-3
http://dspace.utlib.ee/dspace/handle/10062/2559
Johannessen, Janne Bondi; Hagen, Kristin; Priestley, Joel; Nygaard,
Lars. 2006.
A Speech Corpus with Emotions. In: Workshop Proceedings: W09 Corpora for
Research on Emotions and Affect. LREC-2006. Pisa and Genova: Istituto di
Linguistica Computazionale del Consiglio Nazionale delle Ricerche
(ILC-CNR). pp. 80-84
http://www.sdjt.si/bib/lrec06/ (Click on the workshop Corpora for
Research on Emotions and Affect)
Book:
Research from the Norwegian Corpus, in Norwegian, Swedish and Danish:
Johannessen, Janne Bondi; Hagen, Kristin. 2008.
Om NoTa-korpuset og artiklene i denne boka [On the NoTa corpus and the
articles in this book]. In: Språk i Oslo. Ny forskning omkring talespråk
[Language in Oslo: New research on spoken language]. Novus Forlag.
ISBN 978-82-7099-4
You are welcome to include this in your summary. Please don't hesitate
to ask for clarifications!
Janne Bondi Johannessen
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora