<HTML>
<HEAD>
<TITLE>Re: [SLLING-L] Sign language corpora</TITLE>
</HEAD>
<BODY>
<FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'>Yes, we have two corpora in Australia. One is a corpus from a project by Adam Schembri and myself to study sociolinguistic variation (modelled on the approach taken by Ceil Lucas and colleagues for ASL). The second is the corpus project Adam Schembri described in his last posting (other details can be gleaned from the website mentioned by Inge Zwitserlood: <FONT COLOR="#0000FF"><U><a href="http://www.let.kun.nl/sign-lang/corpusngt/scientific/index.html">http://www.let.kun.nl/sign-lang/corpusngt/scientific/index.html</a></U></FONT>). The second corpus was collected between 2004 and 2006 and will be deposited with the Endangered Languages Documentation Program, SOAS, University of London, as part of their endangered languages archive at the end of this year or very early next. When the corpus is deposited, full details of the project will be available on the Auslan Signbank site, which is currently being updated and migrated to a new host. <BR>
<BR>
The corpus will consist of over 100 hours of digital movies, and associated ELAN annotation files. Annotators have been working on the corpus already for two years (and will for the next 10!). The archive is intended to be internet accessible (but there will be an initial period of restricted access).<BR>
<BR>
I’d like to add a word of caution: a corpus is not just a collection of videos (digital or otherwise). There is a lot more to it than that. If it is not machine readable in some way (hence ELAN), it is not a corpus in the sense meant by linguists today. Simply making recordings, without annotations, would not greatly advance empirical signed language research.<BR>
<BR>
Trevor<BR>
<BR>
--<BR>
</SPAN><FONT COLOR="#2F2F2F"><FONT SIZE="2"><SPAN STYLE='font-size:10.0px'><B>A/Prof. Trevor Johnston, DLitt, PhD, BA</B> | </SPAN></FONT></FONT><FONT SIZE="2"><SPAN STYLE='font-size:10.0px'><FONT COLOR="#7B7B7B">Signed Languages & Linguistics<BR>
</FONT><FONT COLOR="#7C7C7C">_________________</FONT><FONT COLOR="#7B7B7B">__________________________________________</FONT></SPAN></FONT><SPAN STYLE='font-size:12.0px'> <BR>
</SPAN><FONT COLOR="#7C7C7C"><FONT SIZE="2"><SPAN STYLE='font-size:10.0px'><B>Department of Linguistics, C5A 526<BR>
Macquarie University, Sydney<BR>
</B>NSW Australia 2109<BR>
</SPAN></FONT></FONT><FONT SIZE="2"><SPAN STYLE='font-size:10.0px'><FONT COLOR="#7B7B7B">_________________</FONT><FONT COLOR="#7A7A7A">__________________________________________</FONT></SPAN></FONT><SPAN STYLE='font-size:12.0px'> <BR>
<BR>
<BR>
<HR ALIGN=CENTER SIZE="3" WIDTH="95%"><B>From: </B>Adam C Schembri <a.schembri@ucl.ac.uk><BR>
<B>Reply-To: </B>A list for linguists interested in signed languages <slling-l@majordomo.valenciacc.edu><BR>
<B>Date: </B>Tue, 18 Sep 2007 11:27:34 +0100<BR>
<B>To: </B><trevor.a.johnston@bigpond.com>, A list for linguists interested in signed languages <slling-l@majordomo.valenciacc.edu><BR>
<B>Subject: </B>Re: [SLLING-L] Sign language corpora<BR>
<BR>
There are many of us who follow this particular 'gospel'. :-) I have jokingly referred to it as the 'gospel according to Trevor and Ceil'. Ceil Lucas and her colleagues were perhaps the first to systematically collect a naturalistic corpus of sign language data balanced for age, gender, region, ethnicity, etc. (Lucas, Bayley & Valli, 2001), and Trevor Johnston and colleagues were the first - to my knowledge - to actually begin to build a 'corpus' in the contemporary sense of the term (i.e., a machine-readable, annotated collection of language recordings), filming 3-hour data collection sessions from 100 native and near-native signers in 5 regions across Australia. The NGT project has taken this further by building web accessibility into their project, but I believe the Australian team do hope to make the Auslan corpus more widely available at a later stage. Certainly, the new project that Inge refers to here in the UK plans to do something similar to the NGT project (for those of you who don't know, my colleagues and I have just been awarded a major £1.2 million grant from the Economic and Social Research Council for the 'British Sign Language Corpus Project'. For more information, visit DCAL's news page: <a href="http://www.dcal.ucl.ac.uk/news/news.html">http://www.dcal.ucl.ac.uk/news/news.html</a>).<BR>
<BR>
Adam<BR>
<BR>
<BR>
Adam C Schembri, PhD<BR>
Senior Research Fellow<BR>
Deafness, Cognition and Language (DCAL) Research Centre <BR>
University College London<BR>
49 Gordon Square<BR>
London WC1H0PD<BR>
United Kingdom<BR>
Tel: +44 20 7679 8680<BR>
<a href="http://www.dcal.ucl.ac.uk/team/adam_schembri.html">http://www.dcal.ucl.ac.uk/team/adam_schembri.html</a> <BR>
<BR>
<BR>
<BR>
<BR>
On 18 Sep 2007, at 08:04, I.E.P. Zwitserlood wrote:<BR>
<BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'>Talking about gospels: so it is ours! <BR>
In the Netherlands, at the Radboud University Nijmegen, a corpus is currently being compiled for NGT (Sign Language of the Netherlands). My colleagues and I aim to record 75 hours of elicited and (semi-)spontaneous data, collected from 100 native signers. All video data, as well as a translation and (for a small subset of the data) an annotation, will be made available on the internet. (Similar projects have been/will be undertaken in Australia, the UK and Ireland, although the data are not so easily available.) If anyone is interested in making a corpus for his/her sign language, we'll be happy to inform/support you with our experiences. For more information, see our website: <BR>
<a href="http://www.let.kun.nl/sign-lang/corpusngt/scientific/index.html">http://www.let.kun.nl/sign-lang/corpusngt/scientific/index.html</a><BR>
<BR>
Best,<BR>
Inge Zwitserlood<BR>
<BR>
----- Original Message -----<BR>
From: Dan Parvaz <dparvaz@gmail.com><BR>
Date: Monday, September 17, 2007 6:30 pm<BR>
Subject: Re: [SLLING-L] An avator doing bfi<BR>
<BR>
Sorry, but I can't stop going on about corpora -- it's the gospel I preach :-)<BR>
<BR>
Perhaps the best way to kick-start this is to round up all the usual suspects, and get a governmental agency (US or EU, it doesn't much matter to me) to coordinate recording and transcribing 50 hours of data for everyone to use (I know, it isn't enough by spoken-language standards, but it's so much more than we've ever had). Then we have a fighting chance of pushing the state of the art in all these areas... <BR>
<BR>
-Dan.<BR>
<BR>
On 9/17/07, <B>Sara Morrissey</B> <sara.morrissey2@mail.dcu.ie> wrote:<BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'> <BR>
Oh dear. Don't talk to me about corpora! I'm working in the arena of Data-Driven Machine Translation, alongside people who have millions of sentences for their spoken language translation, compared with my 600 for sign language work!! Finding parallel data within a closed domain is a difficult task. Nevertheless, progress is being made and results are promising :) <BR>
<BR>
<BR>
<BR>
Thanks for your input :o)<BR>
Sara<BR>
<BR>
<BR>
<BR>
On 17/09/2007, <B>Dan Parvaz</B> <dparvaz@gmail.com > wrote: <BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'>I'm sure the one thing standing between the Tunisian Deaf Community and achieving their potential is the lack of a signing avatar :-) Still, it is potentially cool research with good dividends, particularly if it means the development of a real Tunisian SL dictionary (as opposed to the previous effort, which was a glossary meant to contribute to the perennial Pan-Arab SL movement), grammar, etc. <BR>
<BR>
A major chunk of the problem here rests with the lack of substantial corpora of any kind, let alone parallel corpora.<BR>
<BR>
-Dan. <BR>
<BR>
<BR>
<BR>
<BR>
<BR>
On 9/17/07, <B>Sara Morrissey</B> <sara.morrissey2@mail.dcu.ie> wrote: <BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'> <BR>
All work in this area is a long way from being a translation service; I can assure you of that following 3 years of PhD research on the topic of Machine Translation of Sign Languages. Sadly, most of the work that I've come across in this area is similar to the work described in the BBC article in that it is just a small project. I have seen very little consistent work in this area, with most of it being satellite projects related to other work, so it never gets very far. Also, sadly, many groups that work in this area have little to no knowledge of the languages they are dealing with, often have little contact with Deaf communities or colleagues, and are more interested in the computing aspects. I am aware of the forthcoming FP7 project, which does seem to intend spending a few years of research in this area: <a href="http://www.ideal-ist.net/Countries/TN/PS-TN-1590">http://www.ideal-ist.net/Countries/TN/PS-TN-1590</a> Well, I hope so at least; I've applied for a postdoc position with them!! <BR>
<BR>
<BR>
<BR>
I'd be interested in hearing anyone's opinion on both this project and any other sign language machine translation projects they've come across. I intend to continue working in this area so all input is valuable :o) <BR>
<BR>
<BR>
<BR>
Namaste,<BR>
<BR>
Sara<BR>
<BR>
<BR>
<BR>
************************************<BR>
<BR>
Sara Morrissey,<BR>
<BR>
PhD Researcher,<BR>
<BR>
National Centre for Language Technology,<BR>
<BR>
School of Computing,<BR>
<BR>
Dublin City University,<BR>
<BR>
Dublin 9,<BR>
<BR>
Ireland.<BR>
<BR>
***********************************<BR>
<BR>
<BR>
<BR>
<BR>
<BR>
<BR>
On 15/09/2007, <B>Dan Parvaz</B> <dparvaz@gmail.com > wrote: <BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'>Sigh. Every time some student on their Amazing Journey Of Self-Discovery<tm> "reinvents" a piece of deaf-related technology (datagloves for reading fingerspelling, signing avatars, etc.), some ignorant journalist is ready to hail it as a breakthrough. <BR>
<BR>
This was put together in a few months by a student intern. As far as I can tell (those knowing BSL, please look at the video and correct me if I'm wrong), this is yet another relatively straightforward marriage of speech recognition and 3D animation. There's no indication that space, classifiers, etc., which would be part of a natural SL, are being used here. As it stands, it's less useful than commercially available speech-to-text systems (DragonDictate, ViaVoice, etc.). <BR>
<BR>
Don't surplus your interpreters just yet :-)<BR>
<BR>
Cheers,<BR>
<BR>
-Dan<BR>
<BR>
<BR>
<BR>
<BR>
On 9/15/07, <B>GerardM</B> <gerard.meijssen@gmail.com> wrote:<BR>
<BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'> <BR>
Hoi,<BR>
I read this article on the BBC website about a translation service created by IBM that uses an avatar to translate into British Sign Language (bfi). Such technology could in principle also produce SignWriting.<BR>
Thanks, <BR>
Gerard<BR>
<BR>
<a href="http://news.bbc.co.uk/2/hi/technology/6993326.stm">http://news.bbc.co.uk/2/hi/technology/6993326.stm</a><BR>
<BR>
_______________________________________________<BR>
SLLING-L mailing list<BR>
SLLING-L@majordomo.valenciacc.edu<BR>
<a href="http://majordomo.valenciacc.edu/mailman/listinfo/slling-l">http://majordomo.valenciacc.edu/mailman/listinfo/slling-l</a><BR>
<BR>
</SPAN></FONT></BLOCKQUOTE><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'><BR>
<BR>
<BR>
</SPAN></FONT></BLOCKQUOTE><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'><BR>
<BR>
<BR>
-- <BR>
Blessed are the flexible, for they shall not be bent out of shape. <BR>
<BR>
</SPAN></FONT></BLOCKQUOTE><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'><BR>
<BR>
<BR>
</SPAN></FONT></BLOCKQUOTE><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'><BR>
<BR>
<BR>
-- <BR>
Blessed are the flexible, for they shall not be bent out of shape. <BR>
<BR>
<BR>
</SPAN></FONT></BLOCKQUOTE><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'> <BR>
<BR>
<BR>
</SPAN></FONT></BLOCKQUOTE><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'><BR>
<BR>
<HR ALIGN=CENTER SIZE="3" WIDTH="95%">
</SPAN></FONT>
</BODY>
</HTML>