Sign language corpora
Trevor Johnston
trevor.a.johnston at bigpond.com
Tue Sep 18 11:07:20 UTC 2007
Yes, we have two corpora in Australia. One is a corpus from a project by
Adam Schembri and myself to study sociolinguistic variation (modelled on the
approach taken by Ceil Lucas and colleagues for ASL). The second is a
corpus project as explained by Adam Schembri in his last posting (other
details can be gleaned from the website mentioned by Inge Zwitserlood:
http://www.let.kun.nl/sign-lang/corpusngt/scientific/index.html). The second
corpus was collected between 2004 and 2006 and will be deposited with the
Endangered Languages Documentation Program, SOAS, University of London, as
part of their endangered languages archive at the end of this year or very
early next. Full details of the project will be available on the Auslan
Signbank site at the time the corpus is deposited; the site is currently
being updated and migrated to a new host.
The corpus will consist of over 100 hours of digital movies, and associated
ELAN annotation files. Annotators have been working on the corpus already
for two years (and will for the next 10!). The archive is intended to be
internet accessible (but there will be an initial period of restricted
access).
I'd like to add a word of caution: a corpus is not just a collection of
videos (digital or otherwise). There is a lot more to it than that. If it is
not machine readable in some way (hence ELAN), it is not a corpus in the
sense meant by linguists today, and simply making recordings, without
annotations, would not advance empirical signed language research greatly.
Trevor
--
A/Prof. Trevor Johnston, DLitt, PhD, BA | Signed Languages & Linguistics
___________________________________________________________
Department of Linguistics, C5A 526
Macquarie University, Sydney
NSW Australia 2109
___________________________________________________________
From: Adam C Schembri <a.schembri at ucl.ac.uk>
Reply-To: A list for linguists interested in signed languages
<slling-l at majordomo.valenciacc.edu>
Date: Tue, 18 Sep 2007 11:27:34 +0100
To: <trevor.a.johnston at bigpond.com>, A list for linguists interested in
signed languages <slling-l at majordomo.valenciacc.edu>
Subject: Re: [SLLING-L] Sign language corpora
There are many of us who follow this particular 'gospel'. :-) I have
jokingly referred to it as the 'gospel according to Trevor and Ceil'. Ceil
Lucas and her colleagues were perhaps the first to systematically collect a
naturalistic corpus of sign language data balanced for
age/gender/region/ethnicity etc (Lucas, Bayley & Valli, 2001), and Trevor
Johnston and colleagues were the first - to my knowledge - to actually begin
to build a 'corpus' in the contemporary sense of the term (i.e., a
machine-readable, annotated collection of language recordings), filming
3-hour data collection sessions from 100 native and near-native signers in 5
regions across Australia. The NGT project has taken this further by building
web accessibility into their project, but I believe the
Australian team do hope to make the Auslan corpus more widely available at a
later stage. Certainly, the new project that Inge refers to here in the UK
plans to do something similar to the NGT project (for those of you who
don't know, my colleagues and I have just been awarded a major £1.2 million
grant from the Economic and Social Research Council for the 'British Sign
Language Corpus Project'. For more information, visit DCAL's news
page: http://www.dcal.ucl.ac.uk/news/news.html ).
Adam
Adam C Schembri, PhD
Senior Research Fellow
Deafness, Cognition and Language (DCAL) Research Centre
University College London
49 Gordon Square
London WC1H0PD
United Kingdom
Tel: +44 20 7679 8680
http://www.dcal.ucl.ac.uk/team/adam_schembri.html
On 18 Sep 2007, at 08:04, I.E.P. Zwitserlood wrote:
> Talking about gospels: so it is ours!
> In the Netherlands, at the Radboud University Nijmegen, a corpus is currently
> being compiled for NGT (Sign Language of the Netherlands). My colleagues and I
> aim at recording 75 hours of elicited and (semi-)spontaneous data, collected
> from 100 native signers. All video data, as well as a translation and (for a
> small subset of the data) an annotation, will be made available on the internet.
> (Similar projects have been/will be undertaken in Australia, the UK and
> Ireland, although the data are not so easily available). If anyone is
> interested in making a corpus for his/her sign language, we'll be happy to
> inform/support you with our experiences. For more information, see our
> website:
> http://www.let.kun.nl/sign-lang/corpusngt/scientific/index.html
>
> Best,
> Inge Zwitserlood
>
> ----- Original Message -----
> From: Dan Parvaz <dparvaz at gmail.com>
> Date: Monday, September 17, 2007 6:30 pm
> Subject: Re: [SLLING-L] An avator doing bfi
>
> Sorry, but I can't stop going on about corpora -- it's the gospel I preach :-)
>
> Perhaps the best way to kick-start this is to round up all the usual suspects,
> and get a governmental agency (US or EU, it doesn't much matter to me) to
> coordinate recording and transcribing 50 hours of data for everyone to use (I
> know, it isn't enough by spoken-language standards, but it's so much more
> than we've ever had). Then we have a fighting chance of pushing the state of
> the art in all these areas...
>
> -Dan.
>
> On 9/17/07, Sara Morrissey <sara.morrissey2 at mail.dcu.ie> wrote:
>>
>> Oh dear. Don't talk to me about corpora! I'm working in the arena of
>> Data-Driven Machine Translation and working with people who have millions of
>> sentences for their spoken language translation in comparison to my 600 for
>> sign language work!! Finding parallel data within a closed domain is a
>> difficult task. Nevertheless progress is being made and results are promising
>> :)
>>
>>
>>
>> Thanks for your input :o)
>> Sara
>>
>>
>>
>> On 17/09/2007, Dan Parvaz <dparvaz at gmail.com > wrote:
>>> I'm sure the one thing standing between the Tunisian Deaf Community and
>>> achieving their potential is the lack of a signing avatar :-) Still, it is
>>> potentially cool research with good dividends, particularly if it means the
>>> development of a real Tunisian SL dictionary (as opposed to the previous
>>> effort, which was a glossary meant to contribute to the perennial Pan-Arab
>>> SL movement), grammar, etc.
>>>
>>> A major chunk of the problem here rests with the lack of substantial corpora
>>> of any kind, let alone parallel corpora.
>>>
>>> -Dan.
>>>
>>>
>>>
>>>
>>>
>>> On 9/17/07, Sara Morrissey <sara.morrissey2 at mail.dcu.ie
>>> <mailto:sara.morrissey2 at mail.dcu.ie> > wrote:
>>>>
>>>> All work in this area is a long way from being a translation service, I can
>>>> assure you of that following 3 years PhD research on the topic of Machine
>>>> Translation of Sign Languages. Sadly most of the work that I've come across
>>>> in this area is similar to the work described in the BBC article in that it
>>>> is just a small project. I have seen very little consistent work in this
>>>> area with most of it being satellite projects related to other work so it
>>>> never gets very far. Also, sadly, many groups that work in this area have
>>>> little to no knowledge of the languages they are dealing with and often
>>>> little contact with Deaf communities or colleagues and are more interested
>>>> in the computing aspects. I am aware of the forthcoming FP7 project which
>>>> does seem to intend spending a few years of research in this area:
>>>> http://www.ideal-ist.net/Countries/TN/PS-TN-1590 Well, I hope so at least,
>>>> I've applied for a postdoc position with them!!
>>>>
>>>>
>>>>
>>>> I'd be interested in hearing anyone's opinion on both this project and any
>>>> other sign language machine translation projects they've come across. I
>>>> intend to continue working in this area so all input is valuable :o)
>>>>
>>>>
>>>>
>>>> Namaste,
>>>>
>>>> Sara
>>>>
>>>>
>>>>
>>>> ************************************
>>>>
>>>> Sara Morrissey,
>>>>
>>>> PhD Researcher,
>>>>
>>>> National Centre for Language Technology,
>>>>
>>>> School of Computing,
>>>>
>>>> Dublin City University,
>>>>
>>>> Dublin 9,
>>>>
>>>> Ireland.
>>>>
>>>> ***********************************
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 15/09/2007, Dan Parvaz <dparvaz at gmail.com > wrote:
>>>>> Sigh. Every time some student on their Amazing Journey Of
>>>>> Self-Discovery<tm> "reinvents" a piece of deaf-related technology
>>>>> (datagloves for reading fingerspelling, signing avatars, etc.), some
>>>>> ignorant journalist is ready to hail it as a breakthrough.
>>>>>
>>>>> This was put together in a few months by a student intern. As far as I can
>>>>> tell (those knowing BSL please look at the video and correct me if I'm
>>>>> wrong), this is yet another relatively straightforward marriage of speech
>>>>> recognition and 3D animation. There's no indication that space,
>>>>> classifiers, etc. which would be part of a natural SL are being used here.
>>>>> As it stands, it's less useful than commercially available speech-to-text
>>>>> systems (DragonDictate, Via Voice, etc.)
>>>>>
>>>>> Don't surplus your interpreters just yet :-)
>>>>>
>>>>> Cheers,
>>>>>
>>>>> -Dan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 9/15/07, GerardM < gerard.meijssen at gmail.com
>>>>> <mailto:gerard.meijssen at gmail.com> > wrote:
>>>>>
>>>>>>
>>>>>> Hoi,
>>>>>> I read this article on the BBC website about a translation service
>>>>>> created by IBM that uses an avatar to translate into British Sign
>>>>>> language (bfi). Such technology could in principle also produce
>>>>>> SignWriting
>>>>>> Thanks,
>>>>>> Gerard
>>>>>>
>>>>>> http://news.bbc.co.uk/2/hi/technology/6993326.stm
>>>>>>
>>>>>> _______________________________________________
>>>>>> SLLING-L mailing list
>>>>>> SLLING-L at majordomo.valenciacc.edu
>>>>>> <mailto:SLLING-L at majordomo.valenciacc.edu>
>>>>>> http://majordomo.valenciacc.edu/mailman/listinfo/slling-l
>>>>>> <http://majordomo.valenciacc.edu/mailman/listinfo/slling-l>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Blessed are the flexible, for they shall not be bent out of shape.
>>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Blessed are the flexible, for they shall not be bent out of shape.
>>
>>
>
>
>