SignWriting and Unicode (was: the benefits of ELAN)
Albert Bickford
albert_bickford at sil.org
Mon May 5 18:10:43 UTC 2008
I changed the subject line because we've switched topics.
SVG won't provide a means of getting SignWriting into Unicode (including UTF-8), because Unicode by its nature requires a much simpler representation than SVG. Likewise for SignWriting Markup Language (SWML), another XML-based notation that is specifically for SignWriting. Such data formats are great for some purposes, but they are *much* too lengthy to be incorporated into Unicode, and their internal structure (based as they are on XML) is much too complex. Unicode is a system for representing a character in a very small number of bytes, on the order of 5-10, whereas SVG or SWML might need (I'm guessing) something more like 100-200 bytes per SignWriting character.
However, it is not necessarily the case that a Unicode implementation of SignWriting would need to make room for 33,000+ new symbols. That large number is a consequence of the fact that each separate rotation and shading (used to represent e.g. hand orientation and palm facing) is currently counted as a separate symbol. But each handshape has 96 different symbols to represent it; if a Unicode system were developed in which each handshape was represented by only one "codepoint" (the technical term for the Unicode number that represents the character), and there were a second codepoint to represent the rotation and shading, then the number of codepoints required would drop to probably in the range of 500-1000. (There are also other symbols that vary systematically from each other, especially the movement arrows.) This approach would also make it much easier to expand the system later; when a new handshape needs to be added to the system, you only need to add one new character, not 96.
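To make the arithmetic concrete, here is a minimal Python sketch of the base-plus-modifier idea. Everything in it is hypothetical: the split of the 96 variants into 6 shadings times 16 rotations is an assumption, the codepoint ranges are borrowed from a Unicode private use area purely for illustration, and the names encode, decode and codepoints_needed are invented here. It is not taken from any actual Unicode proposal.

```python
from typing import Tuple

# Assumption: the 96 variants per handshape factor as 6 shadings x 16 rotations.
FILLS = 6
ROTATIONS = 16
VARIANTS_PER_HANDSHAPE = FILLS * ROTATIONS  # 96

# Hypothetical codepoint ranges in Supplementary Private Use Area-A,
# chosen only so the example runs; no real assignment is implied.
BASE_START = 0xF0000       # one codepoint per handshape
MODIFIER_START = 0xF1000   # one codepoint per shading/rotation combination


def encode(handshape: int, fill: int, rotation: int) -> str:
    """Represent one symbol as two codepoints: handshape + variant modifier."""
    if not (0 <= fill < FILLS and 0 <= rotation < ROTATIONS):
        raise ValueError("fill or rotation out of range")
    variant = fill * ROTATIONS + rotation
    return chr(BASE_START + handshape) + chr(MODIFIER_START + variant)


def decode(text: str) -> Tuple[int, int, int]:
    """Recover (handshape, fill, rotation) from a two-codepoint sequence."""
    base, modifier = text
    handshape = ord(base) - BASE_START
    fill, rotation = divmod(ord(modifier) - MODIFIER_START, ROTATIONS)
    return handshape, fill, rotation


def codepoints_needed(handshapes: int) -> Tuple[int, int]:
    """Compare one-codepoint-per-symbol against the factored scheme."""
    one_per_symbol = handshapes * VARIANTS_PER_HANDSHAPE
    factored = handshapes + VARIANTS_PER_HANDSHAPE
    return one_per_symbol, factored
```

With a made-up inventory of 200 handshapes, codepoints_needed(200) gives 19200 codepoints for the one-to-one approach against 296 for the factored one, and adding a handshape costs one codepoint instead of 96. Each encoded symbol is two codepoints, which is 8 bytes in UTF-8 when both lie outside the Basic Multilingual Plane.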
Unfortunately, my impression is that most of the people who have thought about SignWriting and Unicode have made the assumption that each of the 33,000+ new symbols needs to correspond one-to-one with a Unicode codepoint. As long as people make that assumption, it is unlikely that SignWriting will ever be added to Unicode.
Albert
----- Original Message -----
From: Gerard Meijssen
To: A list for linguists interested in signed languages
Cc: Valerie Sutton
Sent: Monday, May 05, 2008 8:02 AM
Subject: Re: [SLLING-L] the benefits of ELAN
Hoi,
I blogged a few days ago about SignWriting and the availability of the SignWriting Image Server (SWIS). What is clear from the text is that they are going to apply for an Internet draft in order to improve SWIS support. My understanding is that SignWriting is superior to HamNoSys in that it allows facial expressions to be recorded.
I would love to see SignWriting included in UTF-8. There are two problems: there are 33563 symbols, and this requires more than the pages reserved for it. The hope is that with SWIS it may be possible to speed the inclusion dramatically because of the way the technology behind it ... (SVG).
Thanks,
Gerard
On Fri, May 2, 2008 at 8:03 PM, Albert Bickford <albert_bickford at sil.org> wrote:
In theory one should be able to use any appropriate transcription system in one of ELAN's tiers. That is, once you've selected a section of video, you can create an annotation for that stretch in a particular tier, and then put a transcription of the data in that stretch into the annotation. The catch here is that the transcription must be representable using a font that ELAN knows how to use. I'm not up on all the current technical details, but my understanding/assumption has been that ELAN only supports transcription systems that are supported in Unicode (and, last I checked, there was the additional requirement that the Unicode characters you needed had to be included in one particular font). Neither SignWriting nor HamNoSys has been added to Unicode, although there are various people who have given thought to how this might be done. (In the case of SignWriting, it's not a trivial question what is the best way to do so.)
Now, if there were some way to attach a graphic image as an annotation in ELAN, then this would provide a workaround, because the graphic could contain an image of the SignWriting or HamNoSys transcription. This would provide a human-readable annotation, although it would not be searchable. It would also probably be much more cumbersome than just typing in a transcription.
If one wants annotations in ELAN that can provide some of the functions that a transcription would provide, then I think the main options are:
- a system of glosses that identifies the signs, as mentioned by Louise de Beuzeville in a separate post
- a transcription system that is representable in Unicode, e.g. StokoeAscii (an adaptation of Stokoe notation that is representable with ordinary ASCII characters)
- a coding system that represents just the phonetic/phonological/morphological details that you are interested in
Albert Bickford
----- Original Message -----
From: Gerard Meijssen
To: A list for linguists interested in signed languages
Sent: Friday, May 02, 2008 5:46 AM
Subject: Re: [SLLING-L] the benefits of ELAN
Is this where SignWriting / HamNoSys come in, by associating them with a time slot in ELAN?
_______________________________________________
SLLING-L mailing list
SLLING-L at majordomo.valenciacc.edu
http://majordomo.valenciacc.edu/mailman/listinfo/slling-l