Re: Fixing a fundamental flaw in Binary SignWriting

Stefan Wöhrmann stefanwoehrmann at GOOGLEMAIL.COM
Tue Jun 1 21:10:57 UTC 2010


Hi Steve, Valerie and everybody, 

I do not understand these software discussions, so please excuse this question.
Will this change affect the work we have already invested in the SignPuddle
online dictionary? Do I have to rewrite entries? 

Thanks 
Stefan ;-)

-----Original Message-----
From: SignWriting List: Read and Write Sign Languages
[mailto:SW-L at LISTSERV.VALENCIACC.EDU] On Behalf Of Steve Slevinski
Sent: Tuesday, June 1, 2010 22:49
To: SW-L at LISTSERV.VALENCIACC.EDU
Subject: Fixing a fundamental flaw in Binary SignWriting

Hi List,

This is a technical discussion.  Nothing is going to change regarding 
the writing system.  The change is only data related.

Back in 2008, I made a poor design choice for Binary SignWriting.   I 
needed to define what was a character for the encoding model.  I decided 
that each symbol should be a character.  Some others (Stuart Thiessen, 
Michael Everson, members of the WLDC, ...) thought that each BaseSymbol 
should be a character with an individual symbol being defined as a 
BaseSymbol character with one or two modifying characters.

Encoding with symbol characters seemed the better choice, rather than 
using 3 times the amount of data to say the same thing.  I was wrong.  
My choice made searching by BaseSymbol much more difficult.  I was 
forced to pre-process the data before I could search.  This was wasted 
effort.  I realized the error of my ways when I was reading a discussion 
of searching with Unicode.
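
The difference between the two models can be sketched in a few lines of 
Python.  This is only an illustration: every code value, the fill and 
rotation counts, and all function names below are assumptions made up 
for the example, not the real BSW code points.

```python
from typing import List, Tuple

# Hypothetical layout, for illustration only (NOT the real BSW values).
FILLS = 6            # assumed number of fill modifiers
ROTATIONS = 16       # assumed number of rotation modifiers
BASE_START = 0x100   # assumed first BaseSymbol code
FILL_START = 0x10    # assumed first fill-modifier code
ROT_START = 0x20     # assumed first rotation-modifier code

Symbol = Tuple[int, int, int]  # (base index, fill index, rotation index)


def encode_per_symbol(sym: Symbol) -> List[int]:
    """Old model: one code per full symbol (base, fill, rotation combined)."""
    base, fill, rot = sym
    return [(base * FILLS + fill) * ROTATIONS + rot]


def encode_base_plus_modifiers(sym: Symbol) -> List[int]:
    """New model: a BaseSymbol code followed by two modifier codes."""
    base, fill, rot = sym
    return [BASE_START + base, FILL_START + fill, ROT_START + rot]


def find_base_per_symbol(text: List[int], base: int) -> List[int]:
    """Old model: each code must be decoded to its base before comparing."""
    per_base = FILLS * ROTATIONS
    return [i for i, code in enumerate(text) if code // per_base == base]


def find_base_with_modifiers(text: List[int], base: int) -> List[int]:
    """New model: the BaseSymbol code appears literally in the data."""
    target = BASE_START + base
    return [i for i, code in enumerate(text) if code == target]


symbols = [(3, 1, 2), (5, 0, 0), (3, 4, 15)]
old_text = [c for s in symbols for c in encode_per_symbol(s)]
new_text = [c for s in symbols for c in encode_base_plus_modifiers(s)]
```

In this sketch the new text is three times as long as the old one, but 
a search for a BaseSymbol becomes a plain comparison against a single 
code, with no decoding step in between.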

I need to fix my poor design choice and re-encode the ISWA 2010 with 
BaseSymbol characters and modifiers.  I then need to refactor the 
character encoding model.  This should be a quick fix that I'll have 
ready by Friday, but it changes BSW once again, hopefully for the last 
time.

On the bright side, this makes inclusion in Unicode easier.  With 
my previous encoding, I required an entire Unicode plane of 65,536 
characters.  With the new encoding, I only need 1,280 characters.  This 
is a much better number. 

Years ago, Michael Everson worked with Unicode for the tentative 
acceptance of SignWriting into the standard.  If you look at the Unicode 
roadmap for the Supplementary Multilingual Plane, you'll see that Sutton 
SignWriting has 4 rows set aside awaiting a proposal.  These 4 rows 
represent 1024 characters.  With the new encoding, I can create a 
proposal that requires 5 rows.  Much more reasonable than an entire plane.
http://www.unicode.org/roadmaps/smp/
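
The row counts follow from simple arithmetic.  A minimal sketch, 
assuming a row holds 256 characters (implied by "4 rows represent 1024 
characters"); the function name rows_needed is made up for the example.

```python
# Row size implied by the roadmap figure: 4 rows hold 1024 characters.
ROW_SIZE = 1024 // 4  # 256 characters per row

PLANE_SIZE = 65536    # characters in one Unicode plane


def rows_needed(characters: int) -> int:
    """Number of whole rows required to hold the given character count."""
    return -(-characters // ROW_SIZE)  # ceiling division


new_rows = rows_needed(1280)          # new encoding: BaseSymbols plus modifiers
plane_rows = rows_needed(PLANE_SIZE)  # old encoding: an entire plane
```

Under that assumption, 1,280 characters fill exactly 5 rows, one more 
than the 4 rows currently set aside, while an entire plane is 256 rows.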

Sorry to any and all programmers / users this will inconvenience, but it 
is a needed change.

Regards,
-Steve


