Re: Fixing a fundamental flaw in Binary SignWriting
Stefan Wöhrmann
stefanwoehrmann at GOOGLEMAIL.COM
Tue Jun 1 21:10:57 UTC 2010
Hi Steve, Valerie and everybody,
I do not understand these software discussions, so please excuse this question.
Will this change affect the work we have already invested in the SignPuddle
online dictionary? Do I have to rewrite entries?
Thanks
Stefan ;-)
-----Original Message-----
From: SignWriting List: Read and Write Sign Languages
[mailto:SW-L at LISTSERV.VALENCIACC.EDU] On Behalf Of Steve Slevinski
Sent: Tuesday, June 1, 2010 22:49
To: SW-L at LISTSERV.VALENCIACC.EDU
Subject: Fixing a fundamental flaw in Binary SignWriting
Hi List,
This is a technical discussion. Nothing is going to change regarding
the writing system. The change is only data related.
Back in 2008, I made a poor design choice for Binary SignWriting. I
needed to define what was a character for the encoding model. I decided
that each symbol should be a character. Some others (Stuart Thiessen,
Michael Everson, members of the WLDC, ...) thought that each BaseSymbol
should be a character with an individual symbol being defined as a
BaseSymbol character with one or two modifying characters.
Encoding with symbol characters seemed the better choice, rather than
using 3 times the amount of data to say the same thing. I was wrong.
My choice made searching by BaseSymbol much more difficult. I was
forced to pre-process the data before I could search. This was wasted
effort. I realized the error of my ways when I was reading a discussion
of searching with Unicode.
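To make the trade-off concrete, here is a minimal sketch in Python. It is
only a toy model: the fill and rotation counts, the modifier ranges, and all
function names are assumptions made up for illustration, not the real BSW or
ISWA 2010 code assignments. It shows why a search by BaseSymbol needs a
decoding step when each symbol is one character, and only a plain membership
test when the BaseSymbol is stored as its own character.

```python
# Hypothetical toy model: every number below is an illustrative assumption,
# not an actual BSW / ISWA 2010 value.
FILLS = 6                      # assumed fill variants per BaseSymbol
ROTATIONS = 16                 # assumed rotation variants per BaseSymbol
VARIANTS = FILLS * ROTATIONS   # symbols derived from one BaseSymbol

FILL_OFFSET = 10_000           # made-up range for fill modifier codes
ROTATION_OFFSET = 20_000       # made-up range for rotation modifier codes
# BaseSymbol ids are assumed to stay below FILL_OFFSET so ranges never overlap.


def encode_symbol_per_char(symbols):
    """One code per complete symbol (each symbol is a character)."""
    return [base * VARIANTS + fill * ROTATIONS + rot
            for base, fill, rot in symbols]


def encode_base_plus_modifiers(symbols):
    """A BaseSymbol code followed by a fill and a rotation modifier code."""
    codes = []
    for base, fill, rot in symbols:
        codes.extend([base, FILL_OFFSET + fill, ROTATION_OFFSET + rot])
    return codes


def has_base_symbol_per_char(codes, base):
    # Each code must be decoded back to its BaseSymbol before comparing.
    return any(code // VARIANTS == base for code in codes)


def has_base_with_modifiers(codes, base):
    # The BaseSymbol is stored literally, so a direct lookup is enough.
    return base in codes


# A sign written as (base, fill, rotation) triples.
sign = [(12, 0, 3), (301, 2, 15)]
per_char = encode_symbol_per_char(sign)
with_mods = encode_base_plus_modifiers(sign)

print(len(per_char), len(with_mods))
print(has_base_symbol_per_char(per_char, 301))
print(has_base_with_modifiers(with_mods, 301))
```

The second encoding stores three codes per symbol instead of one, which is
the extra data mentioned above, but the search no longer depends on knowing
how symbols were packed into code values.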
I need to fix my poor design choice and reencode the ISWA 2010 with
BaseSymbol characters and modifiers. I then need to refactor the
character encoding model. This should be a quick fix I'll have ready by
Friday, but it changes BSW once again. Hopefully for the last time.
On the bright side, this makes it easier for inclusion in Unicode. With
my previous encoding, I required an entire Unicode plane of 65,536
characters. With the new encoding, I only need 1,280 characters. This
is a much better number.
Years ago, Michael Everson worked with Unicode for the tentative
acceptance of SignWriting into the standard. If you look at the Unicode
roadmap for the Supplementary Multilingual Plane, you'll see that Sutton
SignWriting has 4 rows set aside awaiting a proposal. These 4 rows
represent 1,024 characters (256 per row). With the new encoding, I can create a
proposal that requires 5 rows. Much more reasonable than an entire plane.
http://www.unicode.org/roadmaps/smp/
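The row arithmetic can be checked with a short Python sketch. The only
figure it relies on is the row size of 256 characters, which follows from
4 rows representing 1024 characters; the function name rows_needed is made
up for illustration.

```python
import math

ROW_SIZE = 256  # characters per roadmap row (4 rows = 1024 characters)


def rows_needed(characters: int) -> int:
    """Smallest number of whole rows that can hold the given character count."""
    return math.ceil(characters / ROW_SIZE)


reserved_rows = 4
new_encoding_rows = rows_needed(1280)     # BaseSymbol characters plus modifiers
full_plane_rows = rows_needed(65536)      # one whole Unicode plane

print(new_encoding_rows - reserved_rows)  # extra rows a proposal would ask for
print(full_plane_rows)
```

Under these numbers the new encoding needs one row more than the 4 already
set aside, instead of every row in a plane.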
Sorry to any and all programmers / users this will inconvenience, but it
is a needed change.
Regards,
-Steve