[sw-l] challenge for programmers

Stuart Thiessen sw at PASSITONSERVICES.ORG
Wed Jun 22 05:22:07 UTC 2005


Hi, Steve!  I agree that we need to see some more stability in IMWA 
before it is encoded into Unicode.  I have talked with Unicode experts 
and they inform me that 26,000+ symbols is nothing.  No problem at all!
Space is not an issue at this point, in their view.  The issue
is a stable character set.  But we don't know if we will need to encode 
it as 26,000 symbols.  It may be that we just encode the 01 position 
and let the renderer take care of flips and rotations. That reduces it 
by about 16 symbols per handshape, right?  Or perhaps we let the 
renderer handle both rotations and fills.  That reduces it by about 48 
symbols per handshape, right? Who knows?  Those are questions that need 
to be studied so we can come up with the best solution.

You are right that a mapping between IMWA numbering and Unicode will be 
needed.  You have presented a good demonstration of the challenge in 
that regard. One point, though.  Color would not be encoded in Unicode. 
Color is not encoded for any other writing system. That would have to 
be handled outside of Unicode.  XY Coordinates could be handled in a 
number of different ways.  For example, Unicode has some special
placement characters for Chinese (the Ideographic Description
Characters) that describe how a Chinese character is composed
from other characters.  One placement marker might mean upper 
left or upper right or whatever.  A modification of that approach might 
be possible.  The challenge of course will be how to shield the user 
from having to worry about XY coordinates but simply input the SW text 
normally (whether through typing or drag and drop).  Perhaps we simply
have a placement symbol for X and a placement symbol for Y, each
followed by a character that represents that coordinate as a 16-bit
number.  Or a single placement character followed by a second character
that represents both X and Y as two 8-bit numbers.  As you can see, there 
are a variety of ways that it could happen. The question is which will 
work best.  That is the purpose of developing a team and the funding to 
be able to research all of these issues and perhaps even help Valerie 
with the process of stabilizing the IMWA so she doesn't have to do it 
alone.  She can do the brain work and the team can help with the grunt 
work.
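
To make the last option concrete, here is a minimal Python sketch of one character value carrying both X and Y as two 8-bit numbers.  The function names, and the choice to put X in the high byte, are assumptions made only for this illustration; nothing here is a proposed encoding.

```python
from typing import Tuple


def pack_xy(x: int, y: int) -> int:
    """Pack two 8-bit coordinates into one 16-bit value (X in the high byte)."""
    if not (0 <= x <= 255 and 0 <= y <= 255):
        raise ValueError("each coordinate must fit in 8 bits (0-255)")
    return (x << 8) | y


def unpack_xy(value: int) -> Tuple[int, int]:
    """Recover (x, y) from a 16-bit value produced by pack_xy."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    return value >> 8, value & 0xFF


packed = pack_xy(120, 45)
print(hex(packed))        # 0x782d
print(unpack_xy(packed))  # (120, 45)
```

Note the trade-off: with two 8-bit numbers each coordinate is limited to 0-255, while a separate 16-bit number per axis allows 0-65535 at the cost of more characters per symbol.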

We will have to see how things progress in making this real with 
funding and personnel.

Again, let me add that I don't see Unicode as competing with anything 
we have now.  I see it as complementing and supporting what we have now 
as well as giving us some political advocacy because we can announce 
that SignWriting has taken its place among the world's writing systems 
as an equal in its own right. That is a powerful statement.  So 
eventually, I think that is a very wise direction for SignWriting.  
That is not to say that PNGs or SVGs are bad or shouldn't be used.  
But it doesn't give us the political clout that SW in Unicode can give 
us in the eyes of the hearing.

Thanks,

Stuart

On Jun 22, 2005, at 2:33, Steve Slevinski wrote:

>  Hi Val,
>
>  My 2 cents, SVG is the next step.  It is required for quality 
> publishing of SignWriting documents with the IMWA.  However, if done 
> right, it will be compatible with our current work so there is no 
> hurry.  Unless someone else works on it first, I will get to it 
> eventually.
>
>  For SVG we will need to convert every IMWA symbol from a static 
> graphic into a vector graphic.  There are applications that may be 
> able to do the conversion automatically.  However, we would need to 
> verify every symbol.  We would then need to verify the current IMWA 
> based signs.  Since the IMWA has around 26 thousand symbols, this 
> could take a while.
>
>  We do not need Unicode.  I believe that Unicode could harm the IMWA 
> if done too soon.
>
>  If we are concerned about document size, we need binary.  Binary will 
> change the SSS-IDs from an 18 character string into a binary 
> equivalent using 1/6 the amount of data. 
>
>  Since this is a challenge for programmers, I'll get right down to the 
> bits and the SSS-ID numbers and explain why the SSS-ID numbers cannot 
> be properly mapped onto Unicode. 
>
>  **** Warning, this is entirely too much information!  ****
>
>  What is an SSS-ID number?
>  The SSS-ID number is a unique character string for every symbol of 
> the IMWA.  The SSS-ID number has the format of "xx-xx-xxx-xx-xx-xx"
> where x is a digit from 0 to 9.  The SSS-ID number has 6 parts
> (Category - Group - Symbol - Variation - Fill - Rotation).  If we look 
> at the first symbol of the IMWA this should make more sense.
>
> <unknown.jpg>
>  01-01-001-01-01-01
>
>  Category 01
>  Group  01
>  Symbol 001
>  Variation 01
>  Fill 01
>  Rotation 01
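
As an illustration of the six-part format described above, the following Python sketch splits an SSS-ID string into its parts and formats it back.  The names SSSID, parse_sss_id and format_sss_id are hypothetical, chosen for this example; they do not come from any existing SignWriting software.

```python
import re
from typing import NamedTuple


class SSSID(NamedTuple):
    category: int
    group: int
    symbol: int
    variation: int
    fill: int
    rotation: int


# Two digits per part, except the three-digit Symbol part.
_SSS_ID_PATTERN = re.compile(
    r"([0-9]{2})-([0-9]{2})-([0-9]{3})-([0-9]{2})-([0-9]{2})-([0-9]{2})"
)


def parse_sss_id(text: str) -> SSSID:
    """Split an 18-character SSS-ID string into its six numeric parts."""
    match = _SSS_ID_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid SSS-ID: {text!r}")
    return SSSID(*(int(part) for part in match.groups()))


def format_sss_id(sid: SSSID) -> str:
    """Turn the six parts back into the zero-padded string form."""
    return "{:02d}-{:02d}-{:03d}-{:02d}-{:02d}-{:02d}".format(*sid)


first = parse_sss_id("01-01-001-01-01-01")
print(first.category, first.symbol, first.rotation)  # 1 1 1
print(format_sss_id(first))                          # 01-01-001-01-01-01
```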
>
>  What is a bit?
>  A bit is a 1 or a 0.  It is the smallest unit of data a computer can
> work with.  It is often described as an on / off switch.
>
>  1 bit can represent 2 values
>  -------------------
>  0
>  1
>
>  2 bits can represent 4 values
>  --------------------
>  00
>  01
>  10
>  11
>
>  3 bits can represent 8 values
>  --------------------
>  000
>  001
>  010
>  011
>  100
>  101
>  110
>  111
>
>  Basic ASCII uses 7 bits.  7 bits can represent 128 values (2^7 or 
> 2*2*2*2*2*2*2).  The letter A is 65, or "1000001" in binary.
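
The counting above can be checked with a few lines of Python.  This is only a small sketch; the helper name values_for_bits is made up for the example.

```python
def values_for_bits(n: int) -> int:
    """Number of distinct values that n bits can represent."""
    return 2 ** n


for n in (1, 2, 3, 7, 16):
    print(n, "bits ->", values_for_bits(n), "values")

# Every 3-bit pattern, from 000 to 111.
patterns = [format(i, "03b") for i in range(values_for_bits(3))]
print(patterns)

# The letter A in 7-bit ASCII.
print(ord("A"))                 # 65
print(format(ord("A"), "07b"))  # 1000001
```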
>
>  Unicode was originally designed with 16 bits.  16 bits can represent
> over 65 thousand values (2^16 = 65,536).  Originally this was thought
> to be enough.  It was not.  Unicode was extended to have multiple
> layers (which Unicode calls planes), but each layer still only has
> 16 bits.
>
>  The IMWA has around 26 thousand symbols.  This should be able to fit 
> on one layer of Unicode (layer 3 would be perfect), however the IMWA 
> is still growing so this is a problem for encoding.  If we squeeze the 
> symbols too close, we won't be able to add new symbols.  If we don't 
> squeeze them close enough, we run out of room.
>
>  Let's take a specific example to help clear this up.
>
>  Here is the first symbol of the IMWA again.
> <unknown.jpg>
>  01-01-001-01-01-01
>
>  If this symbol could be placed in Unicode, it would use 16 bits.  
> Since it is first in the alphabet, it would have the value of 1 or 
> "0000000000000001" in binary.
>
>  If we store this symbol using the SSS-ID, we would use 18 characters 
> (01-01-001-01-01-01).  Since each character uses 8 bits, we would be 
> using 144 bits.  This is much bigger than 16 bits, but it is very 
> clear.
>
>  So we need a mapping from SSS-ID number to a specific number of 
> bits.  Since the SSS-ID number is very regular, we can state a maximum 
> number of bits possible.
>
>  Category - Group - Symbol - Variation - Fill - Rotation
>
>  Every part of the SSS-ID uses 2 digits except for the Symbol part,
> which uses 3 digits.  99 is the largest value for 2 digits, which
> would be covered by 7 bits (2^7 = 128).  999 is the largest value for
> 3 digits, which would be covered by 10 bits (2^10 = 1024).  So....
>  7 bits - 7 bits - 10 bits - 7 bits - 7 bits - 7 bits = 45 bits.
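
The 45-bit total can be reproduced with Python's int.bit_length(), which returns the number of bits needed to write a non-negative integer in binary.  The dictionary name below is an assumption for the sketch.

```python
# Largest value each SSS-ID part can hold, given its number of digits.
MAX_BY_DIGITS = {
    "category": 99,
    "group": 99,
    "symbol": 999,
    "variation": 99,
    "fill": 99,
    "rotation": 99,
}

widths = {name: maximum.bit_length() for name, maximum in MAX_BY_DIGITS.items()}
total_bits = sum(widths.values())

print(widths["category"], widths["symbol"])  # 7 10
print(2 ** 7, 2 ** 10)                       # 128 1024
print(total_bits)                            # 45
```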
>
>  If we analyze the current IMWA, we can get a max number for each 
> position in the SSS-ID number.
>
>  Highest values in the current IMWA.
>  Category - 8
>  Group - 10
>  Symbol - 50
>  Variation - 5
>  Fill - 6
>  Rotation - 16
>
>  So a bit number optimized for the current IMWA would be...
>  3 bits - 4 bits - 6 bits - 3 bits - 3 bits - 4 bits = 23 bits.
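
One detail worth spelling out: 3 bits for 8 categories and 4 bits for 16 rotations only work if each part is stored as its value minus one, since every part starts counting at 01.  That reading is an assumption of this Python sketch rather than something stated above.  Storing the values directly would need 25 bits instead of 23.

```python
# Highest value currently used in each part of the SSS-ID.
CURRENT_MAX = {
    "category": 8,
    "group": 10,
    "symbol": 50,
    "variation": 5,
    "fill": 6,
    "rotation": 16,
}

# Store (value - 1): 8 categories become 0..7 and fit in 3 bits.
zero_based = {name: (m - 1).bit_length() for name, m in CURRENT_MAX.items()}

# Store the value as written: category 8 is 1000 in binary, which needs 4 bits.
direct = {name: m.bit_length() for name, m in CURRENT_MAX.items()}

print(list(zero_based.values()), sum(zero_based.values()))  # [3, 4, 6, 3, 3, 4] 23
print(list(direct.values()), sum(direct.values()))          # [4, 4, 6, 3, 3, 5] 25
```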
>
>  So if we used 45 bits, we would never have a problem with new 
> symbols being added to the IMWA.  And we could save half the space 
> again if we optimized the bits for the current IMWA.
>
>  Unicode uses 16 bits so we would need an additional optimization to
> squeeze the IMWA number system from 23 bits into 16 bits.  However,
> since the IMWA is still growing, we don't know where the new symbols
> will show up.  Since Unicode is not allowed to change once it has been
> defined, any optimization could lead to potential problems.  For that
> reason, I think the 45 bit option would be preferred.
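
For readers who want to see the 45-bit option in practice, here is a rough Python sketch that packs an SSS-ID string into a single integer using the 7-7-10-7-7-7 layout and unpacks it again.  The function names and the ordering of the fields (Category in the highest bits) are assumptions for illustration only.

```python
# Bit widths for Category, Group, Symbol, Variation, Fill, Rotation.
WIDTHS = (7, 7, 10, 7, 7, 7)


def pack_sss_id(text: str) -> int:
    """Pack an SSS-ID string into one integer of at most 45 bits."""
    parts = [int(p) for p in text.split("-")]
    if len(parts) != len(WIDTHS):
        raise ValueError("an SSS-ID has exactly 6 parts")
    value = 0
    for part, width in zip(parts, WIDTHS):
        if not 0 <= part < (1 << width):
            raise ValueError(f"{part} does not fit in {width} bits")
        value = (value << width) | part
    return value


def unpack_sss_id(value: int) -> str:
    """Reverse pack_sss_id and return the zero-padded string form."""
    parts = []
    for width in reversed(WIDTHS):
        parts.append(value & ((1 << width) - 1))
        value >>= width
    c, g, s, v, f, r = reversed(parts)
    return f"{c:02d}-{g:02d}-{s:03d}-{v:02d}-{f:02d}-{r:02d}"


sss_id = "01-01-001-01-01-01"
packed = pack_sss_id(sss_id)
print(unpack_sss_id(packed))  # 01-01-001-01-01-01
print(len(sss_id) * 8, "bits as text,", sum(WIDTHS), "bits packed")
```

Because each field has spare room (7 bits hold up to 127, 10 bits hold up to 1023), new symbols can be added anywhere in the numbering without changing the layout.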
>
>  And that's just for the symbols themselves.  We still have the XY 
> coordinates and color for each symbol.  But that's enough for now.
>
>  -Steve
>