<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Hi Val,<br>
<br>
My 2 cents: SVG is the next step. It is required for quality
publishing of SignWriting documents with the IMWA. However, if done
right, it will be compatible with our current work so there is no
hurry. Unless someone else works on it first, I will get to it
eventually.<br>
<br>
For SVG we will need to convert every IMWA symbol from a static graphic
into a vector graphic. There are applications that may be able to do
the conversion automatically. However, we would need to verify every
symbol. We would then need to verify the current IMWA-based signs.
Since the IMWA has around 26 thousand symbols, this could take a while.<br>
<br>
We do not need Unicode. I believe that Unicode could harm the IMWA if
done too soon.<br>
<br>
If we are concerned about document size, we need binary. Binary will
change the SSS-IDs from an 18 character string into a binary equivalent
using 1/6 the amount of data. <br>
<br>
Since this is a challenge for programmers, I'll get right down to the
bits and the SSS-ID numbers and explain why the SSS-ID numbers cannot
be properly mapped onto Unicode. <br>
<br>
**** Warning, this is entirely too much information! ****<br>
<br>
What is an SSS-ID number?<br>
The SSS-ID number is a unique character string for every symbol of the
IMWA. The SSS-ID number has the format of "xx-xx-xxx-xx-xx-xx" where x
is a digit from 0 to 9. The SSS-ID number has 6 parts (Category -
Group - Symbol - Variation - Fill - Rotation). If we look at the first
symbol of the IMWA this should make more sense.<br>
<br>
<img src="cid:part1.00010309.04070704@signpuddle.net" alt=""><br>
01-01-001-01-01-01<br>
<br>
Category 01<br>
Group 01<br>
Symbol 001<br>
Variation 01<br>
Fill 01<br>
Rotation 01<br>
<br>
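For the programmers, here is a minimal sketch of splitting an SSS-ID string into its 6 parts. The names SSSID, parse_sss_id and format_sss_id are made up for this example and are not part of any existing SignWriting software.<br>
<pre>
```python
import re
from typing import NamedTuple

# Hypothetical names for illustration; the field order follows the
# six parts listed above.
class SSSID(NamedTuple):
    category: int
    group: int
    symbol: int
    variation: int
    fill: int
    rotation: int

# Six digit groups (2-2-3-2-2-2) separated by hyphens.
SSS_PATTERN = re.compile(
    r"([0-9]{2})-([0-9]{2})-([0-9]{3})-([0-9]{2})-([0-9]{2})-([0-9]{2})"
)

def parse_sss_id(text: str) -> SSSID:
    """Turn 'xx-xx-xxx-xx-xx-xx' into six integers."""
    match = SSS_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not an SSS-ID: {text!r}")
    return SSSID(*(int(part) for part in match.groups()))

def format_sss_id(sid: SSSID) -> str:
    """Turn six integers back into the 18 character string."""
    return "{:02d}-{:02d}-{:03d}-{:02d}-{:02d}-{:02d}".format(*sid)

first = parse_sss_id("01-01-001-01-01-01")
print(first)
print(format_sss_id(first))
```
</pre>
Parsing and formatting are exact opposites here, so the string form and the number form carry the same information.<br>
<br>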
What is a bit?<br>
A bit is 1 or 0. It is the smallest unit of data a computer can work with.
It is often described as an on / off switch.<br>
<br>
1 bit can represent 2 values<br>
-------------------<br>
0<br>
1<br>
<br>
2 bits can represent 4 values<br>
--------------------<br>
00<br>
01<br>
10<br>
11<br>
<br>
3 bits can represent 8 values<br>
--------------------<br>
000<br>
001<br>
010<br>
011<br>
100<br>
101<br>
110<br>
111<br>
<br>
Basic ASCII uses 7 bits. 7 bits can represent 128 values (2^7 or
2*2*2*2*2*2*2). The letter A is 65, or "1000001" in binary.<br>
<br>
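If you want to check the counting yourself, a short sketch along these lines will print the number of values for a given number of bits, the 3 bit patterns in order, and the 7 bit pattern for the letter A. The variable names are only for illustration.<br>
<pre>
```python
# n bits can represent 2**n distinct values.
for n in (1, 2, 3, 7, 16):
    print(n, "bits:", 2 ** n, "values")

# All 3-bit patterns, in counting order.
patterns = [format(value, "03b") for value in range(2 ** 3)]
print(patterns)

# The letter A has ASCII code 65; show it as a 7-bit pattern.
a_bits = format(ord("A"), "07b")
print(a_bits)
```
</pre>
<br>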
Unicode was designed with 16 bits. 16 bits can represent over 65
thousand values. Originally this was thought to be enough. It was
not. Unicode was extended to have multiple layers (called planes), but each layer
still only has 16 bits. <br>
<br>
The IMWA has around 26 thousand symbols. This should be able to fit on
one layer of Unicode (layer 3 would be perfect), however the IMWA is
still growing so this is a problem for encoding. If we squeeze the
symbols too close, we won't be able to add new symbols. If we don't
squeeze them close enough, we run out of room.<br>
<br>
Let's take a specific example to help clear this up.<br>
<br>
Here is the first symbol of the IMWA again.<br>
<img src="cid:part2.09020705.00000207@signpuddle.net" alt=""><br>
01-01-001-01-01-01<br>
<br>
If this symbol could be placed in Unicode, it would use 16 bits. Since
it is first in the alphabet, it would have the value of 1 or
"0000000000000001" in binary.<br>
<br>
If we store this symbol using the SSS-ID, we would use 18 characters
(01-01-001-01-01-01). Since each character uses 8 bits, we would be
using 144 bits. This is much bigger than 16 bits, but it is very clear.<br>
<br>
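The size comparison can be checked with a few lines. This is only a sketch, and it assumes the SSS-ID is stored as plain ASCII text with one byte per character.<br>
<pre>
```python
sss_id = "01-01-001-01-01-01"

# Assumption: one 8-bit byte per character (plain ASCII text).
text_bits = len(sss_id.encode("ascii")) * 8

# A single 16-bit code value, as in the Unicode example above.
code_bits = 16

print(len(sss_id), "characters =", text_bits, "bits")
print("ratio:", text_bits / code_bits)
```
</pre>
<br>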
So we need a mapping from SSS-ID number to a specific number of bits.
Since the SSS-ID number is very regular, we can state a maximum number
of bits possible.<br>
<br>
Category - Group - Symbol - Variation - Fill - Rotation<br>
<br>
Every part of the SSS-ID uses 2 digits except for the Symbol part
which uses 3 digits. 99 is the largest value for 2 digits which
would be covered by 7 bits (2^7 = 128). 999 is the largest value for 3
digits which would be covered by 10 bits (2^10 = 1024). So....<br>
7 bits - 7 bits - 10 bits - 7 bits - 7 bits - 7 bits = 45 bits.<br>
<br>
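Here is one way the 45 bit layout could be sketched. The functions pack and unpack are hypothetical names, and putting the first field in the highest bits is my own choice for the example, not a defined format.<br>
<pre>
```python
# Field widths in bits, in SSS-ID order:
# category, group, symbol, variation, fill, rotation.
WIDTHS = (7, 7, 10, 7, 7, 7)  # 45 bits in total

def pack(fields):
    """Pack six integers into one integer, first field in the highest bits."""
    packed = 0
    for value, width in zip(fields, WIDTHS):
        if value not in range(2 ** width):
            raise ValueError(f"{value} does not fit in {width} bits")
        # Multiplying by 2**width makes room for the next field.
        packed = packed * 2 ** width + value
    return packed

def unpack(packed):
    """Reverse of pack(): split the integer back into six fields."""
    fields = []
    for width in reversed(WIDTHS):
        packed, value = divmod(packed, 2 ** width)
        fields.append(value)
    return tuple(reversed(fields))

first = pack((1, 1, 1, 1, 1, 1))
largest = pack((99, 99, 999, 99, 99, 99))
print(sum(WIDTHS), "bits per symbol")
print(unpack(first), unpack(largest))
```
</pre>
45 bits fit in 6 bytes, compared with 18 bytes for the text form.<br>
<br>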
If we analyze the current IMWA, we can get a max number for each
position in the SSS-ID number.<br>
<br>
Highest values in the current IMWA.<br>
Category - 8<br>
Group - 10<br>
Symbol - 50 <br>
Variation - 5<br>
Fill - 6<br>
Rotation - 16<br>
<br>
So a bit count optimized for the current IMWA (storing each value
minus 1, so that 8 fits in 3 bits and 16 fits in 4 bits) would be...<br>
3 bits - 4 bits - 6 bits - 3 bits - 3 bits - 4 bits = 23 bits.<br>
<br>
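The bit counts for the current maximums can be worked out with a small sketch. It assumes every position starts counting at 1 and is stored as the value minus 1, which is what lets a maximum of 8 fit in 3 bits. The name bits_needed is made up for the example.<br>
<pre>
```python
# Highest value seen in each position, as listed above.
HIGHEST = {
    "category": 8,
    "group": 10,
    "symbol": 50,
    "variation": 5,
    "fill": 6,
    "rotation": 16,
}

def bits_needed(highest):
    """Bits needed for the values 1..highest, stored as 0..highest-1."""
    return max(1, (highest - 1).bit_length())

widths = {name: bits_needed(value) for name, value in HIGHEST.items()}
total = sum(widths.values())
print(widths)
print(total, "bits")
```
</pre>
<br>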
So if we used 45 bits, we would never have a problem with new symbols
being added to the IMWA. And we could save half the space again if we
optimized the bits for the current IMWA.<br>
<br>
Unicode uses 16 bits so we would need an additional optimization to
squeeze the IMWA number system from 23 bits into 16 bits. However,
since the IMWA is still growing, we don't know where the new symbols
will show up. Since Unicode is not allowed to change once it has been
defined, any optimization could lead to potential problems. For that
reason, I think the 45 bit option would be preferred.<br>
<br>
And that's just for the symbols themselves. We still have the XY
coordinates and color for each symbol. But that's enough for now.<br>
<br>
-Steve<br>
<br>
<br>
Valerie Sutton wrote:
<blockquote
cite="mid8E16E542-4547-4384-BE73-6FEAF83F557D@signwriting.org"
type="cite">SignWriting List
<br>
June 21, 2005
<br>
<br>
<blockquote type="cite">On Jun 21, 2005, at 4:40 PM, Stuart Thiessen
wrote:
<br>
A clarification on this: I completely agree that SWML is a valuable
step to making SW searchable and easily transported. However, SWML as
such does not handle the display of SW, only the storage. So computer
software that reads SWML will have to use some kind of display process
to make the SW data visual. This display process could use SVG
images, PNG images, or a Unicode font to provide the displayed images
depending on the program. So, we need to separate the roles of SWML
and display. SWML only has to do with storage and retrieval of data,
but not display.
<br>
</blockquote>
<br>
I see. Thanks for explaining this to me! So when Steve is using SWML
to store data in SignPuddle, he is using PNGs to do the visual display
of what the SWML says should be displayed? I wasn't aware of this...I
am glad to know this...
<br>
<br>
<blockquote type="cite">Until SW is finally in Unicode, SW is just
graphics because that is the only display mechanism we have for SW.
The value of SWML is that we are now able to search it with a variety
of programs. SW-DOS by comparison probably could have been equally as
searchable but because of its binary format, that made it much more
difficult compared to SWML. But search capability and display
capabilities are two different "animals". The value of Unicode is
simply this: hearing people will probably not fully appreciate SW
until it is available in Unicode and it is able to be composed just
like spoken languages (in a manner of speaking). This is simply
because it takes much less room to store Unicode symbols than it does
to store graphic images. The display happens either way, but I'm
talking here more about "political" respect or the perceived reality
of SW's status as a genuine writing system.
<br>
</blockquote>
<br>
OK. What about SVG? I remember years ago, Antonio Carlos came to visit
me from Brazil, and was eager to explain both SWML and SVG to me...I
remember feeling amazed at the possibilities when he showed me a
SignWriting symbol being drawn on the web in front of my eyes in
SVG...Now that we see that SWML is really becoming important, I wonder
if SVG isn't next?
<br>
<br>
That does not mean that I don't think Unicode is a terrific idea...it
is just that Unicode takes money and time, and if PNG display is the
only alternative right now, then maybe SVG could be another
alternative until Unicode is available for SignWriting?
<br>
<br>
Did you know that the French have interest in developing a way to
apply SignWriting to Unicode? I wonder if Mr. Dalle and Mr. Aznar from
France wouldn't be interested in working with SIL on the Unicode
project? Do you think SIL could be interested?...
<br>
<br>
<blockquote type="cite">Also, the use of Unicode will not make SWML
obsolete. In fact, I think that SWML will be even more useful because
instead of having special code numbers in the markup, we can actually
embed the Unicode character for that SW symbol. This will make SWML
files more compact and more easily read and further enhance its
usefulness. But that is a little more down the road until funding and
resources become available. Once funding is available, we can
certainly begin work on it and then just wait on a final submission
until we feel the IMWA is more stable.
<br>
</blockquote>
<br>
I see. Very interesting, Stuart! You know so much! ;-)
<br>
<br>
Thanks for your patience with me and all those symbols in the
IMWA!...I actually am not necessarily in favor of placing the whole
IMWA into Unicode. I think we should do a Symbol-Frequency test on
dictionaries to pin down the symbols that you really are using, and
then use the Language-specific symbolset to be the first SignWriting
Unicode...in other words...Unicode US, Unicode NO, etc...based on only
those SignWriting symbols used in one language...why slow down the
Unicode development for SignWriting, just because DanceWriting has
not been entered into the IMWA yet? And is there really a Unicode for
music sounds? No. So why should DanceWriting be in Unicode?...Unicode
should be for SignWriting specific to one sign language...
<br>
<br>
Just a thought. I will leave Unicode development to you and the next
generation!
<br>
<br>
Val ;-)
<br>
<br>
<br>
</blockquote>
<br>
</body>
</html>