<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=windows-1252"
http-equiv="Content-Type">
</head>
<body alink="#ee0000" link="#0000ee" text="#000000" vlink="#551a8b">
Hi all,<br>
<br>
if there are 8 categories, 10 groups, 50 symbols, 5 variations, 6 fills
and 16 rotations, that makes 8*10*50*5*6*16 = 1,920,000 (about 1,900,000)
possible combinations. But the IMWA does not use that many combinations.<br>
<br>
Let's look at the number of combinations for a given number of bits...<br>
<br>
16 bits has 2^16 = 65,536 combinations<br>
HERE is SSS-ID without rotation = 120,000 combinations<br>
17 bits has 2^17 = 131,072 combinations<br>
18 bits has 2^18 = 262,144 combinations<br>
19 bits has 2^19 = 524,288 combinations<br>
20 bits has 2^20 = 1,048,576 combinations<br>
HERE is complete SSS-ID = 1,900,000 combinations<br>
21 bits has 2^21 = 2,097,152 combinations<br>
<br>
Well, it is impossible to put all 1,900,000 combinations into 16 bits.
We need 21 bits.<br>
But if we don't use rotation, there are only 120,000 combinations, which
fit in 17 bits, and that is very close to 16.<br>
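<br>
The bit counts above can be checked with a short Python sketch. The field
maxima come from this thread; the names FIELD_MAX and bits_needed are only
illustrative.<br>
<br>
```python
from math import prod

# Highest value of each SSS-ID field, as quoted in this thread.
FIELD_MAX = {"category": 8, "group": 10, "symbol": 50,
             "variation": 5, "fill": 6, "rotation": 16}

def bits_needed(combinations: int) -> int:
    """Smallest number of bits b such that 2**b covers `combinations` values."""
    return (combinations - 1).bit_length()

full = prod(FIELD_MAX.values())                 # every field combined
no_rotation = full // FIELD_MAX["rotation"]     # same, ignoring rotation

print(full, bits_needed(full))                  # 1920000 21
print(no_rotation, bits_needed(no_rotation))    # 120000 17
```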
<br>
OPTIMIZATION<br>
There are SSS-ID combinations which are never used, for example
SSS-ID 01-01-042-02-03-14. It is one of the 1,900,000 combinations, but
there are no more than 13 symbols in category 1, group 1.<br>
<br>
There is a way to optimize if you know how many groups are
in each category...<br>
<br>
category 1 has 10 groups<br>
category 2 has 10 groups<br>
category 3 has 10 groups<br>
category 4 has 5 groups<br>
category 5 has 5 groups<br>
category 6 has 2 groups<br>
category 7 has 4 groups<br>
category 8 has 4 groups<br>
<br>
well, it is 50 groups in all and not 80 (8 categories * 10 groups).<br>
<br>
this information is a small table with 8 rows, so it is easy to implement
and to use.<br>
<br>
well it is 50(groups in 8 categories)*50*5*6*16 = 1,200,000
combinations.<br>
without rotation it is 50*50*5*6 = 75,000 combinations.<br>
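<br>
A minimal sketch of that 8-row table in Python, assuming 1-based category
and group numbers as in the SSS-ID. The helper group_index is hypothetical;
it turns a (category, group) pair into one dense index from 0 to 49.<br>
<br>
```python
from itertools import accumulate

# Groups per category 1..8, copied from the list above.
GROUPS_PER_CATEGORY = [10, 10, 10, 5, 5, 2, 4, 4]

# OFFSETS[c - 1] = number of groups in all categories before category c.
OFFSETS = [0, *accumulate(GROUPS_PER_CATEGORY)][:-1]

def group_index(category: int, group: int) -> int:
    """Dense 0-based index of a (category, group) pair, both 1-based."""
    if category not in range(1, len(GROUPS_PER_CATEGORY) + 1):
        raise ValueError("no such category")
    if group not in range(1, GROUPS_PER_CATEGORY[category - 1] + 1):
        raise ValueError("no such group in this category")
    return OFFSETS[category - 1] + group - 1

total_groups = sum(GROUPS_PER_CATEGORY)
with_rotation = total_groups * 50 * 5 * 6 * 16
without_rotation = total_groups * 50 * 5 * 6

print(total_groups, with_rotation, without_rotation)  # 50 1200000 75000
print(group_index(1, 1), group_index(8, 4))           # 0 49
```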
<br>
It is still too much.<br>
<br>
if you know how many symbols are in each of these 50 groups, you can
make a (conversion) table<br>
group 1 has 13 symbols<br>
group 2 has 12 symbols<br>
group 3 has 21 symbols<br>
group 4 has 7 symbols<br>
...and so on...<br>
<br>
It is <br>
13+12+21+07+50+21+13+14+33+11+<br>
01+11+15+04+15+14+12+15+15+13+<br>
01+01+01+01+01+01+01+01+01+01+<br>
01+01+02+02+01+<br>
01+01+01+04+04+<br>
05+04+<br>
01+02+01+02+<br>
01+02+03+02 =<br>
361 symbols in all<br>
<br>
361*5*6*16 = 173,280 combinations if we have a conversion table with 50
rows.<br>
361*5*6 = 10,830 combinations without rotation!<br>
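<br>
The sum can be checked the same way. The numbers are copied from the rows
above; the category comments are an assumption based on the row lengths
matching the groups-per-category list (10, 10, 10, 5, 5, 2, 4, 4).<br>
<br>
```python
# Symbols per group, one row per category (row lengths 10,10,10,5,5,2,4,4).
SYMBOLS_PER_GROUP = [
    13, 12, 21, 7, 50, 21, 13, 14, 33, 11,   # category 1
    1, 11, 15, 4, 15, 14, 12, 15, 15, 13,    # category 2
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1,            # category 3
    1, 1, 2, 2, 1,                           # category 4
    1, 1, 1, 4, 4,                           # category 5
    5, 4,                                    # category 6
    1, 2, 1, 2,                              # category 7
    1, 2, 3, 2,                              # category 8
]

total_symbols = sum(SYMBOLS_PER_GROUP)
with_rotation = total_symbols * 5 * 6 * 16   # variations * fills * rotations
without_rotation = total_symbols * 5 * 6

print(len(SYMBOLS_PER_GROUP), total_symbols)  # 50 361
print(with_rotation, without_rotation)        # 173280 10830
```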
<br>
well, let's make a table of symbols with 361 rows (it is still usable)<br>
in categories 1, 3, 4, 5 and 8, there is no symbol with more than 1
variety (it is )<br>
in category 2 there are 01+11+00+00+00+00+15+00+13 symbols with only one
variety<br>
<br>
in category 2 there is 00+00+10+04+09+05+00+05+00 with 2 varieties
00+00+07+04+05+04+00+00 with 3 varieties, 00+00+01+00+01+00+00+00 with
4 varieties. = 55 more variations.<br>
in category 6 there is only one symbol with more than one variety; it
has 5 varieties, which is 4 more variations.<br>
in category 7 there are 3 symbols with 2 varieties and 2 symbols with 3
varieties which is 5 more variations.<br>
<br>
that is 64 more variations, so it is 361 + 64 = 425 variations in all!!!
Only 425? Did I make a mistake somewhere?<br>
<br>
well, 425*6*16 is 40,800 combinations (2550 without rotation)!<br>
<br>
425 rows is still not such a large conversion table, so it could be
optimized in one more step.<br>
<br>
if I list all the SSSs I can find, there are only 1884 SSS-IDs with
rotation 01, 1867 with rotation 02, and so on:<br>
rotation - number of SSS-IDs<br>
01 - 1884<br>
02 - 1867<br>
03 - 1781<br>
04 - 1835<br>
05 - 1730<br>
06 - 1724<br>
07 - 1660<br>
08 - 1711<br>
09 - 1481<br>
10 - 1480<br>
11 - 1465<br>
12 - 1474<br>
13 - 1471<br>
14 - 1471<br>
15 - 1462<br>
16 - 1477<br>
<br>
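Adding up the per-rotation counts gives the number of SSS-IDs actually
listed. This is only a sketch using the counts from the table above; the
name IDS_PER_ROTATION is illustrative.<br>
<br>
```python
# Number of SSS-IDs found for each rotation 01..16, from the table above.
IDS_PER_ROTATION = [1884, 1867, 1781, 1835, 1730, 1724, 1660, 1711,
                    1481, 1480, 1465, 1474, 1471, 1471, 1462, 1477]

total_ids = sum(IDS_PER_ROTATION)
# Bits needed if every listed SSS-ID simply got its own running number.
bits = (total_ids - 1).bit_length()

print(total_ids, bits)  # 25973 15
```
<br>
That total agrees with the "around 26 thousand symbols" in Steve's message
below, and a plain running number for that many symbols would fit in 16
bits, though it would leave no regular structure for new symbols.<br>
<br>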
LET'S GO BACKWARDS<br>
Now I see it would be better to work backwards through the SSS-ID.<br>
I have 65536 combinations in 16 bits.<br>
All 16 rotations are frequently used, so it is not economical to
optimize them.<br>
65536/16 is 4096 symbols with all variations and fills.<br>
Fill is also often used, so without optimization (if 6 is the
highest value) it is...<br>
4096/6 is 682 possible variations.<br>
<br>
Now there are 425 variations in the IMWA (if I counted right).<br>
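<br>
The backwards calculation as a small Python sketch. The 425 is the count
from above, which may still contain a mistake.<br>
<br>
```python
CODE_SPACE = 2 ** 16      # values available in 16 bits
ROTATIONS = 16            # all rotations kept, no optimization
FILLS = 6                 # highest fill value, no optimization
USED_VARIATIONS = 425     # count from this message, not verified

slots_per_rotation = CODE_SPACE // ROTATIONS       # 4096
variation_slots = slots_per_rotation // FILLS      # 682 (integer division)
free_slots = variation_slots - USED_VARIATIONS     # 257

print(slots_per_rotation, variation_slots, free_slots)  # 4096 682 257
```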
<br>
It seems you are right, Steve, that SSS-ID numbers can not
be properly mapped onto Unicode.<br>
<br>
It can be mapped onto Unicode by a shortened SSS-ID xxx-x-xx, which is
Variation-Fill-Rotation, with highest values 682-6-16.<br>
And we can have a table with 682 variations where variation<br>
001 has value 01-01-001-01 (which is category, group, symbol and
variation in SSS-ID)<br>
002 has value 01-01-002-01<br>
.<br>
.<br>
.<br>
207 has value 02-02-011-01<br>
208 has value 02-03-001-01<br>
209 has value 02-03-001-02<br>
.<br>
.<br>
.<br>
424 has value 08-04-001-01<br>
425 has value 08-04-002-01<br>
426 is the first free value and there are 256 more free codes after it.<br>
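<br>
A minimal sketch of how the shortened Variation-Fill-Rotation ID could be
packed into one 16-bit number and expanded again. The function names and
the packing order are assumptions, not an agreed format, and
VARIATION_TABLE holds only the example rows listed above.<br>
<br>
```python
FILLS, ROTATIONS = 6, 16
MAX_VARIATION = 65536 // (FILLS * ROTATIONS)     # 682
LIMIT = MAX_VARIATION * FILLS * ROTATIONS        # 65472 codes actually used

# Only the example rows from this message; the real table would have
# one row per variation (category-group-symbol-variation).
VARIATION_TABLE = {
    1: "01-01-001-01", 2: "01-01-002-01",
    207: "02-02-011-01", 208: "02-03-001-01", 209: "02-03-001-02",
    424: "08-04-001-01", 425: "08-04-002-01",
}

def encode(variation: int, fill: int, rotation: int) -> int:
    """Pack 1-based (variation, fill, rotation) into a 0-based 16-bit code."""
    if (variation not in range(1, MAX_VARIATION + 1)
            or fill not in range(1, FILLS + 1)
            or rotation not in range(1, ROTATIONS + 1)):
        raise ValueError("field out of range")
    return ((variation - 1) * FILLS + (fill - 1)) * ROTATIONS + (rotation - 1)

def decode(code: int) -> tuple:
    """Inverse of encode: returns 1-based (variation, fill, rotation)."""
    if code not in range(LIMIT):
        raise ValueError("code out of range")
    rest, rotation = divmod(code, ROTATIONS)
    variation, fill = divmod(rest, FILLS)
    return variation + 1, fill + 1, rotation + 1

def to_sss_id(code: int) -> str:
    """Expand a code to the full SSS-ID using the variation table."""
    variation, fill, rotation = decode(code)
    prefix = VARIATION_TABLE[variation]   # KeyError for rows not listed here
    return f"{prefix}-{fill:02d}-{rotation:02d}"

print(to_sss_id(encode(1, 1, 1)))      # 01-01-001-01-01-01
print(encode(209, 2, 3))               # 19986
print(encode(MAX_VARIATION, 6, 16))    # 65471
```
<br>
The largest code is 65471, so 64 of the 65536 values stay unused with this
packing.<br>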
<br>
682 rows is a pretty small table for remapping 65536 possible symbols
onto Unicode. This table has to be made when the IMWA is finished, because
of the ordering.<br>
But if I imagine a sequence of Unicode... it is just a linear sequence of
IMWA symbols. There must be a more complicated format which uses Unicode
(because of the encoding and font compatibility) together with the position
of the symbol and control signs (end of sign, space, color, etc.).<br>
<br>
There is no more space to map the x, y position of the symbol onto Unicode,
and I don't think that is the purpose of Unicode. Rendering of
the sign is up to the rendering software.<br>
<br>
The question is whether 682 variations are enough (if 425 variations are
used now)?<br>
<br>
Well, I hope my mathematics is useful :)<br>
<br>
Tomas<br>
<br>
Steve Slevinski wrote:
<blockquote cite="mid42B91449.6080501@signpuddle.net" type="cite">
Hi Val,<br>
<br>
My 2 cents, SVG is the next step. It is required for quality
publishing of SignWriting documents with the IMWA. However, if done
right, it will be compatible with our current work so there is no
hurry. Unless someone else works on it first, I will get to it
eventually.<br>
<br>
For SVG we will need to convert every IMWA symbol from a static graphic
into a vector graphic. There are applications that may be able to do
the conversion automatically. However, we would need to verify every
symbol. We would then need to verify the current IMWA based signs.
Since the IMWA has around 26 thousand symbols, this could take a while.<br>
<br>
We do not need Unicode. I believe that Unicode could harm the IMWA if
done too soon.<br>
<br>
If we are concerned about document size, we need binary. Binary will
change the SSS-IDs from an 18 character string into a binary equivalent
using 1/6 the amount of data. <br>
<br>
Since this is a challenge for programmers, I'll get right down to the
bits and the SSS-ID numbers and explain why the SSS-ID numbers can not
be properly mapped onto Unicode. <br>
<br>
**** Warning, this is entirely too much information! ****<br>
<br>
What is an SSS-ID number?<br>
The SSS-ID number is a unique character string for every symbol of the
IMWA. The SSS-ID number has the format of "xx-xx-xxx-xx-xx-xx" where x
is a number from 0 to 9. The SSS-ID number has 6 parts (Category -
Group - Symbol - Variation - Fill - Rotation). If we look at the first
symbol of the IMWA this should make more sense.<br>
<br>
<img src="cid:part1.02080302.08000301@ruce.cz" alt=""><br>
01-01-001-01-01-01<br>
<br>
Category 01<br>
Group 01<br>
Symbol 001<br>
Variation 01<br>
Fill 01<br>
Rotation 01<br>
<br>
What is a bit?<br>
A bit is 1 or 0. It is the smallest value a computer can work with.
It is called an on / off switch.<br>
<br>
1 bit can represent 2 values<br>
-------------------<br>
0<br>
1<br>
<br>
2 bits can represent 4 values<br>
--------------------<br>
00<br>
01<br>
10<br>
11<br>
<br>
3 bits can represent 8 values<br>
--------------------<br>
000<br>
001<br>
010<br>
011<br>
100<br>
101<br>
110<br>
111<br>
<br>
Basic ASCII uses 7 bits. 7 bits can represent 128 values (2^7 or
2*2*2*2*2*2*2). The letter A is 65, or "1000001" in binary.<br>
<br>
Unicode was designed with 16-bits. 16 bits can represent over 65
thousand values. Originally this was thought to be enough. It was
not. Unicode was extended to have multiple layers, but each layer
still only has 16-bits. <br>
<br>
The IMWA has around 26 thousand symbols. This should be able to fit on
one layer of Unicode (layer 3 would be perfect), however the IMWA is
still growing so this is a problem for encoding. If we squeeze the
symbols too close, we won't be able to add new symbols. If we don't
squeeze them close enough, we run out of room.<br>
<br>
Let's take a specific example to help clear this up.<br>
<br>
Here is the first symbol of the IMWA again.<br>
<img src="cid:part2.06070200.08040006@ruce.cz" alt=""><br>
01-01-001-01-01-01<br>
<br>
If this symbol could be placed in Unicode, it would use 16 bits. Since
it is first in the alphabet, it would have the value of 1 or
"0000000000000001" in binary.<br>
<br>
If we store this symbol using the SSS-ID, we would use 18 characters
(01-01-001-01-01-01). Since each character uses 8 bits, we would be
using 144 bits. This is much bigger than 16 bits, but it is very clear.<br>
<br>
So we need a mapping from SSS-ID number to a specific number of bits.
Since the SSS-ID number is very regular, we can state a maximum number
of bits possible.<br>
<br>
Category - Group - Symbol - Variation - Fill - Rotation<br>
<br>
Every part of the SSS-ID uses 2 numbers except for the Symbol part
which uses 3 numbers. 99 is the largest value for 2 numbers which
would be covered by 7 bits (2^7 = 128). 999 is the largest value for 3
numbers which would be covered by 10 bits (2^10 = 1024). So....<br>
7 bits - 7 bits - 10 bits - 7 bits - 7 bits - 7 bits = 45 bits.<br>
<br>
If we analyze the current IMWA, we can get a max number for each
position in the SSS-ID number.<br>
<br>
Highest values in the current IMWA.<br>
Category - 8<br>
Group - 10<br>
Symbol - 50 <br>
Variation - 5<br>
Fill - 6<br>
Rotation - 16<br>
<br>
So a bit number optimized for the current IMWA would be...<br>
3 bits - 4 bits - 6 bits - 3 bits - 3 bits - 4 bits = 23 bits.<br>
<br>
So if we used 45 bits, we would never have a problem with new symbols
being added to the IMWA. And we could save half the space again if we
optimized the bits for the current IMWA.<br>
<br>
Unicode uses 16 bits so we would need an additional optimization to
squeeze the IMWA number system from 23 bits into 16 bits. However,
since the IMWA is still growing, we don't know where the new symbols
will show up. Since Unicode is not allowed to change once it has been
defined, any optimization could lead to potential problems. For that
reason, I think the 45 bit option would be preferred.<br>
<br>
And that's just for the symbols themselves. We still have the XY
coordinates and color for each symbol. But that's enough for now.<br>
<br>
-Steve<br>
<br>
<br>
Valerie Sutton wrote:
<blockquote
cite="mid8E16E542-4547-4384-BE73-6FEAF83F557D@signwriting.org"
type="cite">SignWriting List <br>
June 21, 2005 <br>
<br>
<blockquote type="cite">On Jun 21, 2005, at 4:40 PM, Stuart
Thiessen
wrote: <br>
A clarification on this: I completely agree that SWML is a valuable
step to making SW searchable and easily transported. However, SWML as
such does not handle the display of SW, only the storage. So computer
software that reads SWML will have to use some kind of display process
to make the SW data visual. This display process could use SVG
images, PNG images, or a Unicode font to provide the displayed images
depending on the program. So, we need to separate the roles of SWML
and display. SWML only has to do with storage and retrieval of data,
but not display. <br>
</blockquote>
<br>
I see. Thanks for explaining this to me! So when Steve is using SWML
to store data in SignPuddle, he is using PNGs to do the visual display
of what the SWML says should be displayed? I wasn't aware of this...I
am glad to know this... <br>
<br>
<blockquote type="cite">Until SW is finally in Unicode, SW is just
graphics because that is the only display mechanism we have for SW.
The value of SWML is that we are now able to search it with a variety
of programs. SW-DOS by comparison probably could have been equally as
searchable but because of its binary format, that made it much more
difficult compared to SWML. But search capability and display
capabilities are two different "animals". The value of Unicode is
simply this: hearing people will probably not fully appreciate SW
until it is available in Unicode and it is able to be composed just
like spoken languages (in a manner of speaking). This is simply
because it takes much less room to store Unicode symbols than it does
to store graphic images. The display happens either way, but I'm
talking here more about "political" respect or the perceived reality
of SW's status as a genuine writing system. <br>
</blockquote>
<br>
OK. What about SVG? I remember years ago, Antonio Carlos came to visit
me from Brazil, and was eager to explain both SWML and SVG to me...I
remember feeling amazed at the possibilities when he showed me a
SignWriting symbol being drawn on the web in front of my eyes in
SVG...Now that we see that SWML is really becoming important, I wonder
if SVG isn't next? <br>
<br>
That does not mean that I don't think Unicode is a terrific idea...it
is just that Unicode takes money and time, and if PNG display is the
only alternative right now, then maybe SVG could be another
alternative until Unicode is available for SignWriting? <br>
<br>
Did you know that the French have interest in developing a way to
apply SignWriting to Unicode? I wonder if Mr. Dalle and Mr. Aznar from
France wouldn't be interested in working with SIL on the Unicode
project? Do you think SIL could be interested?... <br>
<br>
<blockquote type="cite">Also, the use of Unicode will not make SWML
obsolete. In fact, I think that SWML will be even more useful because
instead of having special code numbers in the markup, we can actually
embed the Unicode character for that SW symbol. This will make SWML
files more compact and more easily read and further enhance its
usefulness. But that is a little more down the road until funding and
resources become available. Once funding is available, we can
certainly begin work on it and then just wait on a final submission
until we feel the IMWA is more stable. <br>
</blockquote>
<br>
I see. Very interesting, Stuart! You know so much! ;-) <br>
<br>
Thanks for your patience with me and all those symbols in the
IMWA!...I actually am not necessarily in favor of placing the whole
IMWA into Unicode. I think we should do a Symbol-Frequency test on
dictionaries to pin down the symbols that you really are using, and
then use the Language-specific symbolset to be the first SignWriting
Unicode...in other words...Unicode US, Unicode NO, etc...based on only
those SignWriting symbols used in one language...why slow down the
Unicode development for SignWriting, just because DanceWriting has
not been entered into the IMWA yet? And is there really a Unicode for
music sounds? No. So why should DanceWriting be in Unicode?...Unicode
should be for SignWriting specific to one sign language... <br>
<br>
Just a thought. I will leave Unicode development to you and the next
generation! <br>
<br>
Val ;-) <br>
<br>
<br>
</blockquote>
<br>
</blockquote>
</body>
</html>