See comments below ...
On Jun 22, 2005, at 14:21, Steve Slevinski wrote:
Hi Stuart,
I'm still trying to understand this complex topic myself. I'm glad
that people all over the world are working on this. I do not have the
answers, only opinions.
Here are 4 topics that will need to be addressed if you want to use
Unicode with the IMWA.
1) A rendering engine can handle the rotations, but not the fills for
handshapes. The fills are irregular. Look at symbols 01-01-007-01
and 01-01-008-01. The base symbol (fill 1 and rotation 1) is the same
for each. The fills are different. Each handshape has 16 rotations:
8 for the right hand and 8 for the left hand. So for each handshape
symbol, you will need to define the 6 base fills. From these 6 bases,
you can rotate and flip the image to come up with the 96 images needed
for each handshape.
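
To make the count concrete, here is a minimal sketch in Python of the arithmetic described above: 6 base fills times 16 rotations gives 96 images per handshape. The names FILLS, ROTATIONS and handshape_variants are mine, purely for illustration; nothing here is taken from the IMWA key file.

```python
from itertools import product

FILLS = range(1, 7)        # the 6 base fills of a handshape
ROTATIONS = range(1, 17)   # the 16 rotations (8 right hand, 8 left hand)


def handshape_variants():
    """Return every (fill, rotation) pair for a single handshape."""
    return list(product(FILLS, ROTATIONS))


variants = handshape_variants()
print(len(variants))  # 6 fills * 16 rotations
```

Only the 6 base fills would need to be drawn by hand; the rest follow from rotating and flipping.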
Actually, we could have a "helper" symbol that the input process
inserts to indicate fill information, but it can work just as well to
leave it in the font itself.
2) Include all 25 thousand plus symbols of the IMWA in
Unicode. The IMWA is an alphabet used for sorting. Here are 4
handshapes sorted alphabetically.
If you only included the symbol's base in Unicode and
ignored the rotations, you would not be able to sort. These symbols
would all have the same 16-bit Unicode value. You would need to add
the rotation as an additional 4 bits. In essence, you would be using
20 bits to store unique handshapes, so why are you using Unicode?
Like I said before, we can use "helper" characters that can provide
this additional information for sorting and other purposes. As long
as the relationship is expressed in a straightforward manner and the
input process and the renderer know how to process this, it is not a
problem for Unicode and we retain the same information. Just a matter
of approaching things differently.
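
As a rough sketch of the two approaches side by side (Python, with everything named here being my own assumption): pack() is the 20-bit idea, a 16-bit base plus 4 bits of rotation, and with_helper() is the helper-character idea, a base character followed by one of 16 rotation helpers. HELPER_START is a hypothetical location in a private-use plane, not a proposed assignment.

```python
HELPER_START = 0xF0000  # hypothetical: first of 16 rotation helper characters


def pack(base: int, rotation: int) -> int:
    """Combine a 16-bit base value and a rotation (1-16) into 20 bits."""
    if not 0 <= base < (1 << 16):
        raise ValueError("base must fit in 16 bits")
    if not 1 <= rotation <= 16:
        raise ValueError("rotation must be between 1 and 16")
    return (base << 4) | (rotation - 1)


def unpack(value: int) -> tuple[int, int]:
    """Split a 20-bit value back into (base, rotation)."""
    return value >> 4, (value & 0xF) + 1


def with_helper(base: int, rotation: int) -> str:
    """Encode the same pair as a base character plus a helper character."""
    return chr(base) + chr(HELPER_START + rotation - 1)


# Illustrative (base, rotation) pairs; the base values are made up.
symbols = [(13431, 3), (13430, 16), (13431, 1)]
by_packed = sorted(symbols, key=lambda s: pack(*s))
by_helper = sorted(symbols, key=lambda s: with_helper(*s))
print(by_packed == by_helper)
```

Both keys sort the symbols into the same order, which is the sense in which we retain the same information.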
3) You need a bi-directional conversion from SSS ID to
Unicode number and back again. This is not as easy as it sounds and
will make working with the IMWA ridiculously difficult for the lay
programmer.
Well, I think Tomas has shown that it is possible. Using Tomas'
approach plus the idea of blank rows and possibly using a different
range for supplemental symbols if necessary, we can make things work
in Unicode without much problem. Once a mapping has been derived,
then it is simply a matter of developing libraries or other such
things for programmers so that most of those calculations are done for
them and they can just include the libraries in their code. Not a
problem for the lay programmer if we give them the tools to process it.
Let's consider symbol 01-06-002-01-02-01. In Unicode this
might be character 13431.
With SignWriter and SignMaker, we've learned that the special
commands are very important and very powerful. If we want to change
the fill of the symbol, it is very easy to do with the SSS ID number.
Add one to the fill position and make sure the symbol exists.
01-06-002-01-03-01
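
The fill change described above might look like this in Python. This is only a sketch: I am assuming from the example that the fifth field of the six-field ID is the fill, and known_ids stands in for whatever list of existing symbols a program has loaded. The function name next_fill is hypothetical.

```python
def next_fill(sss_id: str, known_ids: set[str]) -> str:
    """Add one to the fill field of an SSS ID and check the symbol exists."""
    fields = sss_id.split("-")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields in {sss_id!r}")
    # Assumption: field 5 (index 4) is the fill, as in the example above.
    width = len(fields[4])
    fields[4] = str(int(fields[4]) + 1).zfill(width)
    candidate = "-".join(fields)
    if candidate not in known_ids:
        raise KeyError(f"no such symbol: {candidate}")
    return candidate


known = {"01-06-002-01-02-01", "01-06-002-01-03-01"}
print(next_fill("01-06-002-01-02-01", known))
```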
Using the Unicode value of 13431, we have 2 options. Convert to SSS
ID and then back again, or work directly with the Unicode value. The
new value would be 13447, but that's because I know that all 16
rotations are being used. The IMWA is irregular with fills and
rotation for non-handshape symbols so determining the correct number
to add to the Unicode value is not straightforward. Since 16 bits is
insufficient for a simple (one-line) conversion between SSS ID and
unique number, the conversion would be a nightmare that would need to
be recreated in every programming language or explicitly defined in a
database or conversion
file(s). The database option would be best, but would require over 25
thousand entries and require all SignWriting applications to use a
database. A flat file conversion would be over 4 MB. A good option
might be 50,000 small files (2 files for each symbol), but that would
require 8 MB of disk space.
01-06-002-01-02-01.txt = 13431
13431.txt = 01-06-002-01-02-01
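
For what it is worth, the kind of library I have in mind could be as small as the sketch below (Python). It builds the lookup in both directions from a list of pairs, so a programmer never has to work out what number to add. The two pairs are just the illustrative values from this email, and build_tables is a name I made up; a real table would be loaded from whatever mapping is eventually agreed on.

```python
def build_tables(pairs):
    """Build SSS ID -> number and number -> SSS ID lookups from pairs."""
    to_code = {}
    to_id = {}
    for sss_id, code in pairs:
        if sss_id in to_code or code in to_id:
            raise ValueError(f"duplicate entry: {sss_id} / {code}")
        to_code[sss_id] = code
        to_id[code] = sss_id
    return to_code, to_id


# Illustrative values only, taken from the example in this email.
pairs = [
    ("01-06-002-01-02-01", 13431),
    ("01-06-002-01-03-01", 13447),
]
to_code, to_id = build_tables(pairs)

print(to_code["01-06-002-01-02-01"])
print(to_id[13447])
```

With the table loaded once, changing a fill means editing the SSS ID and looking the new ID up, with no arithmetic on the Unicode value at all.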
With summer school, I haven't the time at the moment to work through
your argument in detail. But I generally take the approach that there
are few problems that are too difficult to solve. I believe if we want
a solution, we will find one. If we don't, of course it becomes much
more difficult. ;)
4) We will always need the X,Y coordinates when using the
IMWA. A while back, I discussed this with Antonio Carlos. We don't
believe there is any other viable solution. Some signs require exact
symbol positioning. However, YMMV.
I agree that X and Y coordinates are needed. I disagree that it has
to be encoded inside the Unicode character itself. The X,Y
coordinates can be separate elements in the linear encoding of a sign
written with Unicode characters. That should not be a problem. But it
is the responsibility of the renderer to resolve that issue and we
simply need to put enough information in the linear stream for the
renderer to do the job.
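
Here is one way that could look, as a sketch in Python. The "code:x:y" token format is entirely my invention for illustration, as are the names Placed, encode and decode; the point is only that the coordinates travel alongside the characters in the stream rather than inside them.

```python
from typing import NamedTuple


class Placed(NamedTuple):
    """One symbol in a sign: its character number plus its position."""
    code: int
    x: int
    y: int


def encode(sign: list[Placed]) -> str:
    """Write a sign as a linear stream of hypothetical 'code:x:y' tokens."""
    return " ".join(f"{p.code}:{p.x}:{p.y}" for p in sign)


def decode(text: str) -> list[Placed]:
    """Read the stream back; a renderer would place each symbol at (x, y)."""
    sign = []
    for token in text.split():
        code, x, y = (int(part) for part in token.split(":"))
        sign.append(Placed(code, x, y))
    return sign


sign = [Placed(13431, 10, 20), Placed(13447, -5, 42)]
stream = encode(sign)
print(stream)
print(decode(stream) == sign)
```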
-----------------------------------------------------------------------------------
And a few last thoughts...
The term Unicode font is confusing and mixes 2 different ideas.
Didn't we have the discussion before about technical terms versus lay
terms? ;) I was using this as a lay term, not a technical term. :)
However, there are fonts that are based on Unicode and there are fonts
that are based on the traditional 255 character set. So it was with
that perspective that I was using the term "unicode font".
Unicode is nothing more than identifying a unique mental
character with a unique number. Unicode number 65 is the letter A,
but not the A on the screen, but the idea of the letter A.
Correct.
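
You can see this from any programming language that knows Unicode. A small Python example, using only the standard library (the describe function is just a name I chose): the number 65 maps to the abstract character named LATIN CAPITAL LETTER A, and nothing here says anything about how it looks on screen.

```python
import unicodedata


def describe(code: int) -> tuple[str, str]:
    """Return the character for a Unicode number and its official name."""
    char = chr(code)
    return char, unicodedata.name(char)


print(describe(65))
print(ord("A"))
```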
A font can identify a unique number with a unique physical
representation. A font file can take the number 65 and draw a
picture of the letter A.
Certainly.
Unicode can work with fonts, but you can use fonts without
Unicode. I believe that 45 bit SSS ID encoding is the best option if
you are using fonts. OpenType or SVG Fonts look like the best
choices. But these files would be huge!
We could certainly do our custom encoding of SW in a font that did not
use Unicode. However, we lose the benefit of Unicode which is that a
person can use 1 font in a document and be able to express whatever
language they want. With this vision, a given number means a specific
symbol in a specific writing system and that symbol only. To create a
custom font encoding only for SignWriting defeats the purpose for
which Unicode was created. Before, each language had its own
fonts and its own encoding. That made it difficult to know which font
to use for which program or document. If you didn't have the right
font or encoding, then it was gibberish. To use a custom approach
certainly makes life easier for us, but it doesn't make life easier
for those outside who might receive a SW document and who may not have
the font or encoding for it. If we do our job right with Unicode, it
could be possible that every computer would then be fitted with a font
that also includes SW just like today I have on my Mac fonts for just
about any major language in the world. During my DOS and Windows 3.1
or 95 days, I couldn't do that. But that is now a benefit of Unicode.
When I tackle SVG, I will replace all of the PNG files in
the IMWA with SVG files. So instead of 25,000 graphics, I will have
25,000 SVG files. I will use the current key file which is about
100k. It is an elegant solution and very accessible and has none of
the short-comings and complications of a Unicode implementation.
I think you are selling Unicode short. I have nothing against an SVG
approach. After all, TMTOWTDI. However, I think you are
side-stepping the political benefits of Unicode which would help you
in the very discussions for funding that you discussed in a previous
email. If people knew that SignWriting could be handled like any other
language from a Unicode font, that says something about its
establishment as a mainstream writing system. SVG and other
approaches (while certainly worthwhile) keep SignWriting as a niche
writing system rather than a mainstream writing system in the minds of
the average person who uses a writing system on a computer.
----------------------------
I do not think that "draw" should be a taboo word within
SignWriting. My handwriting improved when I started to draw the
letters of the English alphabet. I started to look at the marks I was
making on the page and I would aesthetically compare what I was
drawing with how the letters should look. For me, drawing deals with
how things look.
Writing deals with what things mean. Good handwriting is drawing the
alphabet while writing your thoughts. You can not draw a sentence.
The characterization of a writing system is that you write it, not
draw it. I know Valerie has made that point over and over to us on
the list. And believe it or not, that as a technical term is important
to help people know that we are not just drawing pictographs to
communicate via writing. The image in the mind of the average person
is that we do not draw our letters. We write our letters. We are
actually writing our languages just like any other person on the
planet writes their language with the symbols that they use for their
writing systems. I prefer not to use the term "draw" because again it
makes a distinction between SignWriting and other writing systems and
makes SignWriting appear not to be a valid writing system. You can
argue that we draw the characters as we write it, but that distinction
is lost on the average person who thinks about writing as a linguistic
activity and drawing as a non-linguistic activity.
Perhaps it is a fine distinction philosophically, but it is a
significant one to me.
Our discussion helps me think through these issues, so thank you!
Thanks,
Stuart