[sw-l] challenge for programmers

Stuart Thiessen sw at PASSITONSERVICES.ORG
Wed Jun 22 20:38:09 UTC 2005

See comments below ...

On Jun 22, 2005, at 14:21, Steve Slevinski wrote:

>  Hi Stuart,
>  I'm still trying to understand this complex topic myself.  I'm glad  
> that people all over the world are working on this.  I do not have the  
> answers, only opinions.
>  Here are 4 topics that will need to be addressed if you want to use  
> Unicode with the IMWA.
>  1) A rendering engine can handle the rotations, but not the fills for  
> handshapes.  The fills are irregular.  Look at symbols 01-01-007-01  
> and 01-01-008-01.  The base symbol (fill 1 and rotation 1) is the same  
> for each.  The fills are different.  Each handshape has 16 rotations:
> 8 for the right hand and 8 for the left hand.  So for each handshape
> symbol, you will need to define the 6 base fills.  From these 6 bases,
> you can rotate and flip the image to come up with the 96 images
> (6 fills x 16 rotations) needed for each handshape.

Actually, we could have a "helper" symbol that the input process  
inserts to indicate fill information, but it can work just as well to  
leave it in the font itself.
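
To make the numbers in point 1 concrete, here is a minimal Python sketch that enumerates the variants of one handshape from its 6 base fills.  The 45-degree step between rotations, the mirroring for the left-hand half, and all of the names are my own illustrative assumptions, not something defined by the IMWA.

```python
from itertools import product

FILLS = range(1, 7)        # 6 base fills per handshape (from point 1)
ROTATIONS = range(1, 17)   # 16 rotations: 8 right hand, 8 left hand

def transform_for(rotation):
    """Return (mirrored, angle_in_degrees) for a rotation number 1..16.

    Assumption: rotations 1-8 are the right hand in 45-degree steps and
    rotations 9-16 are the same steps applied to a mirrored image.
    """
    if rotation not in ROTATIONS:
        raise ValueError(f"rotation out of range: {rotation}")
    mirrored = rotation > 8
    angle = ((rotation - 1) % 8) * 45
    return mirrored, angle

def variants():
    """List every (fill, rotation, mirrored, angle) combination."""
    return [(fill, rot, *transform_for(rot))
            for fill, rot in product(FILLS, ROTATIONS)]

all_variants = variants()
print(len(all_variants))  # 6 fills x 16 rotations = 96
```

Whether the fill is carried by a helper symbol or left in the font, the renderer only needs the base fill image plus the (mirrored, angle) pair to produce each variant.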

>  2) Include all 25 thousand plus symbols of the IMWA in Unicode.  The  
> IMWA is an alphabet used for sorting.  Here are 4 handshapes sorted  
> alphabetically.
> <unknown.jpg>

>  If you only included the symbol's base in Unicode and ignored the  
> rotations, you would not be able to sort.  These symbols would all  
> have the same 16-bit Unicode value.  You would need to add the  
> rotation as an additional 4 bits.  In essence you would be using
> 20 bits to store unique handshapes, so why are you using Unicode?

Like I said before, we can use "helper" characters that can provide  
this additional information for sorting and other purposes.  As long as  
the relationship is expressed in a straightforward manner and the input  
process and the renderer know how to process this, it is not a problem  
for Unicode and we retain the same information.  Just a matter of  
approaching things differently.
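
As a rough illustration of the "helper" character idea, the Python sketch below stores each symbol as a base character followed by a fill helper and a rotation helper.  The code points are hypothetical values picked from the Private Use Area purely for the example; nothing here is an assigned or proposed SignWriting range.

```python
# Hypothetical code points, all in the BMP Private Use Area (U+E000-U+F8FF).
BASE_START = 0xE000      # one code point per base symbol
FILL_START = 0xF000      # fill helpers: fills 1..6
ROTATION_START = 0xF010  # rotation helpers: rotations 1..16

def encode(base_index, fill, rotation):
    """Encode one symbol as base character + fill helper + rotation helper."""
    if not (1 <= fill <= 6 and 1 <= rotation <= 16):
        raise ValueError("fill must be 1..6 and rotation 1..16")
    return (chr(BASE_START + base_index)
            + chr(FILL_START + fill - 1)
            + chr(ROTATION_START + rotation - 1))

def decode(text):
    """Recover (base_index, fill, rotation) from a three-character sequence."""
    base, fill, rotation = text
    return (ord(base) - BASE_START,
            ord(fill) - FILL_START + 1,
            ord(rotation) - ROTATION_START + 1)

# Four made-up symbols, deliberately out of order.
symbols = [(7, 1, 2), (6, 2, 1), (6, 1, 16), (6, 1, 1)]
encoded = sorted(encode(*s) for s in symbols)  # plain code point order
print([decode(e) for e in encoded])
```

Because the helpers follow the base character, an ordinary string sort already orders by base, then fill, then rotation, so symbols that share a base no longer collide.  A real deployment would presumably go through a proper collation step rather than raw code point order, but the information needed for sorting is all present in the stream.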

>  3) You need a bi-directional conversion from SSS ID to Unicode number  
> and back again.  This is not as easy as it sounds and will make  
> working with the IMWA ridiculously difficult for the lay programmer.

Well, I think Tomas has shown that it is possible.  Using Tomas'  
approach plus the idea of blank rows and possibly using a different  
range for supplemental symbols if necessary, we can make things work in  
Unicode without much problem.  Once a mapping has been derived, then it  
is simply a matter of developing libraries or other such things for  
programmers so that most of those calculations are done for them and  
they can just include the libraries in their code.  Not a problem for  
the lay programmer if we give them the tools to process it.
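
Here is a minimal sketch of what such a library could look like in Python, using a plain lookup table built once from the list of symbols that exist.  This is not Tomas' scheme (which is not described in this message); the sample IDs, the starting number, and the function names are all assumptions for illustration.

```python
def build_tables(sss_ids, first_code_point):
    """Assign consecutive code points to SSS IDs in sorted order.

    Returns two dictionaries: SSS ID -> code point and code point -> SSS ID.
    Sorting the strings works because every field is zero-padded.
    """
    to_code = {}
    to_sss = {}
    for offset, sss_id in enumerate(sorted(sss_ids)):
        code = first_code_point + offset
        to_code[sss_id] = code
        to_sss[code] = sss_id
    return to_code, to_sss

# Made-up sample; the real list would hold every symbol that exists in the IMWA.
SAMPLE_IDS = [
    "01-06-002-01-02-01",
    "01-06-002-01-02-02",
    "01-06-002-01-03-01",
]
TO_CODE, TO_SSS = build_tables(SAMPLE_IDS, first_code_point=13431)

def sss_to_code(sss_id):
    """Look up the number for an SSS ID (raises KeyError if unknown)."""
    return TO_CODE[sss_id]

def code_to_sss(code):
    """Look up the SSS ID for a number (raises KeyError if unknown)."""
    return TO_SSS[code]

print(sss_to_code("01-06-002-01-02-01"))
print(code_to_sss(13433))
```

The application programmer only ever calls the two lookup functions.  Irregular fills and rotations are absorbed by the table, since only symbols that actually exist receive a number.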

>  Let's consider symbol 01-06-002-01-02-01.  In Unicode this might be  
> character 13431.
> <unknown.jpg>
>  With SignWriter and SignMaker, we've learned that the special  
> commands are very important and very powerful.  If we want to change  
> the fill of the symbol, it is very easy to do with the SSS ID number.   
> Add one to the fill position and make sure the symbol exists.   
> 01-06-002-01-03-01
> <unknown.jpg>
>  Using the Unicode value of 13431, we have 2 options.  Convert to SSS  
> ID and then back again, or work directly with the Unicode value.  The  
> new value would be 13447, but that's because I know that all 16  
> rotations are being used.  The IMWA is irregular with fills and  
> rotation for non-handshape symbols so determining the correct number  
> to add to the Unicode value is not straightforward.  Since 16 bits is
> insufficient for a simple (one-line) conversion between SSS ID and  
> unique number, the conversion would be a nightmare that would need to  
> be recreated in every programming language or explicitly defined in a
> database or conversion file(s).  The database option would be best,  
> but would require over 25 thousand entries and require all SignWriting  
> applications to use a database.  A flat file conversion would be over  
> 4 MB.  A good option might be 50,000 small files (2 files for each  
> symbol), but that would require 8 MB of disk space.
>  01-06-002-01-02-01.txt = 13431
>  13431.txt = 01-06-002-01-02-01

With summer school, I haven't the time at the moment to work through  
your argument in detail.  But I generally take the approach that there  
are few problems that are too difficult to solve. I believe if we want  
a solution, we will find one.  If we don't, of course it becomes much  
more difficult. ;)
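
For reference, the fill change from Steve's example can be written both ways in a few lines of Python.  This is only a sketch: it assumes the fifth field of the SSS ID is the fill (as in his example), and the numeric shortcut assumes all 16 rotations exist for every fill, which he notes is true for handshapes but not for other symbols.

```python
ROTATIONS_PER_FILL = 16  # holds for handshapes; other symbols are irregular

def next_fill_sss(sss_id, existing_ids):
    """Add one to the fill field (fifth field) and check the symbol exists."""
    fields = sss_id.split("-")
    fields[4] = f"{int(fields[4]) + 1:02d}"
    candidate = "-".join(fields)
    if candidate not in existing_ids:
        raise KeyError(f"no such symbol: {candidate}")
    return candidate

def next_fill_code(code):
    """Same change on the number, valid only when all 16 rotations are used."""
    return code + ROTATIONS_PER_FILL

# Made-up set of existing symbols for the example.
existing = {"01-06-002-01-02-01", "01-06-002-01-03-01"}
print(next_fill_sss("01-06-002-01-02-01", existing))  # 01-06-002-01-03-01
print(next_fill_code(13431))                           # 13447
```

The SSS ID version checks that the result exists, while the numeric version silently relies on the layout being regular.  That difference is the crux of Steve's objection, and it is the part a shared library or mapping table would have to handle.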

>  4) We will always need the X,Y coordinates when using the IMWA.  A  
> while back, I discussed this with Antonio Carlos.  We don't believe  
> there is any other viable solution.  Some signs require exact symbol  
> positioning.  However, YMMV.

I agree that X and Y coordinates are needed.  I disagree that they have to
be encoded inside the Unicode character itself.  The X,Y coordinates
can be separate elements in the linear encoding of a sign written with  
Unicode characters. That should not be a problem. But it is the  
responsibility of the renderer to resolve that issue and we simply need  
to put enough information in the linear stream for the renderer to do  
the job.
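
A small Python sketch of what "separate elements in the linear stream" could mean in practice.  The comma-and-space text format, the class name, and the sample values are stand-ins I made up for illustration; an actual encoding would have to define how the coordinates are represented as characters.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PlacedSymbol:
    code_point: int  # number identifying the symbol
    x: int           # horizontal position within the sign
    y: int           # vertical position within the sign

def encode_sign(symbols):
    """Serialize placed symbols as 'code,x,y' items separated by spaces."""
    return " ".join(f"{s.code_point},{s.x},{s.y}" for s in symbols)

def decode_sign(stream):
    """Parse the linear stream back into placed symbols for a renderer."""
    result = []
    for item in stream.split():
        code_point, x, y = (int(part) for part in item.split(","))
        result.append(PlacedSymbol(code_point, x, y))
    return result

sign = [PlacedSymbol(13431, 10, 25), PlacedSymbol(13447, 40, 12)]
stream = encode_sign(sign)
print(stream)
print(decode_sign(stream) == sign)
```

The symbol number stays independent of its position, and the renderer receives everything it needs to place each symbol exactly.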

> -----------------------------------------------------------------------
>  And a few last thoughts...
>  The term Unicode font is confusing and mixes 2 different ideas. 

Didn't we have the discussion before about technical terms versus lay  
terms? ;)  I was using this as a lay term, not a technical term.  :)   
However, there are fonts that are based on Unicode and there are fonts
that are based on the traditional 8-bit, 256-character set.  So it was
with that perspective that I was using the term "Unicode font".

>  Unicode is nothing more than identifying a unique mental character  
> with a unique number.  Unicode number 65 is the letter A, but not the  
> A on the screen, but the idea of the letter A. 


>  A font can identify a unique number with a unique physical  
> representation.  A font file can take the number 65 and draw a
> picture of the letter A. 


>  Unicode can work with fonts, but you can use fonts without Unicode.   
> I believe that 45-bit SSS ID encoding is the best option if you are
> using fonts.  OpenType or SVG Fonts look like the best choices.  But  
> these files would be huge!

We could certainly do our custom encoding of SW in a font that did not  
use Unicode.  However, we lose the benefit of Unicode which is that a  
person can use 1 font in a document and be able to express whatever  
language they want.  With this vision, a given number means a specific  
symbol in a specific writing system and that symbol only.  To create a
custom font encoding only for SignWriting defeats the purpose for
which Unicode was created.  Before, each language had its own
fonts and its own encoding.  That made it difficult to know which font  
to use for which program or document. If you didn't have the right font  
or encoding, then it was gibberish.  To use a custom approach certainly  
makes life easier for us, but it doesn't make life easier for those  
outside who might receive a SW document and who may not have the font  
or encoding for it.  If we do our job right with Unicode, it could be  
possible that every computer would then be fitted with a font that also  
includes SW just like today I have on my Mac fonts for just about any  
major language in the world. During my DOS and Windows 3.1 or 95 days,  
I couldn't do that. But that is now a benefit of Unicode.

>  When I tackle SVG, I will replace all of the PNG files in the IMWA  
> with SVG files.  So instead of 25,000 graphics, I will have 25,000 SVG  
> files.  I will use the current key file which is about 100k.  It is an  
> elegant solution and very accessible and has none of the shortcomings
> and complications of a Unicode implementation.

I think you are selling Unicode short.  I have nothing against an SVG  
approach.  After all, TMTOWTDI.  However, I think you are side-stepping  
the political benefits of Unicode which would help you in the very  
discussions for funding that you discussed in a previous email. If  
people knew that SignWriting could be handled like any other language  
from a Unicode font, that says something about its establishment as a  
mainstream writing system.  SVG and other approaches (while certainly  
worthwhile) keep SignWriting as a niche writing system rather than a  
mainstream writing system in the minds of the average person who uses a  
writing system on a computer.

>  ----------------------------
>  I do not think that "draw" should be a taboo word within  
> SignWriting.  My handwriting improved when I started to draw the  
> letters of the English alphabet.  I started to look at the marks I was  
> making on the page and I would aesthetically compare what I was  
> drawing with how the letters should look.  For me, drawing deals with  
> how things look.  Writing deals with what things mean.  Good  
> handwriting is drawing the alphabet while writing your thoughts.  You  
> can not draw a sentence.

The characterization of a writing system is that you write it, not draw  
it.  I know Valerie has made that point over and over to us on the  
list. And believe it or not, "write" as a technical term is important
because it helps people know that we are not just drawing pictographs to
communicate via writing. The image in the mind of the average person is
that we do not draw our letters. We write our letters. We are actually  
writing our languages just like any other person on the planet writes  
their language with the symbols that they use for their writing  
systems. I prefer not to use the term "draw" because again it makes a  
distinction between SignWriting and other writing systems and makes  
SignWriting appear not to be a valid writing system. You can argue that  
we draw the characters as we write them, but that distinction is lost on
the average person who thinks about writing as a linguistic activity  
and drawing as a non-linguistic activity.

Perhaps it is a fine distinction philosophically, but it is a  
significant one to me.

Our discussion helps me think through these issues, so thank you!

