ISWA 2010 data change for proposed Unicode string
Steve Slevinski
slevin at SIGNPUDDLE.NET
Fri Apr 29 17:29:38 UTC 2011
Hi Jonathan and list,
I am making a small change that will affect only programmers and back-end
data.
We are almost off the bleeding edge. The Unicode proposal requires a
change to the SignPuddle data. After this change, I do not plan any
additional changes. A future and final conversion may be needed for a
Unicode compromise agreement. No changes are planned for the ISWA 2010
itself.
I will be updating my documents, code libraries, and test data over the
next few days.
The primary change moves the fill and rotation codepoints 14 positions
ahead, into different code chart rows. This leaves 14 spaces for new root
symbols to be added in future proposals. Fill codepoints will start at
U+1DA9A and rotation codepoints will start at U+1DAA0. If a Unicode string
for a symbol is 3 codepoints long, the 1st character remains the same, but
the 2nd and 3rd will change: each will advance by 14 codepoints.
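The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the SignPuddle conversion code. It assumes the input is exactly [root, fill, rotation] in the old layout. U+1D800 is a hypothetical stand-in for a root symbol, and the old positions U+1DA8C and U+1DA92 are only inferred by subtracting 14 from the new starting points.

```python
# Sketch: convert a 3-codepoint symbol string from the old layout to the new one.
# Assumes the string is exactly [root, fill, rotation] in the old layout.
SHIFT = 14  # fill and rotation codepoints move 14 positions ahead

def shift_fill_and_rotation(symbol: str) -> str:
    """Keep the 1st codepoint; advance the 2nd and 3rd by SHIFT."""
    if len(symbol) != 3:
        raise ValueError("expected a 3-codepoint symbol string")
    root, fill, rotation = symbol
    return root + chr(ord(fill) + SHIFT) + chr(ord(rotation) + SHIFT)

# Hypothetical example: U+1D800 stands in for a root symbol; U+1DA8C and
# U+1DA92 are the old fill-1/rotation-1 positions implied by subtracting 14.
old = chr(0x1D800) + chr(0x1DA8C) + chr(0x1DA92)
new = shift_fill_and_rotation(old)
print([f"U+{ord(c):04X}" for c in new])  # ['U+1D800', 'U+1DA9A', 'U+1DAA0']
```

Only the 2nd and 3rd positions are touched, so the root codepoint passes through unchanged.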
Michael Everson made this change in the Unicode proposal. It's a good
change, so I'm including it in the SignPuddle online data conversion.
He is writing a new draft that affects the Unicode world but not the
SignWriting world. In the Unicode code charts, all hand root symbols will
appear using the first (empty) palm facing. The new draft isn't ready
yet, but Michael's previous draft is online.
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n4015.pdf
A secondary change in the proposal concerns the character count but
will not affect the proposed symbol strings that I use. Instead of
proposing 674 new codepoints, we will be proposing 672. This compromise
will leave holes in the code charts for fill-1 and rotation-1. Unicode
strings for symbols will assume fill-1 if a symbol string does not
include a fill character, and assume rotation-1 if a symbol string does
not include a rotation character. A proposed symbol string will be 1,
2, or 3 characters long. If approved by the Unicode committees, we will
achieve 99.7% of the goal (672 of 674 codepoints) and take a huge step
forward in standardization.
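Reading a proposed 1-, 2-, or 3-character symbol string with the implied defaults might look like the Python sketch below. It is an illustration under stated assumptions, not part of the proposal. Fill-1 is taken to sit at U+1DA9A and rotation-1 at U+1DAA0, as given above. The counts of 6 fills and 16 rotations are an assumption based on ISWA 2010 symbol keys, and U+1D800 in the tests is a hypothetical root.

```python
# Sketch: read a proposed 1-, 2-, or 3-character symbol string, defaulting
# missing fill/rotation characters to fill-1 and rotation-1.
FILL_1 = 0x1DA9A      # fill-1 position (left as a hole in the proposal)
ROTATION_1 = 0x1DAA0  # rotation-1 position (left as a hole in the proposal)
NUM_FILLS = 6         # assumption: ISWA 2010 fills 1-6
NUM_ROTATIONS = 16    # assumption: ISWA 2010 rotations 1-16

def is_fill(cp: int) -> bool:
    return FILL_1 <= cp < FILL_1 + NUM_FILLS

def is_rotation(cp: int) -> bool:
    return ROTATION_1 <= cp < ROTATION_1 + NUM_ROTATIONS

def parse_symbol(symbol: str) -> tuple[int, int, int]:
    """Return (root codepoint, fill number, rotation number)."""
    cps = [ord(c) for c in symbol]
    if not 1 <= len(cps) <= 3 or is_fill(cps[0]) or is_rotation(cps[0]):
        raise ValueError("not a symbol string")
    root, rest = cps[0], cps[1:]
    fill = rotation = 1  # implied when the character is absent
    if rest and is_fill(rest[0]):
        fill = rest.pop(0) - FILL_1 + 1
    if rest and is_rotation(rest[0]):
        rotation = rest.pop(0) - ROTATION_1 + 1
    if rest:
        raise ValueError("unexpected trailing character")
    return root, fill, rotation
```

The two `if rest and ...` branches are exactly the special cases that disappear when fill-1 and rotation-1 are always written out.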
I will not be removing fill-1 and rotation-1 from the test data. I
consider the removal of fill-1 and rotation-1 to be a form of Unicode
normalization. A simple process can search for and delete these 2
characters wherever they exist. Undoing it is more complicated.
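The asymmetry between the two directions shows up clearly in code. In the Python sketch below, removal is a plain delete. Restoring the characters requires walking the text symbol by symbol to find where each default belongs. This is only an illustration. It reuses U+1DA9A and U+1DAA0 from above, assumes 6 fills and 16 rotations (per ISWA 2010), and assumes the text contains only symbol strings, so that every other codepoint begins a new symbol. The root codepoints in the test are hypothetical.

```python
# Sketch: "normalization" deletes fill-1 and rotation-1; the undo step must
# rebuild each symbol's structure to know where to reinsert them.
FILL_1, ROTATION_1 = 0x1DA9A, 0x1DAA0
NUM_FILLS, NUM_ROTATIONS = 6, 16  # assumption: ISWA 2010 fill/rotation counts

def is_fill(cp: int) -> bool:
    return FILL_1 <= cp < FILL_1 + NUM_FILLS

def is_rotation(cp: int) -> bool:
    return ROTATION_1 <= cp < ROTATION_1 + NUM_ROTATIONS

def normalize(text: str) -> str:
    """The easy direction: delete every fill-1 and rotation-1."""
    return text.replace(chr(FILL_1), "").replace(chr(ROTATION_1), "")

def denormalize(text: str) -> str:
    """The harder direction: give every root an explicit fill and rotation.

    Assumes text contains only symbol strings, so every codepoint that is
    not a fill or rotation is treated as the root of a new symbol.
    """
    cps = [ord(c) for c in text]
    out: list[int] = []
    i = 0
    while i < len(cps):
        root = cps[i]
        if is_fill(root) or is_rotation(root):
            raise ValueError(f"fill/rotation without a root at index {i}")
        out.append(root)
        i += 1
        if i < len(cps) and is_fill(cps[i]):
            out.append(cps[i])
            i += 1
        else:
            out.append(FILL_1)
        if i < len(cps) and is_rotation(cps[i]):
            out.append(cps[i])
            i += 1
        else:
            out.append(ROTATION_1)
    return "".join(map(chr, out))
```

A round trip such as `denormalize(normalize(s)) == s` holds for well-formed 3-character symbol strings under the assumption above.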
The removal of the fill-1 character breaks sorting and complicates
searching. The easy way to fix sorting is to use the fill-1 character
rather than an empty slot. This solution works for any environment,
such as mobile, desktop, web browser, and server.
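A small Python experiment illustrates why dropping fill-1 breaks sorting. Python compares strings by code point, which stands in here for a plain binary sort. When fill-1 is written out, sorting the strings orders variants by fill and then by rotation. When fill-1 is omitted, a bare root followed by a rotation character sorts after every fill-2 variant. The root U+1D800 is hypothetical, and the fill/rotation positions are the starting points given earlier.

```python
# Sketch: plain code-point sorting orders (fill, rotation) correctly only
# when fill-1 is written out explicitly.
FILL_1, ROTATION_1 = 0x1DA9A, 0x1DAA0
ROOT = 0x1D800  # hypothetical root symbol codepoint

def symbol(fill: int, rotation: int, explicit_defaults: bool) -> str:
    """Build one symbol string; optionally omit fill-1 and rotation-1."""
    cps = [ROOT]
    if fill != 1 or explicit_defaults:
        cps.append(FILL_1 + fill - 1)
    if rotation != 1 or explicit_defaults:
        cps.append(ROTATION_1 + rotation - 1)
    return "".join(map(chr, cps))

# Variants listed in the intended order: by fill, then by rotation.
VARIANTS = [(1, 1), (1, 5), (2, 1), (2, 5)]
with_defaults = [symbol(f, r, True) for f, r in VARIANTS]
without_defaults = [symbol(f, r, False) for f, r in VARIANTS]

print(sorted(with_defaults) == with_defaults)        # True
print(sorted(without_defaults) == without_defaults)  # False: (1, 5) sorts last
```

Keeping the fill-1 character means no custom collation is needed, which is why the fix works in any environment that can sort strings.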
If the first proposal is successful, I plan to champion a second
proposal to add fill-1 and rotation-1 as control characters that
complete the set. These characters are useful for programmers: fill-1
and rotation-1 make it easier to write reusable, generic code, because
they eliminate the need to repeatedly test for and handle exceptions.
Regards,
-Steve
More information about the Sw-l mailing list