[sw-l] Are We Going in the Wrong Direction?

Sandy Fleming sandy at FLEIMIN.DEMON.CO.UK
Sat Dec 11 10:16:45 UTC 2004


Hi Dan!

> > But you have to distinguish the DOM from the XML (or SWML). Just because
> > something isn't stored as XML doesn't mean you can't have a DOM.
>
> I agree. It could be a simple string for data entry purposes, but the
> internal data structure could easily be a tree. Heck, internally, we could
> even have SWML, which begs the question: why a linear format at all? Why
> not simply compress SWML?

If SWML can be tamed as a practical file format then it would certainly be
the best, as long as it's kept clean and not filled up with unnecessary
features. But to compare even with files stored in character + coordinate
form we're talking about a compression ratio of over 90%. I know XML files
compress well, but can this be achieved? And I'm about to explain my "one
font character per symbol" idea which does away with having to store
coordinates in the file...!

> But I'm not entirely sure that the idea itself buys me anything more.
> Supposing we have these character representations, such as
> O74,143;$34,122;^74,123;s89,205. The characters have to have built-in some
> information about rotation, orientation, etc. The SSS has some
> intelligence in its organization that can easily have me select, e.g., all
> open-palm handshapes. Or all handshapes with a single extended finger.
> Could a character set do as much?

I wasn't thinking of storing rotation, orientation &c. I thought there were
moves to get anything necessary from the IMWA stored in unicode fonts? In
which case it's a matter of the program inserting the character
corresponding to the user's chosen orientation, rotation &c? Even without
unicode we can have a character-like mapping to the necessary IMWA graphics
files, can't we? I am in fact beginning to wonder if a gif/png solution
might not be preferable, because the sizes of different symbols in the IMWA
vary an awful lot - eg, compare the contact symbol with the head or long
arrow. Even better would be SVG as Trevor suggested, so that the images
would be scalable. So instead of fonts we may have an IMWA (perhaps in SVG
form) subset and a mapping from 16-bit numbers to these to give us a file
storage system exactly comparable to the file storage of oral-language text,
on a character-to-symbol basis. If you opened such a SW file in an ordinary
oral-language text editor, what you might see is a lot of random characters,
each corresponding to a symbol in our SW system though showing up as
characters from whatever oral-language font is selected. I think we'd want
to avoid using numbers that correspond to useful characters that already
exist in unicode, so that more sophisticated applications could mix SW with
oral-language text.
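
To make the 16-bit mapping concrete, here is a minimal sketch in Python. It
assumes (my assumption, nothing decided on the list) that the symbol numbers
are placed in the Unicode Private Use Area, U+E000 to U+F8FF, which contains
no standard characters and so can't clash with oral-language text. That range
holds 6400 code points, so it would only cover a subset of the IMWA. The
function names are made up for illustration.

```python
# Hypothetical mapping between SW symbol indices and 16-bit code points.
# Assumption: symbols live in the Unicode Private Use Area (U+E000..U+F8FF),
# which is reserved for private agreements and holds no standard characters.
PUA_START = 0xE000
PUA_END = 0xF8FF  # inclusive; 6400 code points in total


def symbol_to_char(symbol_index: int) -> str:
    """Return the character that stands for the given symbol index."""
    if not 0 <= symbol_index <= PUA_END - PUA_START:
        raise ValueError(f"symbol index out of range: {symbol_index}")
    return chr(PUA_START + symbol_index)


def is_sw_char(ch: str) -> bool:
    """True if the character falls in the range reserved for SW symbols."""
    return PUA_START <= ord(ch) <= PUA_END


def char_to_symbol(ch: str) -> int:
    """Return the symbol index for a character written by symbol_to_char."""
    if not is_sw_char(ch):
        raise ValueError(f"not a SW symbol character: {ch!r}")
    return ord(ch) - PUA_START


# Mixed text: oral-language characters and SW symbols in one string.
mixed = "hello " + symbol_to_char(0) + symbol_to_char(42)
symbols = [char_to_symbol(c) for c in mixed if is_sw_char(c)]
```

A program would then look up the IMWA graphic (gif, png or SVG) for each
symbol index, while an ordinary text editor would just show whatever its
font has at those code points.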

Anyway, assuming this can be done and all the necessary symbols can be
stored as a 16-bit number at most, corresponding to the sort of storage you
get for unicode files, here's a suggestion for how to store the whole of a
SW text file as a string of characters _without_ having to store dimensional
information.

It involves making one simple (honest!) adaptation to the existing text
file storage systems.

As you'll know, all text files store some extra information for the
operating system that's invisible to the user. All we need to do is add one
more piece of information, which is the "file width". We could try to
persuade OS writers to do this for us as part of their system but it doesn't
matter as we can implement it in our own software and still keep it hidden
from the users.

We could, for visualisation purposes, imagine the "file width" as
corresponding to the column width in the file when it's displayed, but it's
not really. The column width on display can be selected by the user, whereas the
file width in the file is more fixed, and there is probably an ideal value,
though it could be varied by the software if necessary.

Now to save a file (write it to disk), the program goes across the column
through all the coordinates and writes a blank "character" to the file when
there's no symbol with that coordinate, and the symbol "character" when a
symbol does exist at that coordinate. Occasionally more than one symbol may
have the same coordinate position, so we could have an "overwrite" character
too to solve that.
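
The save step could be sketched like this in Python, along with the matching
load step. This is only one possible reading of the idea: the choice of a
space as the blank, a backspace as the "overwrite" character, and the names
save and load are all my assumptions for illustration. The file width is
passed in separately, as it would be stored outside the text itself.

```python
BLANK = " "        # assumed blank "character"
OVERWRITE = "\b"   # assumed marker: next symbol shares the previous cell


def save(symbols, file_width):
    """Turn a list of (x, y, char) symbols into one string, row by row."""
    cells = {}
    for x, y, ch in symbols:
        if not 0 <= x < file_width or y < 0:
            raise ValueError(f"coordinate outside the file width: {(x, y)}")
        # Each coordinate becomes a position in the linear string.
        cells.setdefault(y * file_width + x, []).append(ch)
    if not cells:
        return ""
    out = []
    for index in range(max(cells) + 1):
        stacked = cells.get(index)
        # Several symbols at one coordinate are joined by the overwrite mark.
        out.append(OVERWRITE.join(stacked) if stacked else BLANK)
    return "".join(out)


def load(text, file_width):
    """Recover the (x, y, char) list from a string written by save."""
    symbols = []
    index = -1
    stack_next = False
    for ch in text:
        if ch == OVERWRITE:
            stack_next = True  # the next symbol stays in the current cell
            continue
        if not stack_next:
            index += 1
        stack_next = False
        if ch != BLANK:
            symbols.append((index % file_width, index // file_width, ch))
    return symbols


sign = [(1, 0, "A"), (3, 0, "B"), (3, 0, "C"), (0, 1, "D")]
stored = save(sign, file_width=4)
restored = load(stored, file_width=4)
```

No coordinates appear in the stored string: a symbol's position is implied
by how far along the string it is, together with the file width.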

You can see one obvious problem already - if we want very fine positioning
of symbols, we'll need a lot of spaces, corresponding to each possible pixel
at the finest useful positioning. But there are two ways to solve that - one
is to determine how fine the positioning really needs to be to store SW, and
the other is to just compress each sequence of spaces, which is always very
easy and effective for this sort of thing.
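
As a rough sketch of the second option, runs of blanks can be replaced by a
marker plus a count. The marker character and the "count then semicolon"
layout here are assumptions of mine, and the sketch assumes the marker never
occurs as a symbol character.

```python
import re

RUN_MARKER = "\x1d"  # assumed marker; must never be used as a symbol


def compress_blanks(text):
    """Replace each run of four or more spaces with marker + count + ';'."""
    def encode(match):
        return f"{RUN_MARKER}{len(match.group())};"
    # Shorter runs are left alone because encoding them would save nothing.
    return re.sub(r" {4,}", encode, text)


def expand_blanks(text):
    """Undo compress_blanks, restoring the original runs of spaces."""
    def decode(match):
        return " " * int(match.group(1))
    return re.sub(re.escape(RUN_MARKER) + r"(\d+);", decode, text)


row = "A" + " " * 50 + "B" + "  " + "C"
packed = compress_blanks(row)
```

Here a row of 55 characters packs down to 9, and expand_blanks(packed)
gives back the original row exactly.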

Whether this is all worthwhile does hinge on how file sizes in this format
compare with SWML. If SWML files are going to be many times larger even
when compressed then this file format is probably preferable for any program
that is expected to work with very large files of SW text, eg novels. If
SWML can be stored at a comparable size then SWML is probably better though
I think we should question every "bell and whistle" we think of adding to
the SWML - we would want to keep it lean.

You might ask why I think file size is so important. You can think of
people, particularly schools, who may be having to use old computers for a
long time to come. You can think of people who can only get modem
connections for downloads on the Web, or who live in countries where the
telephone system isn't very good. You can think of us being good "netizens"
and not flooding the internet with unnecessarily massive text files. It's
important.

I think I'm beginning to agree with you that this wouldn't aid searching.
But I haven't given up on the idea that you could, say, implement a sign
processor with Word VBA more easily with this sort of file storage.

Please also bear in mind that these should be designed so that they _can_ be
converted to and from SWML.

> BTW, I like these little exchanges of ours... I hope they lead us both to
> sharper ideas. :-)

Yes, thinking about your objections yesterday made me think of a good way of
typing SW more quickly. Must write a proof of concept first, though it takes
a long time to design a full keyboard layout...!

Even if ideas don't pan out, it's important to have considered them
properly.

Sandy


