Explaining the sign data format of FSW

Steve Slevinski slevin at SIGNPUDDLE.NET
Fri Jun 29 14:08:05 UTC 2012


Hi Honza,

FSW, which stands for Formal SignWriting, is a lightweight markup used 
as the regular search form.  It is documented in section 9 of the 
Modern SignWriting specification. PDF link: 
https://github.com/Slevinski/msw/raw/master/MSW.pdf

FSW is my preferred storage form because it is possible to quickly 
search for exact or approximate matches.  It contains the same 
information as the XML.  The string is predictable and easy to process 
with regular expressions or even basic string functions.
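To illustrate the point about basic string functions, here is a minimal 
Python sketch (my own illustration, not SignPuddle code) over the two FSW 
strings that appear later in this message.  An exact search is a plain 
substring test; a looser search ignores fill and rotation by matching only 
the base symbol.  It assumes a symbol key is "S", three hex digits for the 
base symbol, one digit for the fill and one hex digit for the rotation, as 
in S10000 and S10e00.  The function names are hypothetical, and this is 
only one simple kind of approximate match.

```python
import re

# The two FSW strings from this message; a real search would run over a whole puddle.
SIGNS = [
    "M508x515S10000493x485",
    "AS10000S10e00S11e00M547x516S10000454x486S10e00489x486S11e00524x485",
]


def find_exact(signs, key):
    """Return the signs that contain this exact symbol key (plain substring test)."""
    return [s for s in signs if key in s]


def find_any_fill_rotation(signs, base):
    """Return the signs that use this base symbol with any fill or rotation.

    Assumes a key is 'S' + 3 hex digits (base) + fill digit 0-5 + rotation hex digit.
    """
    pattern = re.compile("S" + re.escape(base) + "[0-5][0-9a-f]")
    return [s for s in signs if pattern.search(s)]


print(len(find_exact(SIGNS, "S10000")))           # 2
print(len(find_any_fill_rotation(SIGNS, "10e")))  # 1
```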

The first difference is that FSW uses symbol keys rather than symbol IDs.  
You can convert between the two with the ISWA 2010 database or with 
either of two text files.

The verbose text file is just under 1 MB.  It explicitly maps every 
symbol ID to its symbol key:
http://signpuddle.net/iswa/data/iswa_id_key.txt

01-01-001-01-01-01,10000
01-01-001-01-01-02,10001
01-01-001-01-01-03,10002
01-01-001-01-01-04,10003
01-01-001-01-01-05,10004
01-01-001-01-01-06,10005
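A minimal Python sketch for loading this file into lookup tables, assuming 
every line is "ID,key" with no header (as in the excerpt) and that the 
stored key lacks the "S" prefix used in FSW.  It reads a few sample lines 
rather than downloading the file; load_id_key is a hypothetical name.

```python
# A few lines copied from iswa_id_key.txt as quoted above.
SAMPLE = """\
01-01-001-01-01-01,10000
01-01-001-01-01-02,10001
01-01-001-01-01-06,10005
"""


def load_id_key(lines):
    """Build ID->key and key->ID dicts from 'ID,key' lines.

    Assumes no header line; the file stores keys without the leading 'S',
    so it is added here to match the keys used in FSW (e.g. S10000).
    """
    id_key, key_id = {}, {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sym_id, key = line.split(",")
        key = "S" + key
        id_key[sym_id] = key
        key_id[key] = sym_id
    return id_key, key_id


id_key, key_id = load_id_key(SAMPLE.splitlines())
print(id_key["01-01-001-01-01-06"])  # S10005
print(key_id["S10000"])              # 01-01-001-01-01-01
```

For the real file you could pass an open file object instead, e.g. 
`with open("iswa_id_key.txt") as f: id_key, key_id = load_id_key(f)`, 
assuming a local copy saved under that name.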


The minimal text file is 11 KB.  It maps only the symbol prefix (the 
BaseSymbol part), not the entire ID or key.  Converting the fill and 
rotation is trivial and is handled in code.
http://signpuddle.net/iswa/data/iswa_sym_base.txt

01-01-001-01,100
01-01-002-01,101
01-01-003-01,102
01-01-004-01,103
01-01-005-01,104
01-01-006-01,105
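A Python sketch of the "trivial" part.  It assumes an ID has six fields 
(category, group, base symbol, variation, fill, rotation), that this file 
maps the first four to the three hex digits of the key, and that fill 
01-06 and rotation 01-16 each become one hex digit by subtracting 1.  That 
agrees with the verbose rows above (01-01-001-01-01-06 is 10005) and with 
the longer example below, but it is my reading, not the official 
algorithm.  The function names are hypothetical.

```python
# The first three rows of iswa_sym_base.txt as quoted above.
BASE_SAMPLE = """\
01-01-001-01,100
01-01-002-01,101
01-01-003-01,102
"""


def load_bases(lines):
    """Map each ID prefix (first four fields) to the 3 hex digits of the key."""
    prefix_base = {}
    for line in lines:
        line = line.strip()
        if line:
            prefix, base = line.split(",")
            prefix_base[prefix] = base
    return prefix_base


def convert_id_to_key(sym_id, prefix_base):
    """Turn e.g. '01-01-001-01-01-06' into 'S10005'.

    Assumption: the last two fields are fill (01-06) and rotation (01-16),
    each stored in the key as one hex digit after subtracting 1.
    """
    fields = sym_id.split("-")
    prefix = "-".join(fields[:4])
    fill, rotation = int(fields[4]), int(fields[5])
    return "S%s%x%x" % (prefix_base[prefix], fill - 1, rotation - 1)


def convert_key_to_id(key, base_prefix):
    """Inverse of convert_id_to_key; base_prefix maps e.g. '100' to '01-01-001-01'."""
    fill, rotation = int(key[4], 16), int(key[5], 16)
    return "%s-%02d-%02d" % (base_prefix[key[1:4]], fill + 1, rotation + 1)


prefix_base = load_bases(BASE_SAMPLE.splitlines())
base_prefix = {b: p for p, b in prefix_base.items()}
print(convert_id_to_key("01-01-001-01-01-06", prefix_base))  # S10005
print(convert_key_to_id("S1012f", base_prefix))              # 01-01-002-01-03-16
```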


Let's consider the most basic example.



*XML with symbol ID*
<signbox max_x="8" max_y="15">
   <sym left="-7" top="-15">01-01-001-01-01-01</sym>
</signbox>

*XML with symbol key*
<signbox max_x="8" max_y="15">
   <sym left="-7" top="-15">S10000</sym>
</signbox>

*FSW*
M508x515S10000493x485

The first thing to notice is that each FSW coordinate number is the 
XML value offset by 500.

The XML attributes max_x="8" max_y="15" correspond to the FSW segment 
"508x515".

Each symbol has its own coordinate placement, so we can break up the 
FSW string as follows:
M 508x515 S10000 493x485
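The offset rule is enough to convert the XML form into FSW.  A minimal 
Python sketch using xml.etree.ElementTree, assuming the XML already uses 
symbol keys (convert IDs first) and writing the lane letter M as in this 
example; xml_to_fsw is a hypothetical name.

```python
import xml.etree.ElementTree as ET

OFFSET = 500  # FSW coordinate = XML coordinate + 500


def xml_to_fsw(xml_text, lane="M"):
    """Convert a <signbox> element that uses symbol keys into an FSW signbox string."""
    box = ET.fromstring(xml_text)
    max_x = int(box.get("max_x")) + OFFSET
    max_y = int(box.get("max_y")) + OFFSET
    parts = ["%s%dx%d" % (lane, max_x, max_y)]
    for sym in box.findall("sym"):
        x = int(sym.get("left")) + OFFSET
        y = int(sym.get("top")) + OFFSET
        parts.append("%s%dx%d" % (sym.text.strip(), x, y))
    return "".join(parts)


XML_EXAMPLE = """<signbox max_x="8" max_y="15">
   <sym left="-7" top="-15">S10000</sym>
</signbox>"""
print(xml_to_fsw(XML_EXAMPLE))  # M508x515S10000493x485
```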

The M stands for the middle lane.  The other options are L for left, R 
for right, and B for signboxes used in horizontal writing.
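Because each piece has a fixed shape, a regular expression can split a 
signbox into its lane, maximum coordinate and symbols.  A minimal Python 
sketch, assuming three-digit coordinates and five hex characters after the 
S of each key, as in every example in this message; parse_signbox is a 
hypothetical name.

```python
import re

# Assumed shapes: lane letter, 3-digit max pair, then repeated
# key (S + 5 hex characters) followed by a 3-digit coordinate pair.
BOX_RE = re.compile(r"^([BLMR])(\d{3})x(\d{3})((?:S[0-9a-f]{5}\d{3}x\d{3})*)$")
SYM_RE = re.compile(r"(S[0-9a-f]{5})(\d{3})x(\d{3})")


def parse_signbox(fsw):
    """Return (lane, (max_x, max_y), [(key, x, y), ...]) with FSW numbers as-is."""
    m = BOX_RE.match(fsw)
    if not m:
        raise ValueError("not a signbox: %r" % fsw)
    lane, max_x, max_y, body = m.groups()
    symbols = [(key, int(x), int(y)) for key, x, y in SYM_RE.findall(body)]
    return lane, (int(max_x), int(max_y)), symbols


print(parse_signbox("M508x515S10000493x485"))
# ('M', (508, 515), [('S10000', 493, 485)])
```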

For signs that have a signspelling sequence, the signbox information 
described above is preceded by an A section: a list of symbol keys 
without coordinate information.
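Since the A section is only a prefix of keys, one regular expression can 
separate it from the signbox.  A minimal Python sketch, again assuming 
five hex characters per key and three-digit coordinates; split_fsw is a 
hypothetical name.

```python
import re

# Assumed shapes: a key is S + 5 hex characters, coordinates are 3 digits.
FSW_RE = re.compile(
    r"^(?:A((?:S[0-9a-f]{5})+))?"                          # optional A section: keys only
    r"([BLMR]\d{3}x\d{3}(?:S[0-9a-f]{5}\d{3}x\d{3})*)$"   # signbox
)


def split_fsw(fsw):
    """Return (signspelling keys, signbox string); keys is [] when there is no A section."""
    m = FSW_RE.match(fsw)
    if not m:
        raise ValueError("not an FSW sign: %r" % fsw)
    spelling, box = m.groups()
    keys = re.findall(r"S[0-9a-f]{5}", spelling) if spelling else []
    return keys, box


keys, box = split_fsw("AS10000S10e00S11e00M547x516S10000454x486S10e00489x486S11e00524x485")
print(keys)  # ['S10000', 'S10e00', 'S11e00']
print(box)   # M547x516S10000454x486S10e00489x486S11e00524x485
```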

Here is a longer example:
AS10000S10e00S11e00M547x516S10000454x486S10e00489x486S11e00524x485

It has 2 sections.  The first section is the signspelling sequence: A 
S10000 S10e00 S11e00

The second section is the signbox construction: M 547x516 S10000 454x486 
S10e00 489x486 S11e00 524x485

Which can be understood as:
M 547x516 = middle lane with max coordinate of (47,16)
S10000 454x486 = symbol ID 01-01-001-01-01-01 at coordinate (-46,-14)
S10e00 489x486 = symbol ID 01-02-001-01-01-01 at coordinate (-11,-14)
S11e00 524x485 = symbol ID 01-03-001-01-01-01 at coordinate (24,-15)
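The reading above can be reproduced by subtracting the 500 offset.  A 
minimal Python sketch; the ID lookup only knows the three keys listed 
above (a full table would come from the ISWA files), and decode_signbox 
is a hypothetical name.

```python
import re

OFFSET = 500  # FSW numbers are the plain coordinates plus 500

# Only the three keys spelled out in this message; a full table would
# come from the ISWA files described earlier.
KNOWN_IDS = {
    "S10000": "01-01-001-01-01-01",
    "S10e00": "01-02-001-01-01-01",
    "S11e00": "01-03-001-01-01-01",
}


def decode_signbox(box):
    """Return (lane, (max_x, max_y), [(key, ID or None, x, y), ...]) with the offset removed."""
    m = re.match(r"([BLMR])(\d{3})x(\d{3})", box)
    if not m:
        raise ValueError("not a signbox: %r" % box)
    lane = m.group(1)
    max_xy = (int(m.group(2)) - OFFSET, int(m.group(3)) - OFFSET)
    symbols = [
        (key, KNOWN_IDS.get(key), int(x) - OFFSET, int(y) - OFFSET)
        for key, x, y in re.findall(r"(S[0-9a-f]{5})(\d{3})x(\d{3})", box[m.end():])
    ]
    return lane, max_xy, symbols


lane, max_xy, symbols = decode_signbox("M547x516S10000454x486S10e00489x486S11e00524x485")
print(lane, max_xy)  # M (47, 16)
for key, sym_id, x, y in symbols:
    print(key, sym_id, (x, y))
```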

If you've made it this far and you absolutely don't want to use FSW for 
the custom export, I should be able to add an XML sign data option for 
the custom export only.  That option will not be available for full 
puddle exports.

Hope that helps,
-Steve


More information about the Sw-l mailing list