Signbox size and coordinate strings
Jonathan
duncanjonathan at YAHOO.CA
Mon Oct 10 15:10:41 UTC 2011
On 08/10/2011 1:45 PM, Steve Slevinski wrote:
> Hi Jonathan,
>
> Thanks for the comments. FYI, I will not be changing the .spml
> files. These files will be available as a custom export. It will
> take several seconds to create.
OK that's good to know.
> On 10/7/11 12:03 PM, Jonathan wrote:
>> Hi Steve,
>> I don't remember why you want to use a string in the XML file for
>> the signs.
> Speed, portability, and simplicity. I just completed my proof of
> concept for fuzzy searching. From a 3 MB file with over 10 thousand
> signs, I can get accurate search results in less than 1 second. I am
> using Regular Expressions to process ASCII characters. Next week,
> I'll write about fuzzy searching, with the appropriate links for the
> proof of concept.
>
>> Wouldn't building everything out of XML be easier to work with?
> Yes and no. Yes, because XML offers organization and portability.
> No, because XML has a lot of overhead and gotchas. The libraries take
> time to process text. Not all libraries work the same or support the
> same feature set. I think XML is too robust for simple text processing.
>
>> Many libraries can parse XML back to objects or save to a database to
>> do calculations and searches on. My feeling is that XML and what's
>> in it should be primarily for transporting data.
> Can you show me an example of the type of XML you'd want to use for an
> individual sign?
I was thinking of something a little like
<entry id="3" cdt="1172438877" mdt="1218173289" usr="admin">
  <sign align="left" maxx="23" maxy="37">
    <symbol x="1" y="7">???</symbol>
    <symbol x="-22" y="7">???</symbol>
    <symbol x="-2" y="-38">???</symbol>
    <sequence>
      <seqsymbol pos="1">???</seqsymbol>
      <seqsymbol pos="2">???</seqsymbol>
      <seqsymbol pos="3">???</seqsymbol>
    </sequence>
  </sign>
  <term>DELAY</term>
  <text>Delay, postpone, move forward in time</text>
  <src>So and So</src>
</entry>
This way the only thing that needs special parsing code is the
3-character Unicode string. I would have to look into it a little
deeper before agreeing on a final XML format. I think it would be
easier for programmers to use, since there is less parsing to do and
they can use regular XML parsing tools to get at the information.
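
As a rough illustration of what "regular XML parsing tools" could look
like for an entry in this layout, here is a minimal Python sketch using
the standard library's xml.etree.ElementTree. The function name
parse_entry and the shape of the returned dictionary are assumptions made
for the example, not part of any agreed format, and "???" stands in for
the symbol characters.

```python
import xml.etree.ElementTree as ET

# Sample entry in the layout sketched above; "???" is a placeholder
# for the actual symbol characters.
ENTRY_XML = """
<entry id="3" cdt="1172438877" mdt="1218173289" usr="admin">
  <sign align="left" maxx="23" maxy="37">
    <symbol x="1" y="7">???</symbol>
    <symbol x="-22" y="7">???</symbol>
    <symbol x="-2" y="-38">???</symbol>
    <sequence>
      <seqsymbol pos="1">???</seqsymbol>
      <seqsymbol pos="2">???</seqsymbol>
      <seqsymbol pos="3">???</seqsymbol>
    </sequence>
  </sign>
  <term>DELAY</term>
  <text>Delay, postpone, move forward in time</text>
  <src>So and So</src>
</entry>
"""


def parse_entry(xml_text):
    """Turn one <entry> element into a plain dictionary (hypothetical shape)."""
    entry = ET.fromstring(xml_text.strip())
    sign = entry.find("sign")

    # Each symbol keeps its text plus integer x/y coordinates.
    symbols = [
        (sym.text, int(sym.get("x")), int(sym.get("y")))
        for sym in sign.findall("symbol")
    ]

    # Order the sequence symbols by their "pos" attribute.
    seq_nodes = sorted(
        sign.findall("sequence/seqsymbol"), key=lambda n: int(n.get("pos"))
    )

    return {
        "id": int(entry.get("id")),
        "term": entry.findtext("term"),
        "text": entry.findtext("text"),
        "src": entry.findtext("src"),
        "symbols": symbols,
        "sequence": [n.text for n in seq_nodes],
    }


parsed = parse_entry(ENTRY_XML)
print(parsed["term"], parsed["symbols"])
```

Only the symbol strings themselves would still need custom handling; the
coordinates, term, and text come straight out of the XML parser.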
>
>> In my personal opinion, information that is one piece in itself
>> shouldn't be concatenated with other data and then have to do special
>> parsing to get a specific part of it.
> I can understand the logic and agree in part. For me, sign text
> should be like regular text. This means spaces separate words. For
> me, each word is a piece unto itself and should be concatenated
> without spaces or punctuation because it is a unit.
>
>> So I don't really like the 6 digits you are proposing below.
> You can continue to use the preliminary Unicode strings if you
> prefer. I've found that the ASCII version can be processed 4 times
> faster or more. The ASCII regular expressions are always consistent,
> but the Unicode uses 3 different strings based on the encoding form of
> UTF-8, UTF-16, and UTF-32.
>
>> But if we are going to have to parse it then at least make it easy to
>> distinguish the parts. I think that if you are going to keep the
>> string notation then maybe the information should be enclosed within
>> identifying symbols. Something like
>>
>> for the coordinates (41,60), (-18,-18) and (11,-23)
> Commas and parentheses add punctuation to the string, causing many
> unusual side effects and increasing the possibility of a broken string.
>
> I do agree with your point. The current coordinate notation is
> sloppy. I've employed a simple fix. I add 500 to each value. This
> means coordinates will always be 7 characters long: 3 for the X value,
> 1 for the separating value, and 3 for the Y value.
>
> The coordinate (41,60) becomes 541x560. The coordinate (-18,-18)
> becomes 482x482. I was not planning to update the preliminary Unicode
> version with the new coordinate strings unless someone requested it.
> So for the .spml files, I'm not planning any changes.
Yes I like it much better with the x in the middle.
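
For reference, the add-500 scheme described above can be sketched in a
few lines of Python. This is only an illustration under the stated rules
(values from -500 to +499, three digits, an "x", three digits); the names
encode_coord and decode_coord are made up for the example and are not
taken from SignPuddle.

```python
import re

OFFSET = 500  # added to each value so every coordinate is 3 digits

# Exactly three ASCII digits, an "x", and three more ASCII digits.
COORD_RE = re.compile(r"([0-9]{3})x([0-9]{3})")


def encode_coord(x, y):
    """Encode signed (x, y) as a fixed-width 7-character string."""
    for value in (x, y):
        if not -500 <= value <= 499:
            raise ValueError(f"coordinate {value} outside -500..499")
    return f"{x + OFFSET:03d}x{y + OFFSET:03d}"


def decode_coord(text):
    """Decode a 7-character coordinate string back to signed (x, y)."""
    match = COORD_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid coordinate string: {text!r}")
    return int(match.group(1)) - OFFSET, int(match.group(2)) - OFFSET


print(encode_coord(41, 60))     # 541x560
print(encode_coord(-18, -18))   # 482x482
print(decode_coord("511x477"))  # (11, -23)
```

Because every coordinate is the same width, a single fixed pattern is
enough to both validate and split it, with no commas or parentheses to
escape.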
>
>> What about C for coordinate, then the X or Y value + 500 to get the
>> Unicode point value. One Unicode character for X and one for Y?
> Additional Unicode characters are not being considered right now
> because there is no consensus on the higher level protocols of
> SignWriting for Unicode. Instead of the coordinate style of
> SignPuddle, they may choose a conceptual design based on deeper
> structural.
>
> But if the 2nd Unicode proposal did choose to go with coordinates, 1
> or 2 rows of negative values and 1 or 2 rows of positive values would
> be best.
>
> As per your above preference, there is no reason to concatenate the X
> and Y values into a single character, although a single character for
> each point on a 2-dimensional grid of 256 by 256 does have a certain
> novelty.
I didn't mean both the X and the Y saved within one character, rather,
one character each.
>
>> If you do go with what is below, I can make it work for my program.
>> I don't have any issues with the new limited size of the axis to -500
>> to +499
> I'm glad you don't mind the size limitation. This is the biggest
> change and it is mainly a validation issue.
>
>> I am interested in your thoughts or comments on the above
> Thanks for the comments.
Thanks for yours too!!
> -Steve
--
email: duncanjonathan at yahoo.ca
joyoduncan at gmail.com
Cel: 9983-1204
Tel: 2213-5285
Skype: yojoduncan
SignWriter Studio <http://www.signwriterstudio.com/>