<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Hi Jonathan,<br>

    <br>

    Thanks for the comments.  FYI, I will not be changing the .spml

    files.  These files will be available as a custom export.  It will

    take several seconds to create.<br>

    <br>

    <br>

    On 10/7/11 12:03 PM, Jonathan wrote:

    <blockquote cite="mid:4E8F30DE.4090309@yahoo.ca" type="cite">


      Hi Steve,<br>

          I don't remember why you want to use a string in the XML file

      for the signs.  </blockquote>

    Speed, portability, and simplicity.  I just completed my proof of

    concept for fuzzy searching.  From a 3 MB file with over 10 thousand

    signs, I can get accurate search results in less than 1 second.  I

    am using Regular Expressions to process ASCII characters.  Next

    week, I'll write about fuzzy searching, with the appropriate links

    for the proof of concept.<br>

    <br>

    <blockquote cite="mid:4E8F30DE.4090309@yahoo.ca" type="cite">Wouldn't

      building everything out of XML be easier to work with?  </blockquote>

    Yes and no.  Yes, because XML offers organization and portability. 

    No, because XML has a lot of overhead and gotchas.  The libraries

    take time to process text.  Not all libraries work the same or

    support the same feature set.  I think XML is too robust for simple

    text processing.<br>

    <br>

    <blockquote cite="mid:4E8F30DE.4090309@yahoo.ca" type="cite">Many

      libraries can parse XML back to objects or save to a database to

      do calculations and searches on.  My feeling is that XML and

      what's in it should be primarily for transporting data.  </blockquote>

    Can you show me an example of the type of XML you'd want to use for

    an individual sign?<br>

    <br>

    <blockquote cite="mid:4E8F30DE.4090309@yahoo.ca" type="cite">In my

      personal opinion, information that is one piece in itself

      shouldn't be concatenated with other data and then have to do

      special parsing to get a specific part of it.</blockquote>

    I can understand the logic and agree in part.  For me, sign text

    should be like regular text.  This means spaces separate words.  For

    me, each word is a piece unto itself and should be concatenated

    without spaces or punctuation because it is a unit.<br>

     <br>

    <blockquote cite="mid:4E8F30DE.4090309@yahoo.ca" type="cite"> So I

      don't really like the 6 digits you are proposing below.  </blockquote>

    You can continue to use the preliminary Unicode strings if you

    prefer.  I've found that the ASCII version can be processed 4 times

    faster or more.  The ASCII regular expressions are always consistent,

    but the Unicode uses 3 different strings based on the encoding form

    of UTF-8, UTF-16, and UTF-32.<br>

    <br>

    <blockquote cite="mid:4E8F30DE.4090309@yahoo.ca" type="cite">But if

      we are going to have to parse it then at least make it easy to

      distinguish the parts.  I think that if you are going to keep the

      string notation, then maybe the information should be enclosed

      within identifying symbols. Something like<br>

      <br>

      for the coordinates (41,60), (-18,-18) and  (11,-23)<br>

    </blockquote>

    Commas and parentheses add punctuation to the string, causing many

    unusual side effects and increasing the possibility of a broken

    string.<br>

    <br>

    I do agree with your point.  The current coordinate notation is

    sloppy.  I've employed a simple fix.  I add 500 to each value.  This

    means coordinates will always be 7 characters long: 3 for the X

    value, 1 for the separating value, and 3 for the Y value.  <br>

    <br>

    The coordinate (41,60) becomes 541x560.  The coordinate (-18,-18)

    becomes 482x482. I was not planning to update the preliminary

    Unicode version with the new coordinate strings unless someone

    requested it.  So for the .spml files, I'm not planning any changes.<br>

    <br>
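    As a rough illustration of the add-500 fix described above, here is a
    minimal Python sketch.  The names OFFSET, encode, and decode are
    hypothetical and chosen only for this example; they are not taken from
    SignPuddle.  The sketch assumes integer coordinates and the lowercase
    "x" separator used in the examples above.<br>

```python
OFFSET = 500  # the shift described in the message: add 500 to each value


def encode(x, y):
    """Return the 7-character form, e.g. (41, 60) becomes '541x560'."""
    for value in (x, y):
        # Only -500..499 fits in three digits after the shift.
        if value not in range(-OFFSET, OFFSET):
            raise ValueError(f"coordinate {value} is outside -500..499")
    return f"{x + OFFSET:03d}x{y + OFFSET:03d}"


def decode(text):
    """Invert encode(), e.g. '482x482' becomes (-18, -18)."""
    x_part, sep, y_part = text[:3], text[3:4], text[4:]
    if len(text) != 7 or sep != "x" or not (x_part + y_part).isdigit():
        raise ValueError(f"malformed coordinate string: {text!r}")
    return int(x_part) - OFFSET, int(y_part) - OFFSET


print(encode(41, 60))     # 541x560
print(encode(-18, -18))   # 482x482
print(decode("511x477"))  # (11, -23)
```

    Because every value is zero-padded to 3 digits, the string never
    contains a minus sign, comma, or parenthesis.<br>

    <br>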

    <blockquote cite="mid:4E8F30DE.4090309@yahoo.ca" type="cite"> What

      about C for coordinate, then the X or Y value + 500 to get the

      Unicode point value.  One Unicode character for X and one for Y?<br>

    </blockquote>

    Additional Unicode characters are not being considered right now

    because there is no consensus on the higher level protocols of

    SignWriting for Unicode.  Instead of the coordinate style of

    SignPuddle, they may choose a conceptual design based on deeper

    structure.<br>

    <br>

    But if the 2nd Unicode proposal did choose to go with coordinates, 1

    or 2 rows of negative values and 1 or 2 rows of positive values

    would be best.  <br>

    <br>

    As per your above preference, there is no reason to concatenate the

    X and Y values into a single character, although a single character

    for each point on a 2-dimensional grid of 256 by 256 does have a

    certain novelty.<br>

    <br>

    <blockquote cite="mid:4E8F30DE.4090309@yahoo.ca" type="cite"> If you

      do go with what is below, I can make it work for my program.  I

      don't have any issues with the new limited range of each axis,

      -500 to +499<br>

    </blockquote>

    I'm glad you don't mind the size limitation.  This is the biggest

    change and it is mainly a validation issue.<br>

    <br>
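    To show why the fixed range turns validation into a simple pattern
    check, here is a small Python sketch.  It assumes the 7-character
    form described earlier (3 digits, a lowercase "x", 3 digits); the
    names COORD and is_valid_coordinate are hypothetical.  Since -500 to
    +499 maps exactly onto 000 to 999, any 3 ASCII digits are a legal
    value and no separate range test is needed.<br>

```python
import re

# Three ASCII digits, the separator, three ASCII digits.
COORD = re.compile(r"[0-9]{3}x[0-9]{3}")


def is_valid_coordinate(text):
    """True only if the whole string is one well-formed coordinate."""
    return COORD.fullmatch(text) is not None


print(is_valid_coordinate("541x560"))  # True
print(is_valid_coordinate("41x60"))    # False
print(is_valid_coordinate("(41,60)"))  # False
```

    <br>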

    <blockquote cite="mid:4E8F30DE.4090309@yahoo.ca" type="cite"> I am

      interested in your thoughts or comments on the above<br>

    </blockquote>

    Thanks for the comments.<br>

    -Steve<br>

    <br>

    <br>

    <br>

  </body>

</html>