<!DOCTYPE html>
<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Hi SignWriting List.</p>
    <p>I'm happy to announce that version 2 of the
      <code>@sutton-signwriting/core</code> package is now available on
      GitHub and npm. This update introduces two major features:</p>
    <ol>
      <li><strong>SignWriting Null Symbol (S00000 / U+40000)</strong>
        for enhanced sorting and advanced sequence strategies.</li>
      <li><strong>Tokenizer Functions</strong> for machine learning
        applications using 1180 SignWriting tokens with numerical
        encoding and decoding.</li>
    </ol>
    <p>GitHub: <a rel="noopener"
        href="https://github.com/sutton-signwriting/core"
        class="moz-txt-link-freetext">https://github.com/sutton-signwriting/core</a><br>
      npm: <a rel="noopener"
        href="https://www.npmjs.com/package/@sutton-signwriting/core"
        class="moz-txt-link-freetext">https://www.npmjs.com/package/@sutton-signwriting/core</a></p>
    <hr>
    <h3>Breaking Change: The Null Symbol</h3>
    <p>Version 2 adds support for the SignWriting null symbol, written
      as S00000 in Formal SignWriting in ASCII (FSW) and as U+40000 in
      SignWriting in Unicode (SWU). This is a breaking change because
      signs that use the null symbol are not recognized by tools and
      libraries built against earlier versions. Although its use is
      limited, the null symbol opens up a range of possibilities for
      sorting and linguistic analysis.</p>
    <p>The null symbol was first published in January 2022 and is
      detailed in Appendix C of the Formal SignWriting draft
      specification.</p>
    <p>Formal SignWriting draft specification: <a rel="noopener"
href="https://www.ietf.org/archive/id/draft-slevinski-formal-signwriting-09.html#appendix-C"
        class="moz-txt-link-freetext">https://www.ietf.org/archive/id/draft-slevinski-formal-signwriting-09.html#appendix-C</a></p>
    <p>Formal SignWriting now includes four types of symbols:</p>
    <ul>
      <li><strong>Null Symbol</strong>: For sorting and custom
        processing in sequences.</li>
      <li><strong>Writing Symbols</strong>: For standard sign
        representation.</li>
      <li><strong>Detailed Location Symbols</strong>: For enhanced
        spatial details.</li>
      <li><strong>Punctuation Symbols</strong>: For text-like
        structuring.</li>
    </ul>
    <p>A sign in Formal SignWriting is a two-part word:</p>
    <ul>
      <li><strong>Sequence</strong> (one-dimensional): An optional
        prefix of writing symbols, detailed location symbols, and the
        null symbol.</li>
      <li><strong>Signbox</strong> (two-dimensional): Contains writing
        symbols only; null and detailed location symbols are not
        permitted here.</li>
    </ul>
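    <p>The two-part structure can be illustrated with a small,
      self-contained sketch. It does not use the package's own parser:
      the helper name <code>splitSign</code>, the simplified symbol
      pattern (an "S" followed by five hex digits), and the sample
      strings are assumptions made for illustration, and the draft
      specification defines the exact ranges.</p>

```javascript
// Simplified patterns; the draft specification restricts the ranges further.
const SYMBOL = 'S[0-9a-f]{5}';
const COORD = '[0-9]{3}x[0-9]{3}';
const SYMBOL_RE = /S[0-9a-f]{5}/g;
const NULL_SYMBOL = 'S00000';

// Optional "A" prefix with symbols, then a box marker with a coordinate,
// then zero or more symbols that each carry their own coordinate.
const SIGN_RE = new RegExp(
  `^(A(?:${SYMBOL})+)?([BLMR]${COORD})((?:${SYMBOL}${COORD})*)$`
);

// Hypothetical helper: split one FSW sign into sequence and signbox parts.
function splitSign(fswSign) {
  const match = SIGN_RE.exec(fswSign);
  if (!match) return null;
  const [, prefix = '', box, spatials] = match;
  const sequence = prefix.slice(1).match(SYMBOL_RE) || [];
  const signboxSymbols = spatials.match(SYMBOL_RE) || [];
  return {
    sequence,        // one-dimensional part, may contain the null symbol
    box,             // box marker plus its coordinate
    signboxSymbols,  // two-dimensional part, writing symbols only
    nullInSignbox: signboxSymbols.includes(NULL_SYMBOL), // true means not permitted
  };
}

// Made-up sample: null symbol in the sequence, one writing symbol in the signbox.
console.log(splitSign('AS10000S00000M510x510S10000490x490'));
```

    <p>In this sketch a null symbol in the sequence is accepted, while
      one in the signbox is flagged through <code>nullInSignbox</code>.</p>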
    <p>The null symbol supports sorting strategies like placing
      one-handed signs before two-handed ones. It also enables advanced
      strategies by filling sequence positions (e.g., torso, arm, hand)
      with the null symbol if a location is absent.</p>
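    <p>As a rough sketch of the sorting idea, the example below pads a
      fixed-slot sequence with the null symbol and then sorts by plain
      string comparison. The slot names, the glosses, and the
      <code>buildSequence</code> helper are hypothetical and chosen only
      for illustration; real sorting strategies may use different slots
      and a different comparison.</p>

```javascript
const NULL_SYMBOL = 'S00000';

// Hypothetical slot layout: every sequence has exactly these positions.
const SLOTS = ['dominantHand', 'otherHand'];

// Build an FSW sequence prefix, filling absent slots with the null symbol.
function buildSequence(parts) {
  return 'A' + SLOTS.map((slot) => parts[slot] || NULL_SYMBOL).join('');
}

// Made-up entries: the same handshape, with and without a second hand.
const signs = [
  { gloss: 'two-handed', parts: { dominantHand: 'S10000', otherHand: 'S10008' } },
  { gloss: 'one-handed', parts: { dominantHand: 'S10000' } },
];

// S00000 compares lower than any other symbol key, so a padded
// one-handed sign sorts before a two-handed sign with the same first symbol.
const sorted = signs
  .map((s) => ({ gloss: s.gloss, sequence: buildSequence(s.parts) }))
  .sort((a, b) => (a.sequence < b.sequence ? -1 : a.sequence > b.sequence ? 1 : 0));

console.log(sorted.map((s) => `${s.sequence} ${s.gloss}`));
```

    <p>Because every sequence has the same number of positions, the
      comparison lines up slot by slot, which is what makes the padding
      useful.</p>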
    <hr>
    <h3>Tokenizer Functions for Machine Learning</h3>
    <p>Version 2 also introduces tokenizer functions tailored for
      machine learning. These use 1180 SignWriting tokens for numerical
      encoding and decoding, enhancing compatibility with NLP frameworks
      like Transformer-based models.</p>
    <p>Inspired by Amit's SignWriting Python library, which includes
      custom FSW tokenization, I recreated and extended its
      functionality for JavaScript. Bipin has further ported Amit's
      library to Flutter and Dart, adding visualizations and achieving
      rendering speeds 3,000 times faster than <code>sutton-signwriting/font-db</code>.</p>
    <ul>
      <li>Amit's Python library: <a rel="noopener"
          href="https://github.com/sign-language-processing/signwriting"
          class="moz-txt-link-freetext">https://github.com/sign-language-processing/signwriting</a></li>
      <li>Bipin's Flutter library: <a rel="noopener"
          href="https://github.com/bipinkrish/signwriting-flutter"
          class="moz-txt-link-freetext">https://github.com/bipinkrish/signwriting-flutter</a></li>
      <li>Bipin's Dart library: <a rel="noopener"
          href="https://github.com/bipinkrish/signwriting-dart"
          class="moz-txt-link-freetext">https://github.com/bipinkrish/signwriting-dart</a></li>
    </ul>
    <hr>
    <h4>Features of the Tokenizer</h4>
    <p>The tokenizer starts with <strong>DEFAULT_SPECIAL_TOKENS</strong>,
      commonly used in NLP frameworks. These can be customized by
      modifying index numbers, value strings, or adding new tokens.</p>
    <p>Default tokens:</p>
    <pre><code class="language-javascript">DEFAULT_SPECIAL_TOKENS = [
  { index: 0, name: 'UNK', value: '[UNK]' },
  { index: 1, name: 'PAD', value: '[PAD]' },
  { index: 2, name: 'CLS', value: '[CLS]' },
  { index: 3, name: 'SEP', value: '[SEP]' }
];
</code></pre>
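    <p>A customized list might look like the sketch below, which adds a
      <code>[MASK]</code> token after the defaults. The helpers
      <code>withExtraToken</code> and <code>hasUniqueIndexes</code> are
      hypothetical and not part of the package; how a custom list is
      passed to the tokenizer is described in the package documentation,
      not here.</p>

```javascript
// Copy of the default list shown above.
const DEFAULT_SPECIAL_TOKENS = [
  { index: 0, name: 'UNK', value: '[UNK]' },
  { index: 1, name: 'PAD', value: '[PAD]' },
  { index: 2, name: 'CLS', value: '[CLS]' },
  { index: 3, name: 'SEP', value: '[SEP]' },
];

// Hypothetical helper: append a token at the next free index
// without mutating the original list.
function withExtraToken(tokens, name) {
  const nextIndex = Math.max(...tokens.map((t) => t.index)) + 1;
  return [...tokens, { index: nextIndex, name, value: `[${name}]` }];
}

// Hypothetical sanity check: each index may be used only once.
function hasUniqueIndexes(tokens) {
  return new Set(tokens.map((t) => t.index)).size === tokens.length;
}

const customTokens = withExtraToken(DEFAULT_SPECIAL_TOKENS, 'MASK');
console.log(customTokens, hasUniqueIndexes(customTokens));
```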
    <p>Utility functions:</p>
    <ul>
      <li>Tokenize FSW: <a rel="noopener"
          href="https://www.sutton-signwriting.io/core/#fswtokenize"
          class="moz-txt-link-freetext">https://www.sutton-signwriting.io/core/#fswtokenize</a></li>
      <li>Detokenize FSW: <a rel="noopener"
          href="https://www.sutton-signwriting.io/core/#fswdetokenize"
          class="moz-txt-link-freetext">https://www.sutton-signwriting.io/core/#fswdetokenize</a></li>
      <li>Chunk Tokens: <a rel="noopener"
          href="https://www.sutton-signwriting.io/core/#fswchunktokens"
          class="moz-txt-link-freetext">https://www.sutton-signwriting.io/core/#fswchunktokens</a></li>
    </ul>
    <p>The tokenizer generator creates an object with properties for
      encoding, decoding, and vocabulary management:<br>
      <a rel="noopener"
href="https://www.sutton-signwriting.io/core/#fswcreatetokenizer"
        class="moz-txt-link-freetext">https://www.sutton-signwriting.io/core/#fswcreatetokenizer</a></p>
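    <p>To give a feel for what such an object does, here is a toy
      stand-in written from scratch. It is not the package's
      implementation: <code>createToyTokenizer</code>, its property
      names, and the three-entry vocabulary are assumptions for
      illustration only, whereas the real tokenizer works with the full
      set of 1180 tokens.</p>

```javascript
// Hypothetical toy tokenizer: special tokens first, then vocabulary tokens.
function createToyTokenizer(specialTokens, vocabTokens) {
  const indexToString = new Map();
  const stringToIndex = new Map();

  for (const { index, value } of specialTokens) {
    indexToString.set(index, value);
    stringToIndex.set(value, index);
  }

  // Vocabulary tokens continue after the highest special-token index.
  let next = Math.max(...specialTokens.map((t) => t.index)) + 1;
  for (const token of vocabTokens) {
    indexToString.set(next, token);
    stringToIndex.set(token, next);
    next += 1;
  }

  const unk = specialTokens.find((t) => t.name === 'UNK');

  return {
    vocab: [...stringToIndex.keys()],
    // Unknown strings map to the UNK index.
    encode: (tokens) =>
      tokens.map((t) => (stringToIndex.has(t) ? stringToIndex.get(t) : unk.index)),
    // Unknown indexes map to the UNK value.
    decode: (ids) => ids.map((i) => indexToString.get(i) ?? unk.value),
  };
}

const specialTokens = [
  { index: 0, name: 'UNK', value: '[UNK]' },
  { index: 1, name: 'PAD', value: '[PAD]' },
  { index: 2, name: 'CLS', value: '[CLS]' },
  { index: 3, name: 'SEP', value: '[SEP]' },
];

// Placeholder vocabulary, not the library's actual token set.
const toy = createToyTokenizer(specialTokens, ['M', 'S10000', 'S00000']);
const ids = toy.encode(['[CLS]', 'M', 'S10000', 'S2ff00', '[SEP]']);
console.log(ids, toy.decode(ids));
```

    <p>The round trip is lossy only for strings outside the vocabulary,
      which come back as <code>[UNK]</code>.</p>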
    <p><strong>Note</strong>: The tokenizer currently supports Formal
      SignWriting in ASCII (FSW). To use it with SignWriting in Unicode
      (SWU), convert to FSW first.</p>
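    <p>For symbols alone, the SWU to FSW mapping can be sketched as
      follows. The arithmetic is an assumption based on my reading of
      the encoding (U+40000 for the null symbol, U+40001 for S10000,
      then 96 code points per base symbol and 16 per fill), and the
      helper name <code>swuSymbolToKey</code> is hypothetical. A full
      conversion also has to handle box markers and coordinates, so the
      package's own conversion functions remain the right tool for real
      data.</p>

```javascript
// Hypothetical helper: convert one SWU symbol character to an FSW symbol key.
// Assumed layout: U+40000 is the null symbol, U+40001 is S10000, and each
// base symbol spans 96 code points (6 fills x 16 rotations).
function swuSymbolToKey(swuSymbol) {
  const codePoint = swuSymbol.codePointAt(0);
  if (codePoint === 0x40000) return 'S00000';

  const offset = codePoint - 0x40001;
  const base = Math.floor(offset / 96);
  const fill = Math.floor((offset % 96) / 16);
  const rotation = offset % 16;

  return 'S' + (base + 0x100).toString(16) + fill.toString(16) + rotation.toString(16);
}

console.log(swuSymbolToKey('\u{40000}'), swuSymbolToKey('\u{40001}'));
```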
    <p>Thank you for reading!<br>
      –Steve</p>
  </body>
</html>