<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hi SignWriting List,</p>
<p>I'm happy to announce that version 2 of the
<code>@sutton-signwriting/core</code> package is now available on
GitHub and npm. This update introduces two major features:</p>
<ol>
<li><strong>SignWriting Null Symbol (S00000 / U+40000)</strong>
for enhanced sorting and advanced sequence strategies.</li>
<li><strong>Tokenizer Functions</strong> for machine learning
applications using 1180 SignWriting tokens with numerical
encoding and decoding.</li>
</ol>
<p>GitHub: <a rel="noopener"
href="https://github.com/sutton-signwriting/core"
class="moz-txt-link-freetext">https://github.com/sutton-signwriting/core</a><br>
npm: <a rel="noopener"
href="https://www.npmjs.com/package/@sutton-signwriting/core"
class="moz-txt-link-freetext">https://www.npmjs.com/package/@sutton-signwriting/core</a></p>
<hr>
<h3>Breaking Change: The Null Symbol</h3>
<p>Version 2 adds support for the SignWriting null symbol, written
as S00000 in Formal SignWriting in ASCII (FSW) and as U+40000 in
SignWriting in Unicode (SWU). This is a breaking change because
signs that use the null symbol are not recognized by existing tools
and libraries that have not been updated. Although its use is
limited, the null symbol opens up a range of possibilities for
sorting and linguistic analysis.</p>
<p>The null symbol was first published in January 2022 and is
detailed in Appendix C of the Formal SignWriting draft
specification.</p>
<p>Formal SignWriting draft specification: <a rel="noopener"
href="https://www.ietf.org/archive/id/draft-slevinski-formal-signwriting-09.html#appendix-C"
class="moz-txt-link-freetext">https://www.ietf.org/archive/id/draft-slevinski-formal-signwriting-09.html#appendix-C</a></p>
<p>Formal SignWriting now includes four types of symbols:</p>
<ul>
<li><strong>Null Symbol</strong>: For sorting and custom
processing in sequences.</li>
<li><strong>Writing Symbols</strong>: For standard sign
representation.</li>
<li><strong>Detailed Location Symbols</strong>: For enhanced
spatial details.</li>
<li><strong>Punctuation Symbols</strong>: For text-like
structuring.</li>
</ul>
<p>A sign in Formal SignWriting is a two-part word:</p>
<ul>
<li><strong>Sequence</strong> (one-dimensional): An optional
prefix of writing symbols, detailed location symbols, and the
null symbol.</li>
<li><strong>Signbox</strong> (two-dimensional): Contains writing
symbols only; null and detailed location symbols are not
permitted here.</li>
</ul>
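<p>As a rough sketch of the two-part structure (not the library's
own parser), the following splits one FSW sign into its sequence and
its signbox. The regular expression is a simplification I am assuming
for illustration: it accepts the null symbol only in the sequence and
does not otherwise distinguish symbol categories. The name
<code>splitSign</code> is hypothetical.</p>

```javascript
// Simplified patterns, assumed for illustration only.
const WRITING = 'S[123][0-9a-f]{2}[0-5][0-9a-f]'; // symbol key other than null
const SEQ_SYMBOL = `(?:${WRITING}|S00000)`;       // the sequence may also hold the null symbol
const COORD = '[0-9]{3}x[0-9]{3}';                // coordinate such as 525x535

// Optional "A" prefix (sequence), then a signbox marker with placed symbols.
const SIGN = new RegExp(
  `^(A(?:${SEQ_SYMBOL})+)?([BLMR]${COORD}(?:${WRITING}${COORD})*)$`
);

// Hypothetical helper: returns null when the text is not a single sign.
function splitSign(fswSign) {
  const match = SIGN.exec(fswSign);
  if (match === null) return null;
  const prefix = match[1] || '';
  return {
    // Drop the leading "A" and cut the rest into six-character symbol keys.
    sequence: prefix.slice(1).match(/S[0-9a-f]{5}/g) || [],
    signbox: match[2],
  };
}
```

<p>Under these assumptions,
<code>splitSign('AS10000S00000M510x510S10000490x490')</code> returns
the sequence <code>['S10000', 'S00000']</code> and the signbox
<code>'M510x510S10000490x490'</code>, while a signbox that contains
S00000 is rejected.</p>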
<p>The null symbol supports sorting strategies like placing
one-handed signs before two-handed ones. It also enables advanced
strategies by filling sequence positions (e.g., torso, arm, hand)
with the null symbol if a location is absent.</p>
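<p>One way to picture the slot-filling strategy is the sketch below.
It is only an illustration: the slot names, the helper names
<code>buildSequence</code> and <code>compareSequences</code>, and the
choice of plain string comparison for sorting are all my assumptions,
not functions from the package.</p>

```javascript
const NULL_SYMBOL = 'S00000';

// Build an FSW sequence prefix from an ordered list of slot names,
// filling every slot that has no symbol with the null symbol.
function buildSequence(slots, symbolsBySlot) {
  const keys = slots.map((slot) => symbolsBySlot[slot] || NULL_SYMBOL);
  return 'A' + keys.join('');
}

// Plain string comparison (an assumed ordering for this sketch).
// "S00000" compares lower than any other symbol key.
function compareSequences(a, b) {
  if (a === b) return 0;
  return a > b ? 1 : -1;
}

// Hypothetical slot order: torso, arm, hand. Only the hand is specified.
const handOnly = buildSequence(['torso', 'arm', 'hand'], { hand: 'S10000' });

// Two-slot example: the second slot is the other hand, null when absent.
const sorted = ['AS10000S10000', 'AS10000S00000'].sort(compareSequences);
```

<p>Here <code>handOnly</code> is <code>'AS00000S00000S10000'</code>,
and <code>sorted</code> places <code>'AS10000S00000'</code> (one hand)
before <code>'AS10000S10000'</code> (two hands).</p>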
<hr>
<h3>Tokenizer Functions for Machine Learning</h3>
<p>Version 2 also introduces tokenizer functions tailored for
machine learning. These use 1180 SignWriting tokens for numerical
encoding and decoding, which makes FSW easier to use with NLP
frameworks and Transformer-based models.</p>
<p>Inspired by Amit's SignWriting Python library, which includes
custom FSW tokenization, I recreated and extended its
functionality for JavaScript. Bipin has since ported Amit's
library to Flutter and Dart, adding visualizations and achieving
rendering speeds 3,000 times faster than <code>@sutton-signwriting/font-db</code>.</p>
<ul>
<li>Amit's Python library: <a rel="noopener"
href="https://github.com/sign-language-processing/signwriting"
class="moz-txt-link-freetext">https://github.com/sign-language-processing/signwriting</a></li>
<li>Bipin's Flutter library: <a rel="noopener"
href="https://github.com/bipinkrish/signwriting-flutter"
class="moz-txt-link-freetext">https://github.com/bipinkrish/signwriting-flutter</a></li>
<li>Bipin's Dart library: <a rel="noopener"
href="https://github.com/bipinkrish/signwriting-dart"
class="moz-txt-link-freetext">https://github.com/bipinkrish/signwriting-dart</a></li>
</ul>
<hr>
<h4>Features of the Tokenizer</h4>
<p>The tokenizer starts with <strong>DEFAULT_SPECIAL_TOKENS</strong>,
a set of special tokens commonly used in NLP frameworks. These can
be customized by changing index numbers or value strings, or by
adding new tokens.</p>
<p>Default tokens:</p>
<pre><code class="language-javascript">const DEFAULT_SPECIAL_TOKENS = [
  { index: 0, name: 'UNK', value: '[UNK]' },
  { index: 1, name: 'PAD', value: '[PAD]' },
  { index: 2, name: 'CLS', value: '[CLS]' },
  { index: 3, name: 'SEP', value: '[SEP]' }
];
</code></pre>
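<p>A customized list might be prepared along these lines. This is a
sketch under assumptions: the helper names
<code>withExtraToken</code> and <code>withValue</code> are
hypothetical, the MASK token is just an example, and the defaults are
copied locally so the snippet runs on its own. How the list is handed
to the library is covered in the documentation linked below.</p>

```javascript
// Local copy of the default tokens, so the sketch is self-contained.
const DEFAULT_SPECIAL_TOKENS = [
  { index: 0, name: 'UNK', value: '[UNK]' },
  { index: 1, name: 'PAD', value: '[PAD]' },
  { index: 2, name: 'CLS', value: '[CLS]' },
  { index: 3, name: 'SEP', value: '[SEP]' },
];

// Hypothetical helper: append a token at the next free index
// without mutating the original list.
function withExtraToken(tokens, name) {
  const nextIndex = Math.max(-1, ...tokens.map((t) => t.index)) + 1;
  return [...tokens, { index: nextIndex, name, value: `[${name}]` }];
}

// Hypothetical helper: change the value string of one token, matched by name.
function withValue(tokens, name, value) {
  return tokens.map((t) => (t.name === name ? { ...t, value } : t));
}

// Example: add a MASK token and rename the padding value.
const customTokens = withValue(
  withExtraToken(DEFAULT_SPECIAL_TOKENS, 'MASK'),
  'PAD',
  '[PADDING]'
);
```

<p>The result has five entries: MASK sits at index 4, PAD carries the
value <code>[PADDING]</code>, and the original list is left
unchanged.</p>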
<p>Utility functions:</p>
<ul>
<li>Tokenize FSW: <a rel="noopener"
href="https://www.sutton-signwriting.io/core/#fswtokenize"
class="moz-txt-link-freetext">https://www.sutton-signwriting.io/core/#fswtokenize</a></li>
<li>Detokenize FSW: <a rel="noopener"
href="https://www.sutton-signwriting.io/core/#fswdetokenize"
class="moz-txt-link-freetext">https://www.sutton-signwriting.io/core/#fswdetokenize</a></li>
<li>Chunk Tokens: <a rel="noopener"
href="https://www.sutton-signwriting.io/core/#fswchunktokens"
class="moz-txt-link-freetext">https://www.sutton-signwriting.io/core/#fswchunktokens</a></li>
</ul>
<p>The tokenizer generator creates an object with properties for
encoding, decoding, and vocabulary management:<br>
<a rel="noopener"
href="https://www.sutton-signwriting.io/core/#fswcreatetokenizer"
class="moz-txt-link-freetext">https://www.sutton-signwriting.io/core/#fswcreatetokenizer</a></p>
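<p>To show the general idea of such an object, here is a toy version
that maps token strings to numbers and back. It is not the library's
implementation or API: <code>createToyTokenizer</code> is a
hypothetical name, and the token strings <code>alpha</code>,
<code>beta</code>, and <code>gamma</code> are placeholders rather than
any of the 1180 SignWriting tokens.</p>

```javascript
// Toy tokenizer object: special tokens keep their own index numbers,
// all other tokens are numbered after the highest special index.
function createToyTokenizer(specialTokens, otherTokens) {
  const indexToToken = new Map();
  const tokenToIndex = new Map();

  specialTokens.forEach((t) => {
    indexToToken.set(t.index, t.value);
    tokenToIndex.set(t.value, t.index);
  });

  const start = Math.max(-1, ...specialTokens.map((t) => t.index)) + 1;
  otherTokens.forEach((token, n) => {
    indexToToken.set(start + n, token);
    tokenToIndex.set(token, start + n);
  });

  // Unknown tokens and unknown numbers fall back to the UNK entry.
  const unk = specialTokens.find((t) => t.name === 'UNK');

  return {
    vocab: [...tokenToIndex.keys()],
    encode: (tokens) =>
      tokens.map((token) =>
        tokenToIndex.has(token) ? tokenToIndex.get(token) : unk.index
      ),
    decode: (ids) =>
      ids.map((id) => (indexToToken.has(id) ? indexToToken.get(id) : unk.value)),
  };
}

const toy = createToyTokenizer(
  [
    { index: 0, name: 'UNK', value: '[UNK]' },
    { index: 1, name: 'PAD', value: '[PAD]' },
    { index: 2, name: 'CLS', value: '[CLS]' },
    { index: 3, name: 'SEP', value: '[SEP]' },
  ],
  ['alpha', 'beta', 'gamma'] // placeholder tokens
);

const ids = toy.encode(['beta', 'zzz', '[CLS]']);
const roundTrip = toy.decode(ids);
```

<p>With these inputs <code>ids</code> is <code>[5, 0, 2]</code> and
<code>roundTrip</code> is <code>['beta', '[UNK]', '[CLS]']</code>: the
unknown string <code>zzz</code> is encoded as the UNK index and does
not survive the round trip.</p>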
<p><strong>Note</strong>: The tokenizer currently supports Formal
SignWriting in ASCII (FSW). To use it with SignWriting in Unicode
(SWU), convert to FSW first.</p>
<p>Thank you for reading!<br>
–Steve</p>
</body>
</html>