<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;">Yes, I am very excited about this. I will need to start implementing it rather than using my workarounds. :-)<div><br></div><div>THANK YOU!!!!</div><div><br id="lineBreakAtBeginningOfMessage"><div>
<span><img alt="namesign.png" src="cid:C1393E06-7600-4E54-95BA-83D77F0817F1"></span><br class="Apple-interchange-newline"><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; display: inline !important; float: none;">Adam</span>
</div>
<div><br><blockquote type="cite"><div>On Nov 30, 2024, at 3:52 PM, Valerie Sutton &lt;sutton@signwriting.org&gt; wrote:</div><br class="Apple-interchange-newline"><div><div>SignWriting List<br>November 30, 2024<br><br>Hello SW List members,<br><br>A big THANK YOU, to you Steve, for these new software developments regarding sorting dictionaries with “null”, which I remember years ago was important to sort by SignSpellings properly. I remember Adam needed this for his dictionary work...<br><br>And I am just learning about “tokenizers” - all in all thank you - for all you do to give SignWriting developers needed tools -<br><br>I am enjoying watching all this unfold - thank you to all of you for your new Github libraries - so many new developments...<br><br>Val ;-)<br><br><br>Valerie Sutton<br>sutton@signwriting.org<br><br>---------------<br><br><blockquote type="cite">On Nov 30, 2024, at 2:19 PM, Steve Slevinski &lt;slevin@signpuddle.net&gt; wrote:<br><br>Hi SignWriting List.<br>I'm happy to announce that Version 2 of the @sutton-signwriting/core package is now available on GitHub and npm. This update introduces two major features:<br> • SignWriting Null Symbol (S00000 / U+40000) for enhanced sorting and advanced sequence strategies.<br> • Tokenizer Functions for machine learning applications using 1180 SignWriting tokens with numerical encoding and decoding.<br>GitHub: https://github.com/sutton-signwriting/core<br>npm: https://www.npmjs.com/package/@sutton-signwriting/core<br>Breaking Change: The Null Symbol<br>Version 2 adds support for the SignWriting null symbol as S00000 for Formal SignWriting in ASCII (FSW) and U+40000 for SignWriting in Unicode (SWU). This is a breaking change because signs using the null symbol are not recognized by current tools and libraries. 
Although its use is limited, the null symbol introduces a range of possibilities for sorting and linguistic analysis.<br>The null symbol was first published in January 2022 and is detailed in Appendix C of the Formal SignWriting draft specification.<br>Formal SignWriting draft specification: https://www.ietf.org/archive/id/draft-slevinski-formal-signwriting-09.html#appendix-C<br>Formal SignWriting now includes four types of symbols:<br> • Null Symbol: For sorting and custom processing in sequences.<br> • Writing Symbols: For standard sign representation.<br> • Detailed Location Symbols: For enhanced spatial details.<br> • Punctuation Symbols: For text-like structuring.<br>A sign in Formal SignWriting is a two-part word:<br> • Sequence (one-dimensional): An optional prefix of writing symbols, detailed location symbols, and the null symbol.<br> • Signbox (two-dimensional): Contains writing symbols only; null and detailed location symbols are not permitted here.<br>The null symbol supports sorting strategies like placing one-handed signs before two-handed ones. It also enables advanced strategies by filling sequence positions (e.g., torso, arm, hand) with the null symbol if a location is absent.<br>Tokenizer Functions for Machine Learning<br>Version 2 also introduces tokenizer functions tailored for machine learning. These use 1180 SignWriting tokens for numerical encoding and decoding, enhancing compatibility with NLP frameworks like Transformer-based models.<br>Inspired by Amit's SignWriting Python library, which includes custom FSW tokenization, I recreated and extended its functionality for JavaScript. 
Bipin has further ported Amit's library to Flutter and Dart, adding visualizations and achieving rendering speeds 3,000 times faster than sutton-signwriting/font-db.<br> • Amit's Python library: https://github.com/sign-language-processing/signwriting<br> • Bipin's Flutter library: https://github.com/bipinkrish/signwriting-flutter<br> • Bipin's Dart library: https://github.com/bipinkrish/signwriting-dart<br>Features of the Tokenizer<br>The tokenizer starts with DEFAULT_SPECIAL_TOKENS, commonly used in NLP frameworks. These can be customized by modifying index numbers, value strings, or adding new tokens.<br>Default tokens:<br>javascript<br><br>DEFAULT_SPECIAL_TOKENS = [<br>{ index: 0, name: 'UNK', value: '[UNK]' },<br>{ index: 1, name: 'PAD', value: '[PAD]' },<br>{ index: 2, name: 'CLS', value: '[CLS]' },<br>{ index: 3, name: 'SEP', value: '[SEP]' }<br>];<br><br>Utility functions:<br> • Tokenize FSW: https://www.sutton-signwriting.io/core/#fswtokenize<br> • Detokenize FSW: https://www.sutton-signwriting.io/core/#fswdetokenize<br> • Chunk Tokens: https://www.sutton-signwriting.io/core/#fswchunktokens<br>The tokenizer generator creates an object with properties for encoding, decoding, and vocabulary management:<br>https://www.sutton-signwriting.io/core/#fswcreatetokenizer<br>Note: The tokenizer currently supports Formal SignWriting in ASCII (FSW). To use it with SignWriting in Unicode (SWU), convert to FSW first.<br>Thank you for reading!<br>–Steve<br>_______________________________________________<br>Sw-l mailing list<br>Sw-l@listserv.linguistlist.org<br>https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/sw-l<br></blockquote><br></div></div></blockquote></div><br></div></body></html>