[Sw-l] Announcing version 2 of the Sutton SignWriting Core package for JavaScript

Adam Frost icemandeaf at gmail.com
Sun Dec 1 00:22:46 UTC 2024


Yes, I am very excited about this. I will need to start implementing it rather than using my workarounds. :-)

THANK YOU!!!!


Adam

> On Nov 30, 2024, at 3:52 PM, Valerie Sutton <sutton at signwriting.org> wrote:
> 
> SignWriting List
> November 30, 2024
> 
> Hello SW List members,
> 
> A big THANK YOU to you, Steve, for these new software developments for sorting dictionaries with the “null” symbol, which I remember was important years ago for sorting by SignSpellings properly. I remember Adam needed this for his dictionary work...
> 
> And I am just learning about “tokenizers” - all in all thank you - for all you do to give SignWriting developers needed tools -
> 
> I am enjoying watching all this unfold - thank you to all of you for your new Github libraries - so many new developments...
> 
> Val ;-)
> 
> 
> Valerie Sutton
> sutton at signwriting.org
> 
> ---------------
> 
>> On Nov 30, 2024, at 2:19 PM, Steve Slevinski <slevin at signpuddle.net> wrote:
>> 
>> Hi SignWriting List.
>> I'm happy to announce that Version 2 of the @sutton-signwriting/core package is now available on GitHub and npm. This update introduces two major features:
>>    • SignWriting Null Symbol (S00000 / U+40000) for enhanced sorting and advanced sequence strategies.
>>    • Tokenizer Functions for machine learning applications using 1180 SignWriting tokens with numerical encoding and decoding.
>> GitHub: https://github.com/sutton-signwriting/core
>> npm: https://www.npmjs.com/package/@sutton-signwriting/core
>> Breaking Change: The Null Symbol
>> Version 2 adds support for the SignWriting null symbol as S00000 for Formal SignWriting in ASCII (FSW) and U+40000 for SignWriting in Unicode (SWU). This is a breaking change because signs using the null symbol are not recognized by current tools and libraries. Although its use is limited, the null symbol introduces a range of possibilities for sorting and linguistic analysis.
>> The null symbol was first published in January 2022 and is detailed in Appendix C of the Formal SignWriting draft specification.
>> Formal SignWriting draft specification: https://www.ietf.org/archive/id/draft-slevinski-formal-signwriting-09.html#appendix-C
>> Formal SignWriting now includes four types of symbols:
>>    • Null Symbol: For sorting and custom processing in sequences.
>>    • Writing Symbols: For standard sign representation.
>>    • Detailed Location Symbols: For enhanced spatial details.
>>    • Punctuation Symbols: For text-like structuring.
>> A sign in Formal SignWriting is a two-part word:
>>    • Sequence (one-dimensional): An optional prefix of writing symbols, detailed location symbols, and the null symbol.
>>    • Signbox (two-dimensional): Contains writing symbols only; null and detailed location symbols are not permitted here.
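A minimal sketch of the two-part structure described above, written without the core package. The regular expression is a simplified assumption: it accepts any `S` plus five hex digits in the sequence (so the null symbol S00000 is allowed there) and rejects keys starting with `S0` in the signbox, but it is looser than the symbol ranges in the draft specification. The names `SIGN_PATTERN` and `splitSign` are hypothetical.

```javascript
// Simplified pattern, NOT the official FSW grammar:
//   optional prefix: "A" followed by one or more symbol keys (null symbol allowed)
//   signbox: box marker B/L/M/R, a coordinate, then zero or more placed symbols
const SIGN_PATTERN =
  /^(?:A((?:S[0-9a-f]{5})+))?([BLMR]\d{3}x\d{3}(?:S[1-3][0-9a-f]{4}\d{3}x\d{3})*)$/;

// Hypothetical helper: split one FSW sign into its sequence and signbox parts.
// Returns null when the string does not match the simplified pattern.
function splitSign(fswSign) {
  const match = SIGN_PATTERN.exec(fswSign);
  if (!match) return null;
  const [, prefix = '', signbox] = match;
  return {
    sequence: prefix.match(/S[0-9a-f]{5}/g) ?? [], // one-dimensional, may be empty
    signbox, // two-dimensional part, kept as a raw string here
  };
}

const example = splitSign(
  'AS10011S10019S2e704S2e748M525x535S2e748483x510S10011501x466S2e704510x500S10019476x475'
);
console.log(example.sequence); // [ 'S10011', 'S10019', 'S2e704', 'S2e748' ]
```

Because the signbox part of the pattern requires keys that start with S1, S2, or S3, a string that places S00000 inside the signbox is rejected by this sketch.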
>> The null symbol supports sorting strategies like placing one-handed signs before two-handed ones. It also enables advanced strategies by filling sequence positions (e.g., torso, arm, hand) with the null symbol if a location is absent.
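The slot-filling strategy can be sketched as follows. This is an illustration under stated assumptions, not code from the core package: the slot order (torso, arm, hand) is taken from the example in the paragraph above, the symbol keys are sample strings in FSW key format, and `sequenceKey` and `compareBySequence` are hypothetical names.

```javascript
const NULL_SYMBOL = 'S00000'; // FSW null symbol, per the announcement

// Assumed slot order, used only for this illustration.
const SLOTS = ['torso', 'arm', 'hand'];

// Build a fixed-width sort key: one symbol key per slot,
// with the null symbol standing in for any absent slot.
function sequenceKey(entry) {
  return SLOTS.map((slot) => entry[slot] ?? NULL_SYMBOL).join('');
}

// Compare by code unit so that 'S00000' sorts before every other key.
function compareBySequence(a, b) {
  const ka = sequenceKey(a);
  const kb = sequenceKey(b);
  return ka < kb ? -1 : ka > kb ? 1 : 0;
}

// Sample entries; the ids and symbol keys are made up for the example.
const entries = [
  { id: 'with-torso', torso: 'S36d00', hand: 'S10000' },
  { id: 'hand-only-2', hand: 'S15a00' },
  { id: 'hand-only-1', hand: 'S10000' },
];

const sorted = [...entries].sort(compareBySequence);
console.log(sorted.map((e) => e.id));
// [ 'hand-only-1', 'hand-only-2', 'with-torso' ]
```

Because every key has the same number of slots, entries are compared position by position, and an absent location always sorts ahead of a present one.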
>> Tokenizer Functions for Machine Learning
>> Version 2 also introduces tokenizer functions tailored for machine learning. These use 1180 SignWriting tokens for numerical encoding and decoding, enhancing compatibility with NLP frameworks and Transformer-based models.
>> Inspired by Amit's SignWriting Python library, which includes custom FSW tokenization, I recreated and extended its functionality for JavaScript. Bipin has further ported Amit's library to Flutter and Dart, adding visualizations and achieving rendering speeds 3,000 times faster than sutton-signwriting/font-db.
>>    • Amit's Python library: https://github.com/sign-language-processing/signwriting
>>    • Bipin's Flutter library: https://github.com/bipinkrish/signwriting-flutter
>>    • Bipin's Dart library: https://github.com/bipinkrish/signwriting-dart
>> Features of the Tokenizer
>> The tokenizer starts with DEFAULT_SPECIAL_TOKENS, commonly used in NLP frameworks. These can be customized by modifying index numbers, value strings, or adding new tokens.
>> Default tokens (JavaScript):
>> 
>> const DEFAULT_SPECIAL_TOKENS = [
>>   { index: 0, name: 'UNK', value: '[UNK]' },
>>   { index: 1, name: 'PAD', value: '[PAD]' },
>>   { index: 2, name: 'CLS', value: '[CLS]' },
>>   { index: 3, name: 'SEP', value: '[SEP]' }
>> ];
>> 
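To illustrate what customizing the special tokens might look like, here is a self-contained sketch. It does not call the core package: `buildLookup` is a hypothetical helper, not the library's `createTokenizer`, and the three non-special tokens are placeholder strings rather than the real 1180-token vocabulary.

```javascript
// Copied from the announcement.
const DEFAULT_SPECIAL_TOKENS = [
  { index: 0, name: 'UNK', value: '[UNK]' },
  { index: 1, name: 'PAD', value: '[PAD]' },
  { index: 2, name: 'CLS', value: '[CLS]' },
  { index: 3, name: 'SEP', value: '[SEP]' },
];

// Hypothetical customization: append a MASK token at the next free index.
const customSpecialTokens = [
  ...DEFAULT_SPECIAL_TOKENS,
  { index: 4, name: 'MASK', value: '[MASK]' },
];

// Hypothetical helper: map token strings to numbers and back.
// Special tokens keep their declared index; other tokens follow them.
function buildLookup(specialTokens, otherTokens) {
  const s2i = new Map();
  const i2s = new Map();
  for (const { index, value } of specialTokens) {
    s2i.set(value, index);
    i2s.set(index, value);
  }
  let next = Math.max(-1, ...specialTokens.map((t) => t.index)) + 1;
  for (const token of otherTokens) {
    if (!s2i.has(token)) {
      s2i.set(token, next);
      i2s.set(next, token);
      next += 1;
    }
  }
  const unk = s2i.get('[UNK]');
  return {
    size: s2i.size,
    // Unknown strings fall back to the UNK index.
    encode: (tokens) => tokens.map((t) => s2i.get(t) ?? unk),
    // Unknown indexes fall back to the UNK string.
    decode: (ids) => ids.map((i) => i2s.get(i) ?? '[UNK]'),
  };
}

// Placeholder vocabulary for the example only.
const lookup = buildLookup(customSpecialTokens, ['M', 'p500', 'S100']);
const ids = lookup.encode(['[CLS]', 'M', 'S100', '??', '[SEP]']);
console.log(ids); // [ 2, 5, 7, 0, 3 ]
console.log(lookup.decode(ids)); // [ '[CLS]', 'M', 'S100', '[UNK]', '[SEP]' ]
```

Keeping the special tokens at fixed low indexes means that adding one, such as the MASK token here, shifts where the remaining vocabulary starts, so encoded data should be decoded with the same token list that produced it.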
>> Utility functions:
>>    • Tokenize FSW: https://www.sutton-signwriting.io/core/#fswtokenize
>>    • Detokenize FSW: https://www.sutton-signwriting.io/core/#fswdetokenize
>>    • Chunk Tokens: https://www.sutton-signwriting.io/core/#fswchunktokens
>> The tokenizer generator creates an object with properties for encoding, decoding, and vocabulary management:
>> https://www.sutton-signwriting.io/core/#fswcreatetokenizer
>> Note: The tokenizer currently supports Formal SignWriting in ASCII (FSW). To use it with SignWriting in Unicode (SWU), convert to FSW first.
>> Thank you for reading!
>> –Steve
>> _______________________________________________
>> Sw-l mailing list
>> Sw-l at listserv.linguistlist.org
>> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/sw-l
