[Sw-l] Announcing version 2 of the Sutton SignWriting Core package for JavaScript
Steve Slevinski
slevin at signpuddle.net
Sat Nov 30 22:19:55 UTC 2024
Hi SignWriting List.
I'm happy to announce that version 2 of the |@sutton-signwriting/core|
package is now available on GitHub and npm. This update introduces two
major features:
1. *SignWriting Null Symbol (S00000 / U+40000)* for enhanced sorting
and advanced sequence strategies.
2. *Tokenizer Functions* for machine learning applications using 1180
SignWriting tokens with numerical encoding and decoding.
GitHub: https://github.com/sutton-signwriting/core
npm: https://www.npmjs.com/package/@sutton-signwriting/core
------------------------------------------------------------------------
Breaking Change: The Null Symbol
Version 2 adds support for the SignWriting null symbol as S00000 for
Formal SignWriting in ASCII (FSW) and U+40000 for SignWriting in Unicode
(SWU). This is a breaking change because signs using the null symbol are
not recognized by existing tools and libraries. Although its use is
limited, the null symbol introduces a range of possibilities for sorting
and linguistic analysis.
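To show how the two encodings line up, the sketch below maps an FSW symbol key to its SWU character. It also flags text that contains the null symbol, which is useful as a warning before passing data to tools that predate version 2. The helper names are hypothetical and are not the package's API. The arithmetic assumes the usual FSW-to-SWU layout: S10000 is U+40001, each base symbol spans 96 code points (6 fills by 16 rotations), and S00000 is special-cased to U+40000.

```javascript
// Hypothetical helpers, not part of @sutton-signwriting/core.
// Assumed layout: S10000 -> U+40001, 96 code points per base symbol
// (6 fills x 16 rotations), and the null symbol S00000 -> U+40000.
const NULL_FSW = 'S00000';
const NULL_SWU = String.fromCodePoint(0x40000);

function fswKeyToSwu(key) {
  if (key === NULL_FSW) return NULL_SWU;
  // Key = 'S' + base (3 hex digits) + fill (0-5) + rotation (0-f)
  if (!/^S[0-9a-f]{3}[0-5][0-9a-f]$/.test(key)) {
    throw new Error(`Not an FSW symbol key: ${key}`);
  }
  const base = parseInt(key.slice(1, 4), 16);
  if (base < 0x100 || base > 0x38b) {
    throw new Error(`Base symbol out of range: ${key}`);
  }
  const fill = parseInt(key[4], 16);
  const rotation = parseInt(key[5], 16);
  const offset = (base - 0x100) * 96 + fill * 16 + rotation + 1;
  return String.fromCodePoint(0x40000 + offset);
}

// True if FSW or SWU text contains the null symbol.
function usesNullSymbol(text) {
  return text.includes(NULL_FSW) || text.includes(NULL_SWU);
}

console.log(fswKeyToSwu('S10000').codePointAt(0).toString(16)); // 40001
console.log(usesNullSymbol('AS00000S10000M500x500S10000490x490')); // true
```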
The null symbol was first published in January 2022 and is detailed in
Appendix C of the Formal SignWriting draft specification.
Formal SignWriting draft specification:
https://www.ietf.org/archive/id/draft-slevinski-formal-signwriting-09.html#appendix-C
Formal SignWriting now includes four types of symbols:
* *Null Symbol*: For sorting and custom processing in sequences.
* *Writing Symbols*: For standard sign representation.
* *Detailed Location Symbols*: For enhanced spatial details.
* *Punctuation Symbols*: For text-like structuring.
A sign in Formal SignWriting is a two-part word:
* *Sequence* (one-dimensional): An optional prefix of writing symbols,
detailed location symbols, and the null symbol.
* *Signbox* (two-dimensional): Contains writing symbols only; null and
detailed location symbols are not permitted here.
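The symbol types and the two-part word can be sketched together as a small checker. This is a hypothetical helper, not the package's parser. It assumes the base ranges writing S100-S37e, detailed location S37f-S386, and punctuation S387-S38b. It checks only those ranges, not whether a given fill and rotation actually exist.

```javascript
// Hypothetical sketch, not the package's parser.
// Assumed base ranges: writing S100-S37e, detailed location S37f-S386,
// punctuation S387-S38b; S00000 is the null symbol.
function symbolType(key) {
  if (key === 'S00000') return 'null';
  if (!/^S[0-9a-f]{3}[0-5][0-9a-f]$/.test(key)) return 'invalid';
  const base = parseInt(key.slice(1, 4), 16);
  if (base >= 0x100 && base <= 0x37e) return 'writing';
  if (base >= 0x37f && base <= 0x386) return 'location';
  if (base >= 0x387 && base <= 0x38b) return 'punctuation';
  return 'invalid';
}

// Splits an FSW sign into sequence keys and signbox keys, then applies the
// placement rules: the sequence may hold null, writing and location symbols;
// the signbox may hold writing symbols only.
function checkSign(fsw) {
  const match = fsw.match(
    /^(?:A((?:S[0-9a-f]{5})+))?[BLMR]\d{3}x\d{3}((?:S[0-9a-f]{5}\d{3}x\d{3})*)$/
  );
  if (!match) return { ok: false, reason: 'not a sign' };
  const sequence = match[1] ? match[1].match(/S[0-9a-f]{5}/g) : [];
  const signbox = match[2].match(/S[0-9a-f]{5}/g) || [];

  const allowedInSequence = ['null', 'writing', 'location'];
  const badSeq = sequence.find((k) => !allowedInSequence.includes(symbolType(k)));
  if (badSeq) return { ok: false, reason: `${badSeq} not allowed in sequence` };

  const badBox = signbox.find((k) => symbolType(k) !== 'writing');
  if (badBox) return { ok: false, reason: `${badBox} not allowed in signbox` };

  return { ok: true, sequence, signbox };
}

console.log(checkSign('AS00000S14c20M518x529S14c20481x471').ok); // true
console.log(checkSign('M518x529S00000481x471').reason); // S00000 not allowed in signbox
```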
The null symbol supports sorting strategies such as placing one-handed
signs before two-handed ones. It also enables advanced strategies that
fill fixed sequence positions (e.g., torso, arm, hand) with the null
symbol when a location is absent.
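Here is a minimal sketch of the one-handed-before-two-handed idea. It assumes a hypothetical two-slot strategy: slot 0 holds the second hand, slot 1 holds the first hand, and an absent slot is filled with S00000. It also assumes sequences are compared as plain strings.

```javascript
// Hypothetical slot strategy for illustration: each sign lists a fixed
// number of sequence slots, and absent slots are filled with S00000.
// The slot meanings are assumptions, not part of the specification.
const NULL_KEY = 'S00000';

function slottedSequence(slots) {
  return 'A' + slots.map((key) => key ?? NULL_KEY).join('');
}

// Plain code-unit comparison: for fixed-width, lowercase-hex FSW keys this
// follows numeric symbol order, so S00000 sorts before every other key.
function compareSequences(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Slot 0 = second hand, slot 1 = first hand (hypothetical).
const signs = [
  { gloss: 'two-handed', slots: ['S10000', 'S10000'] },
  { gloss: 'one-handed B', slots: [null, 'S1f000'] },
  { gloss: 'one-handed A', slots: [null, 'S10000'] },
];

const sorted = signs
  .map((sign) => ({ ...sign, sequence: slottedSequence(sign.slots) }))
  .sort((a, b) => compareSequences(a.sequence, b.sequence));

console.log(sorted.map((s) => s.gloss));
// [ 'one-handed A', 'one-handed B', 'two-handed' ]
```

Without the null padding, the one-handed sequence AS1f000 would sort after AS10000S10000, because f sorts after 0 at the fourth character. The filled slot keeps each position aligned across signs.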
------------------------------------------------------------------------
Tokenizer Functions for Machine Learning
Version 2 also introduces tokenizer functions tailored for machine
learning. They use a vocabulary of 1180 SignWriting tokens for numerical
encoding and decoding, making SignWriting data easier to use with NLP
tooling such as Transformer-based models.
Inspired by Amit's SignWriting Python library, which includes custom FSW
tokenization, I recreated and extended its functionality for JavaScript.
Bipin has further ported Amit's library to Flutter and Dart, adding
visualizations and achieving rendering speeds 3,000 times faster than
|@sutton-signwriting/font-db|.
* Amit's Python library:
https://github.com/sign-language-processing/signwriting
* Bipin's Flutter library:
https://github.com/bipinkrish/signwriting-flutter
* Bipin's Dart library: https://github.com/bipinkrish/signwriting-dart
------------------------------------------------------------------------
Features of the Tokenizer
The tokenizer starts with *DEFAULT_SPECIAL_TOKENS*, special tokens of the
kind commonly used in NLP frameworks. They can be customized by changing
index numbers or value strings, or by adding new tokens.
Default tokens:
```javascript
const DEFAULT_SPECIAL_TOKENS = [
  { index: 0, name: 'UNK', value: '[UNK]' },
  { index: 1, name: 'PAD', value: '[PAD]' },
  { index: 2, name: 'CLS', value: '[CLS]' },
  { index: 3, name: 'SEP', value: '[SEP]' }
];
```
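To show where these special tokens usually come into play, here is a hypothetical helper that is not part of the package. It wraps already-encoded ids as [CLS] ... [SEP], truncates them to a maximum length, and pads with [PAD], which is the layout many Transformer pipelines expect. Passing a different token list shows the kind of index customization described above. The ids 10, 11 and 12 are placeholders.

```javascript
// Copy of the defaults listed above.
const DEFAULT_SPECIAL_TOKENS = [
  { index: 0, name: 'UNK', value: '[UNK]' },
  { index: 1, name: 'PAD', value: '[PAD]' },
  { index: 2, name: 'CLS', value: '[CLS]' },
  { index: 3, name: 'SEP', value: '[SEP]' },
];

// Hypothetical helper, not part of @sutton-signwriting/core.
// Returns [CLS] ids... [SEP] followed by [PAD] up to maxLength, plus an
// attention mask (1 = real token, 0 = padding). Long inputs are truncated.
function frameIds(ids, maxLength, specials = DEFAULT_SPECIAL_TOKENS) {
  const indexOf = (name) => {
    const token = specials.find((t) => t.name === name);
    if (!token) throw new Error(`Missing special token: ${name}`);
    return token.index;
  };
  if (maxLength < 2) throw new Error('maxLength must leave room for CLS and SEP');
  const inputIds = [indexOf('CLS'), ...ids.slice(0, maxLength - 2), indexOf('SEP')];
  const attentionMask = inputIds.map(() => 1);
  while (inputIds.length < maxLength) {
    inputIds.push(indexOf('PAD'));
    attentionMask.push(0);
  }
  return { inputIds, attentionMask };
}

// Placeholder ids with the default special tokens.
console.log(frameIds([10, 11, 12], 7).inputIds); // [ 2, 10, 11, 12, 3, 1, 1 ]

// Customized indexes: PAD moved to 0 and UNK to 1.
const custom = [
  { index: 0, name: 'PAD', value: '[PAD]' },
  { index: 1, name: 'UNK', value: '[UNK]' },
  { index: 2, name: 'CLS', value: '[CLS]' },
  { index: 3, name: 'SEP', value: '[SEP]' },
];
console.log(frameIds([10], 4, custom).inputIds); // [ 2, 10, 3, 0 ]
```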
Utility functions:
* Tokenize FSW: https://www.sutton-signwriting.io/core/#fswtokenize
* Detokenize FSW: https://www.sutton-signwriting.io/core/#fswdetokenize
* Chunk Tokens: https://www.sutton-signwriting.io/core/#fswchunktokens
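The exact token strings produced by the Tokenize FSW function are documented at the links above. The sketch below uses a hypothetical scheme of its own to show that this kind of tokenization can round-trip a sign. In this scheme, sequence and box markers stay as they are, each symbol key is split into base, fill and rotation, and each coordinate becomes its own token.

```javascript
// Hypothetical token scheme for illustration only; the package's
// tokenizer may use different token strings.
//   'A' and box markers 'B', 'L', 'M', 'R' are kept as-is
//   symbol key S14c20 -> base 'S14c', fill 'c2', rotation 'r0'
//   coordinate 518 -> 'p518'
const SIGN_RE =
  /^(?:A((?:S[0-9a-f]{5})+))?([BLMR])(\d{3})x(\d{3})((?:S[0-9a-f]{5}\d{3}x\d{3})*)$/;

function symbolTokens(key) {
  return [key.slice(0, 4), 'c' + key[4], 'r' + key[5]];
}

function tokenizeSign(fsw) {
  const m = fsw.match(SIGN_RE);
  if (!m) throw new Error(`Not an FSW sign: ${fsw}`);
  const tokens = [];
  if (m[1]) {
    tokens.push('A');
    for (const key of m[1].match(/S[0-9a-f]{5}/g)) tokens.push(...symbolTokens(key));
  }
  tokens.push(m[2], 'p' + m[3], 'p' + m[4]);
  for (const [, key, x, y] of m[5].matchAll(/(S[0-9a-f]{5})(\d{3})x(\d{3})/g)) {
    tokens.push(...symbolTokens(key), 'p' + x, 'p' + y);
  }
  return tokens;
}

function detokenizeSign(tokens) {
  let out = '';
  let expectY = false; // true after an x coordinate, so the next number is y
  for (const token of tokens) {
    if (token.startsWith('p')) {
      out += (expectY ? 'x' : '') + token.slice(1);
      expectY = !expectY;
    } else if (/^[cr][0-9a-f]$/.test(token)) {
      out += token[1]; // fill or rotation digit
    } else {
      out += token; // 'A', a box marker, or a base such as 'S14c'
      expectY = false;
    }
  }
  return out;
}

console.log(tokenizeSign('M518x529S14c20481x471').join(' '));
// M p518 p529 S14c c2 r0 p481 p471
```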
The tokenizer generator creates an object with properties for encoding,
decoding, and vocabulary management:
https://www.sutton-signwriting.io/core/#fswcreatetokenizer
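A tokenizer object of this kind can be approximated with two lookup maps. The factory below is hypothetical and does not reproduce the package's generator or its property names. It keeps special tokens at their declared indexes, numbers the remaining tokens after them, and maps unknown input to [UNK]. The nine-token vocabulary is made up for the example.

```javascript
const DEFAULT_SPECIAL_TOKENS = [
  { index: 0, name: 'UNK', value: '[UNK]' },
  { index: 1, name: 'PAD', value: '[PAD]' },
  { index: 2, name: 'CLS', value: '[CLS]' },
  { index: 3, name: 'SEP', value: '[SEP]' },
];

// Hypothetical tokenizer factory, not the package's generator.
// Special tokens keep their own indexes; other tokens are numbered
// after the highest special index, skipping duplicates.
function buildTokenizer(tokens, specials = DEFAULT_SPECIAL_TOKENS) {
  const unk = specials.find((t) => t.name === 'UNK');
  if (!unk) throw new Error('An UNK special token is required');
  const idToToken = new Map();
  const tokenToId = new Map();
  for (const { index, value } of specials) {
    idToToken.set(index, value);
    tokenToId.set(value, index);
  }
  let next = specials.reduce((max, t) => Math.max(max, t.index), -1) + 1;
  for (const token of tokens) {
    if (tokenToId.has(token)) continue;
    idToToken.set(next, token);
    tokenToId.set(token, next);
    next += 1;
  }
  return {
    vocabSize: idToToken.size,
    encode: (seq) => seq.map((t) => (tokenToId.has(t) ? tokenToId.get(t) : unk.index)),
    decode: (ids) => ids.map((id) => (idToToken.has(id) ? idToToken.get(id) : unk.value)),
  };
}

const tokenizer = buildTokenizer(['A', 'M', 'S14c', 'c2', 'r0', 'p481', 'p471', 'p518', 'p529']);
console.log(tokenizer.vocabSize); // 13
console.log(tokenizer.encode(['M', 'p518', 'S999'])); // [ 5, 11, 0 ]
```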
*Note*: The tokenizer currently supports Formal SignWriting in ASCII
(FSW). To use it with SignWriting in Unicode (SWU), convert to FSW first.
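If the package's own conversion helpers cover your case, they are the sensible choice. Purely to make the conversion step concrete, here is a hypothetical converter. It assumes the following SWU layout: U+1D800 is the sequence marker A; U+1D801-U+1D804 are the box markers B, L, M and R; U+1D80C-U+1D9FF are the numbers 250-749; and symbols start at U+40000, with U+40000 itself being S00000.

```javascript
// Hypothetical SWU -> FSW converter, for illustration only.
// Assumed layout: U+1D800 'A', U+1D801-U+1D804 'B','L','M','R',
// U+1D80C-U+1D9FF numbers 250-749, U+40000 = S00000,
// U+40001 onward = symbols (96 code points per base symbol).
const MARKERS = new Map([
  [0x1d800, 'A'], [0x1d801, 'B'], [0x1d802, 'L'], [0x1d803, 'M'], [0x1d804, 'R'],
]);

function swuSymbolToKey(cp) {
  const id = cp - 0x40000;
  if (id === 0) return 'S00000';
  const base = Math.floor((id - 1) / 96) + 0x100;
  const rest = (id - 1) % 96; // fill * 16 + rotation
  return 'S' + base.toString(16) + Math.floor(rest / 16).toString(16) + (rest % 16).toString(16);
}

function swuToFsw(swu) {
  let out = '';
  let expectY = false; // true after an x coordinate, so the next number is y
  for (const ch of swu) { // for...of walks code points, not UTF-16 units
    const cp = ch.codePointAt(0);
    if (cp >= 0x1d80c && cp <= 0x1d9ff) {
      out += (expectY ? 'x' : '') + String(cp - 0x1d80c + 250);
      expectY = !expectY;
      continue;
    }
    expectY = false;
    if (MARKERS.has(cp)) out += MARKERS.get(cp);
    else if (cp >= 0x40000 && cp <= 0x4f480) out += swuSymbolToKey(cp);
    else out += ch; // e.g. spaces between signs pass through unchanged
  }
  return out;
}

const num = (n) => 0x1d80c + (n - 250); // code point for coordinate n
console.log(swuToFsw(String.fromCodePoint(0x1d803, num(518), num(529), 0x41ca1, num(481), num(471))));
// M518x529S14c20481x471
```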
Thank you for reading!
–Steve