[Sw-l] Next Steps. Video to Mocap data for signing.

Wed Aug 2 09:15:52 UTC 2023

Hi John,

1. It sounds like you are looking into a rule-based pose-to-mocap
transformation.
The vast majority of previous work on this has shown that a rule-based
approach does not work, and that one must train a neural network for this
transformation.

2. SignTube will soon (always, hopefully) be able to transcribe videos in
SignWriting automatically. The quality will not be great (at first). That
too will be using a neural network, specifically, a VQVAE to encode the
video, and a sequence-to-sequence translation model to write the
SignWriting.

3. If you want to generate videos directly from SignWriting, this work
<https://rotem-shalev.github.io/ham-to-pose/> would be a good starting
point, although it works from HamNoSys rather than SignWriting.
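
To make point 2 a bit more concrete, here is a minimal sketch of the
vector-quantization step that sits at the core of a VQ-VAE: each continuous
frame embedding is replaced by the index of its nearest codebook vector, and
the resulting token sequence is what a sequence-to-sequence model could then
translate. This is only an illustration, not the SignTube implementation. The
function name `quantize`, the codebook values, and the frame embeddings are
all hypothetical; in a real model the codebook is learned and the embeddings
come from a trained video encoder.

```python
import math


def quantize(frames, codebook):
    """Map each frame embedding to the index of its nearest codebook vector."""
    tokens = []
    for frame in frames:
        # Euclidean distance from this embedding to every codebook entry.
        distances = [math.dist(frame, code) for code in codebook]
        # The token is the position of the closest entry.
        tokens.append(distances.index(min(distances)))
    return tokens


# Hypothetical 2-D codebook with three entries (a trained VQ-VAE learns these).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Hypothetical per-frame embeddings, standing in for a video encoder's output.
frames = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (0.05, 0.9)]

tokens = quantize(frames, codebook)
print(tokens)  # one discrete token per frame
```

The point of the discrete tokens is that the translation model then works on
a sequence of symbols, much like text, instead of on raw video frames.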

Amit

On Wed, Aug 2, 2023 at 6:53 AM John Carlson <yottzumm at gmail.com> wrote:

> I need a large collection of signing videos to run an experiment
> converting video geometry. I do not particularly have large drives to do
> this, so I may rent space on a cloud service.
>
> I plan to use the Python packages cv2 (OpenCV)
> https://pypi.org/project/opencv-python/, cvzone
> https://github.com/cvzone/cvzone, and MediaPipe
> https://developers.google.com/mediapipe/solutions/guide to convert video
> files into geometry and transformations, either BVH (BioVision Hierarchy)
> or some other mocap format (HAnim+BVH?).  That is, we are converting signs
> and body language to line segments and points, and ultimately sets of
> geometry and transformations, and then translating those to something like
> English. I do not know if facial expressions are really recognizable or
> not.  I may try my hand at lipreading video, IDK.  If the video has sound,
> we'll transcribe that.
>
> Ideally, I'll be able to store geometry, transformations and translation
> (possibly achieved by transcribing sound or lipreading) along with links to
> a video URL.  The step after that is to find a translation from geometry
> and transformations to English, and back.
>
> An acquaintance suggested that depth was required but not available.  Elon
> Musk says depth is not required for autonomous driving.  IDK, but I want to
> find out.
>
> If anyone has already tried this, let me know.  It would be interesting to
> convert geometry to SignWriting as well.
>
> I am not sure if SignTube does this automatically, or if it uses human
> transcribers.
>
> Any knowledge of a media solution or publicly available database that
> links all this data would be helpful, too.
>
> If someone wants to provide assistance on this effort, let me know.
>
> John
> _______________________________________________
> Sw-l mailing list
> Sw-l at listserv.linguistlist.org
> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/sw-l
>