[Sw-l] Next Steps. Video to Mocap data for signing.

John Carlson yottzumm at gmail.com
Wed Aug 2 16:07:22 UTC 2023


Answers below:

On Wed, Aug 2, 2023 at 4:16 AM Amit Moryossef <amitmoryossef at gmail.com>
wrote:

> Hi John,
>
> 1. Sounds like you are looking into doing a rule-based pose-to-mocap
> transformation.
> The vast majority of previous work on this has shown that it does not
> work with a rule-based approach, and one must train a neural network
> for this transformation.
>

I plan to use the existing Python packages I mentioned to perform the
conversion from video to geometry+transformations, with a little glue
to get the result into BVH or HAnim+BVH.  I know rule-based systems do
not work for the most part, so if those packages turn out to be
rule-based I will reconsider and look for alternatives.  I don't have
much experience with neural networks or video capture and hope to
leverage others' work.  I think there's a large leap from
geometry+transformations to language; that's the challenge.  I hope to
do translation from language to geometry+transformations as well,
which is where encoders for geometry+transformations come in.
Ultimately, I view geometry+transformations as a language (HAnim, X3D,
BVH), so I'll be doing language-to-language translation, or as you
say, sequence-to-sequence (perhaps within a larger tree-to-tree
solution, since trees may be encoded as sequences).
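
For what it's worth, here is a rough sketch of how I imagine the first
step, using cv2 and MediaPipe's Holistic solution to pull per-frame
landmarks out of a video.  The file name and the legacy
mediapipe.solutions API are assumptions on my part, and the glue from
landmarks to BVH/HAnim is not shown.

# Minimal sketch: extract per-frame body landmarks from a signing video
# with OpenCV (cv2) and MediaPipe Holistic (legacy solutions API).
import cv2
import mediapipe as mp

def video_to_landmarks(path):
    """Yield (frame_index, pose_landmarks) for each frame of the video."""
    holistic = mp.solutions.holistic.Holistic(static_image_mode=False)
    cap = cv2.VideoCapture(path)
    frame_index = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV decodes frames as BGR.
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = holistic.process(rgb)
        if results.pose_landmarks:
            yield frame_index, results.pose_landmarks.landmark
        frame_index += 1
    cap.release()
    holistic.close()

for i, landmarks in video_to_landmarks("sign_sample.mp4"):
    # Each landmark is a normalized x, y, z plus a visibility score.
    print(i, landmarks[0].x, landmarks[0].y, landmarks[0].z)

Each landmark is just a point; the transformations (joint rotations)
would still have to be derived from those points before writing BVH.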


> 2. SignTube will soon (always, hopefully) be able to transcribe videos in
> SignWriting automatically. The quality will not be great (at first). That
> too will be using a neural network, specifically, a VQVAE to encode the
> video, and a sequence-to-sequence translation model to write the
> SignWriting.
>

Thank you for any information you have on VQVAE.  This looks like a good
resource: https://keras.io/examples/generative/vq_vae/.
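
From skimming that Keras example, my rough understanding is that each
encoder output vector gets snapped to its nearest entry in a learned
codebook, which yields the discrete tokens a sequence-to-sequence
model can then consume.  A small NumPy sketch of just that
quantization step (the sizes and data are made up; the encoder/decoder
networks and training losses are omitted):

# Rough sketch of the vector-quantization step at the heart of a VQ-VAE:
# each encoder output vector is replaced by its nearest codebook entry.
import numpy as np

def quantize(latents, codebook):
    """latents: (N, D) encoder outputs; codebook: (K, D) learned entries.
    Returns (N,) code indices and the (N, D) quantized vectors."""
    # Squared Euclidean distance from every latent to every codebook entry.
    distances = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = distances.argmin(axis=1)
    return indices, codebook[indices]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # 8 codes, each 4-dimensional
latents = rng.normal(size=(5, 4))    # 5 encoder outputs
codes, quantized = quantize(latents, codebook)
print(codes)  # discrete tokens a downstream translation model could use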

>
> 3. If you want to generate videos directly from SignWriting, this work
> <https://rotem-shalev.github.io/ham-to-pose/> would be a good starting
> point, working from HamNoSys.
>

I'm not targeting video output at this time; I am targeting BVH+HAnim.
Video will then be possible, but that is not my job, except for
validation.  My intended audience is the deafblind, so the goal is
robotic control of mannequins; SignWriting is not an option for them.
I believe the company vcom3d has robotically controlled mannequins,
but I've been unable to reach them through their public email
addresses or web contact forms.
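
To make the BVH+HAnim target a little more concrete, here is a
hypothetical sketch of the kind of glue I have in mind: writing
per-frame joint values out as a minimal BVH file that could then be
retargeted onto an HAnim skeleton.  The joint names, offsets, and
frame values below are placeholders, not real capture data.

# Hypothetical sketch: write joint values as a minimal BVH file
# (one root joint plus one child joint).  All numbers are placeholders.
def write_minimal_bvh(path, frames, frame_time=1 / 30):
    """frames: list of 9-value lists matching the CHANNELS lines below:
    Hips Xpos Ypos Zpos Zrot Xrot Yrot, then Spine Zrot Xrot Yrot."""
    header = """HIERARCHY
ROOT Hips
{
  OFFSET 0.0 0.0 0.0
  CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation
  JOINT Spine
  {
    OFFSET 0.0 10.0 0.0
    CHANNELS 3 Zrotation Xrotation Yrotation
    End Site
    {
      OFFSET 0.0 10.0 0.0
    }
  }
}
MOTION
"""
    with open(path, "w") as f:
        f.write(header)
        f.write(f"Frames: {len(frames)}\n")
        f.write(f"Frame Time: {frame_time:.6f}\n")
        for values in frames:
            f.write(" ".join(f"{v:.4f}" for v in values) + "\n")

# Two placeholder frames: root translation/rotation plus one spine rotation.
write_minimal_bvh("sign_clip.bvh", [[0, 90, 0, 0, 0, 0, 0, 0, 0],
                                    [0, 90, 0, 5, 0, 0, 10, 0, 0]])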

If SignWriting can help with this project, that would be a big bonus!

John


> Amit
>
>
> On Wed, Aug 2, 2023 at 6:53 AM John Carlson <yottzumm at gmail.com> wrote:
>
>> I need a large collection of signing videos to run an experiment
>> converting video to geometry.  I do not have particularly large drives
>> for this, so I may rent space on a cloud service.
>>
>> I plan to use the Python packages cv2 (OpenCV)
>> https://pypi.org/project/opencv-python/, cvzone
>> https://github.com/cvzone/cvzone, and MediaPipe
>> https://developers.google.com/mediapipe/solutions/guide to convert video
>> files into geometry and transformations, either BVH (BioVision Hierarchy)
>> or some other mocap format (HAnim+BVH?).  That is, we are converting signs
>> and body language to line segments and points, and ultimately sets of
>> geometry and transformations, and then translating those to something like
>> English. I do not know if facial expressions are really recognizable or
>> not.  I may try my hand at lipreading video, IDK.  If the video has sound,
>> we'll transcribe that.
>>
>> Ideally, I'll be able to store the geometry, transformations, and
>> translation (possibly obtained by transcribing sound or lipreading)
>> along with a link to the source video URL.  The step after that is to
>> find a translation from geometry and transformations to English, and
>> back.
>>
>> An acquaintance suggested that depth is required but not available;
>> Elon Musk says depth is not required for autonomous driving.  IDK, but
>> I want to find out.
>>
>> If anyone has already tried this, let me know.  It would be interesting
>> to convert geometry to SignWriting as well.
>>
>> I am not sure if SignTube does this automatically, or if it uses human
>> transcribers.
>>
>> Any knowledge of a media solution or publicly available database that
>> links all this data would be helpful, too.
>>
>> If someone wants to provide assistance on this effort, let me know.
>>
>> John
>> _______________________________________________
>> Sw-l mailing list
>> Sw-l at listserv.linguistlist.org
>> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/sw-l
>>
>