question about batchalign2
Janet Bang
janet.bang at gmail.com
Thu May 23 17:23:00 UTC 2024
Hi Brian,
Yes! What we've done follows the section 16.1 conventions, using precodes for
whole utterances and @s for single words in intra-utterance switching (e.g.,
@s:yue and @s:spa for Cantonese and Spanish, respectively). Would Batchalign be
able to recognize these codes?
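To illustrate what these codes carry, here is a minimal Python sketch that assigns a language to each word of one utterance. It assumes the CHAT forms `[- spa]` (utterance-level precode) and `word@s:yue` (word-level marker with an explicit language code), takes only the utterance text without the `*CHI:` speaker prefix, and ignores bare `@s` without a code. The helper name `word_languages` is hypothetical; this is not how Batchalign itself processes the codes.

```python
import re

# Hypothetical helper, not part of Batchalign.
# "[- spa] ..." marks the whole utterance as Spanish (precode);
# "word@s:yue" marks a single word as Cantonese (explicit code assumed).
PRECODE = re.compile(r"^\[-\s*([a-z]{3})\]\s*")
WORD_LANG = re.compile(r"^(.+)@s:([a-z]{3})$")
TERMINATORS = {".", "?", "!"}


def word_languages(utterance: str, default_lang: str) -> list[tuple[str, str]]:
    """Return (word, language) pairs for one CHAT utterance body."""
    text = utterance.strip()
    lang = default_lang
    m = PRECODE.match(text)
    if m:  # an utterance-level precode overrides the file's primary language
        lang = m.group(1)
        text = text[m.end():]
    pairs = []
    for token in text.split():
        if token in TERMINATORS:
            continue  # skip utterance terminators
        w = WORD_LANG.match(token)
        if w:  # a word-level @s:lang marker overrides the utterance language
            pairs.append((w.group(1), w.group(2)))
        else:
            pairs.append((token, lang))
    return pairs


print(word_languages("mommy 媽媽@s:yue .", "eng"))
print(word_languages("[- spa] mamá agua .", "eng"))
```

Word-level markers take precedence over the precode here, which matches the idea that @s flags a single switched word inside an otherwise tagged utterance.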
Thanks!
Janet
On Thu, May 23, 2024 at 7:17 AM Brian Macwhinney <macw at andrew.cmu.edu> wrote:
> Janet,
>
> It seems that I didn’t understand your question. If you are talking about
> tagging with the UD taggers that Batchalign uses, then I believe this is
> possible, but Houjun will need to confirm. However, you would have to mark
> each utterance in a CHAT file with a language tag if it is not the primary
> language of the file. Please take a look at section 16.1 of the CHAT manual
> about that type of coding.
>
> —Brian
>
> > On May 23, 2024, at 1:54 AM, Janet Bang <janet.bang at gmail.com> wrote:
> >
> > Hi Brian,
> >
> > Thanks for the quick response. We are still in the world of transcripts,
> > but I definitely look forward to the day when we can have multilingual ASR!
> >
> > Since Houjun mentioned that intra-utterance code-switching wasn’t yet
> > available, would you recommend that we first run Batchalign and then handle
> > the code-switched utterances by hand? We don’t have many for now and are
> > still working out some processes, but we are thinking about what we could
> > build up moving forward.
> >
> > Janet
> >
> > On Wednesday 22 May 2024, Brian Macwhinney <macw at andrew.cmu.edu> wrote:
> > Dear Janet,
> > Not yet, I am afraid. As my colleague Houjun Liu puts it,
> > “code-switching multilingual ASR is still an active and unstable area of
> > research”.
> >
> > — Brian MacWhinney
> >
> > > On May 22, 2024, at 6:28 PM, Janet Bang <janet.bang at gmail.com> wrote:
> > >
> > > Hello,
> > >
> > > I am currently working with transcripts that are multilingual (e.g.,
> > > English/Spanish, English/Korean). Each contains around 70-100 utterances
> > > of parent-reported first words/phrases for children between 12 and 26
> > > months, so most are 1-3 words per utterance, but occasionally longer. We
> > > have asked parents to report what their child said across multiple days,
> > > in whichever language they used.
> > >
> > > We would like to extract lemmas and consider unilemmas (e.g., Mommy,
> > > Mamá - Spanish, 어마 - Korean) both across children who speak different
> > > languages and within a child who might use multiple languages. To
> > > facilitate this, I was wondering whether Batchalign would work with
> > > multilingual transcripts.
> > >
> > > Thank you!
> > > Janet
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> Groups "chibolts" group.
> > > To unsubscribe from this group and stop receiving emails from it, send
> an email to chibolts+unsubscribe at googlegroups.com.
> > > To view this discussion on the web visit
> https://groups.google.com/d/msgid/chibolts/CAC5V4hg5uf152so6ALFk9RyNR_aX6uFb0jEsbH7yYKs2utHD2A%40mail.gmail.com
> .
> >
>
>