Advice on time-stamping existing transcriptions in CLAN/ELAN

Brian MacWhinney macw at andrew.cmu.edu
Thu Dec 15 22:10:28 UTC 2022


Rachel,
    The F5 method inserts the begin and end times of the utterances.  I have found that I can fully link a 30-minute transcript using F5 in one hour.  It will not handle overlapping speech well.  However, at that point you can send the utterance-level diarized transcripts to our batchalign program using the -prealigned option, and then it will do a better job on everything.  Some overlaps will still be resistant.  For those that involve the Investigator simply saying “yeah,” “right,” or “uhhuh” in the middle of a longer participant utterance, it is best to use the &*MOT: yeah code described in section 9.11.2 of the CHAT manual.
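
For illustration only, a made-up fragment (the speaker codes and millisecond times are hypothetical, not from any corpus) with an utterance-level bullet and the interposed-word code would look roughly like this:

*PAR:	and then we went &*INV:yeah to the park . •51200_54300•

The bullet holds the utterance's begin and end times, and &*INV:yeah records the Investigator's back-channel without splitting the participant's utterance.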

— Brian MacWhinney
Teresa Heinz Professor of Cognitive Psychology, 
Language Technologies and Modern Languages, CMU 

> On Dec 15, 2022, at 4:05 PM, Rachel Romeo <romeo at umd.edu> wrote:
> 
> Thanks Brian! I wasn't aware of batchalign. This sounds like a great option for fine-tuning F5 bullets. We were hoping to avoid manually adding bullets, but it is looking more and more likely that it will be necessary. My one concern is that (to my knowledge) F5 only gives us utterance onsets and not offsets. We are particularly interested in overlapping speech, so knowing utterance offsets is key. Since batchalign can give word-level timestamps, do you know how it fares with overlapping speech?
> 
> On Thu, Dec 15, 2022 at 11:10 AM Brian Macwhinney <macw at andrew.cmu.edu> wrote:
> Rachel,
>     You might try our new batchalign method at https://github.com/talkbank. However, this pipeline is designed for two cases different from yours.  
> 
> The first is full ASR with diarization from raw audio, though it assumes that the audio is good.  We have been using this with adult audio for TBIBank and DementiaBank, but not for child language.  
> 
> The second case is for transcripts with time-alignments on the utterance level.  We used this with everything in AphasiaBank with much success.  It not only provides more accurate utterance-level times, but also complete word-level times.
> 
> Unfortunately, your case is neither of these.  We may try creating some “fake” utterance-level times to use the second method, but the alternative is for you to use the F5 transcriber mode in CLAN to insert the utterance-level time marks.
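> 
> To make that “fake times” idea concrete, here is a purely hypothetical fragment (the speaker codes and millisecond values are invented): placeholder bullets could be spread evenly across the transcribed window, e.g.
> 
> *MOT:	look at the doggie . •0_4000•
> *CHI:	doggie ! •4000_8000•
> 
> and batchalign would then replace the placeholders with real utterance-level and word-level times.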
> 
> — Brian MacWhinney
> Teresa Heinz Professor of Cognitive Psychology, 
> Language Technologies and Modern Languages, CMU 
> 
> > On Dec 15, 2022, at 10:37 AM, Rachel Romeo <romeo at umd.edu> wrote:
> > 
> > Hi! I can't share it with the whole list, but Leibny, I can send you an example separately if you'd like. 
> > 
> > Videos are ~25 mins long, but just to make things extra spicy, only 15 mins are transcribed (we HAVE timestamped the start/stop of the transcription, but no utterances within). 
> > They are in English, with maybe an occasional word or two in Spanish. 
> > They are probably less noisy than a LENA recording (no shirt rustling), but the audio quality is not excellent. Audio was recorded from video cameras rather than microphones. Occasional toy banging/buzzing/ringing. 
> > 
> > -Rachel
> > 
> > On Thu, Dec 15, 2022 at 10:11 AM Leibny Garcia <leibny at gmail.com> wrote:
> > Hello Rachel, 
> > 
> > Can you share an example? How long are the recordings? Are they super noisy? Are they in English?
> > 
> > Cheers,
> > Paola
> > 
> > 
> > On Thu, Dec 15, 2022 at 9:02 AM Rachel Romeo <romeo at umd.edu> wrote:
> > Hello fellow DARCLERS!
> > 
> > I'm hoping the whizzes on this list might help us save a boatload of time. We have a very large corpus of beautifully CHAT-transcribed naturalistic parent-child interactions (from lab-based videos, not daylong recordings). We would like to ELAN-ify it; however, unfortunately, none of it is time-stamped (womp womp). Does anyone have any experience/ideas for automating this, or at least speeding it up? There is a lot of overlapping speech and the world's loudest toys, so scripts to identify pauses/silences are likely to be only marginally helpful. We have not experimented with any forced alignment tools yet, so if anyone has suggestions for something that works well with these kinds of data, we are more than willing to try. 
> > 
> > Happy to report back any useful tips!
> > 
> > Thanks all, and happy holiday season!
> > Rachel 
> > 
> > -- 
> > Rachel R. Romeo, PhD, CCC-SLP
> > Assistant Professor 
> > Department of Human Development and Quantitative Methodology
> > Department of Hearing and Speech Sciences, by courtesy
> > Program in Neuroscience and Cognitive Science
> > University of Maryland College Park
> > education.umd.edu/leadlab
> > Phone: 301-405-2809
> > Pronouns: she/her/hers
> > 