Batch conversion of CHAT files to Praat TextGrids

Leonid Spektor spektor at andrew.cmu.edu
Fri Aug 22 16:30:24 UTC 2008


Fredrik,

    Those CHAT files do not pass check. Without actually seeing the data file
I can only guess that bullets somehow got moved to the beginning of the
line. The only characters allowed at the beginning of a line are: * @ %
and tab. Try the "chstring" command: "chstring -q *.cha". This will
re-wrap tiers properly and insert tabs where needed, but there could be
other errors that will cause other problems. The best thing to do is to run
the "check" program ("check *.cha") on all the data and fix all reported errors.
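
    To see which lines break the rule about the first character before
running anything else, something like the following Python sketch could be
used. It is not part of CLAN and is no substitute for "check". The function
name illegal_line_starts is made up for illustration, blank lines are simply
skipped, and the Latin-1 decoding is an assumption chosen only so that any
byte, including a stray bullet character, can be read without an error.

```python
import sys

# First characters that may legally start a line, per the rule stated above.
ALLOWED_FIRST = ("*", "@", "%", "\t")


def illegal_line_starts(lines):
    """Return (line_number, first_char) for each non-empty line that
    begins with a character other than *, @, % or tab."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        # Blank lines are skipped here (an assumption of this sketch).
        if text and not text.startswith(ALLOWED_FIRST):
            problems.append((number, text[0]))
    return problems


if __name__ == "__main__":
    # Usage: python find_bad_starts.py file1.cha file2.cha ...
    for path in sys.argv[1:]:
        # Latin-1 maps every byte to a character, so decoding never fails.
        with open(path, encoding="latin-1", newline="") as handle:
            for number, char in illegal_line_starts(handle):
                print(f"{path}: line {number}: starts with {char!r}")
```

    A line that begins with a bullet would be reported with '\x15' as its
first character, which at least shows how many lines are affected.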

    If you are still having problems, send me a sample of the data and I'll
see what is wrong.

Leonid.



On 22-08-08 11:57, "Fredrik Karlsson" <dargosch at gmail.com> wrote:

> 
> Hi,
> 
> I want to convert all the .cha files I downloaded from the ESF
> Corpus made public through the MPI archive system. I have tried CLAN's
> chat2praat program, but it fails. This is the log:
> 
>> dir *.cha
> cha liean11b.1.cha liean12e.1.cha liean13g.1.cha
> liean14a.1.cha liean14a.2.cha liean16a.1.cha liean16k.1.cha
> liean17l.1.cha liean18a.1.cha liean18c.1.cha liean22a.1.cha
> liean22e.1.cha liean22g.1.cha liean23c.1.cha liean24a.1.cha
> liean24i.1.cha liean25a.1.cha liean25k.1.cha liean27l.1.cha
> liean31a.1.cha liean31d.1.cha liean32a.1.cha liean32e.1.cha
> liean32g.1.cha liean32h.1.cha liean34q.1.cha liean35j.1.cha
> test.cha unixfile.cha
> 
> 30 files, 0 directories
> 
>> chat2praat +e.wav liean11a.1.cha
> chat2praat +e.wav liean11a.1.cha
> Thu Aug 21 18:01:19 2008
> chat2praat (28-Jul-2008) is conducting analyses on:
> ALL speaker tiers
> and those speakers' ALL dependent tiers
> and ALL header tiers
> ****************************************
> From file <liean11a.1.cha> to file <liean11a.1.textGrid>
> 
> 
> *** File "liean11a.1.cha": line 16.
> Illegal speaker character found: 15.
> 
> CURRENT OUTPUT FILE "liean11a.1.textGrid" IS INCOMPLETE.
> 
> Line 16 of the file is:
> 
> ^U%snd: "liean11a.wav" 9383 13613^U
> 
> where ^U seems to be the control character \x15. Removing the character with perl
> 
>> perl -nae 's/\x15//;print'  liean11a.1.cha > test_liean11a.1.cha
> 
> makes the file not readable for CLAN:
> 
> .....
> *** File "test_liean11a.1.cha": line 1481. <- Lots of messages like
> this one
> It is illegal to have Header Tiers '@' inside the transcript
> 
> 
>     NO BULLETS FOUND IN THE FILE
> 
> 
> Done with file <test_liean11a.1.textGrid>
> 
> So, what do I do? The chat2elan seems to behave the same way.
> 
> I would, of course, appreciate any help I could get.
> 
> /Fredrik
> 


