extracting utterances from specified tier ID

Leonid Spektor spektor at andrew.cmu.edu
Mon Jul 7 20:40:07 UTC 2025


Hi Mingyu,

The +/-t option follows this convention:

+/-t%mor - include or exclude all %mor tiers.

The * (star) immediately after "+/-t" is a literal star, because all speaker tiers start with a star character, e.g. *PAR: text.

+/-tPAR - a shortcut: if the star after the "t" is missing, CLAN assumes you mean a speaker tier.

+/-t*PAR - no shortcut, just the explicit way to specify a particular speaker tier.

+/-tPAR* - a star at the end is a wildcard that matches any remaining characters.
		Some corpora have speakers such as *PAR-one: and *PAR-two: and so on.

+/-t*PAR* - the same as above, just more explicit.
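To make the rules above concrete, here is a small Python sketch of how they could be modeled. It is an assumption-based illustration, not CLAN's actual implementation: the function names normalize_tier_spec, tier_matches and select_tiers are hypothetical, tier codes are passed without the trailing colon, and only the cases listed above are handled (other CLAN options, such as how dependent tiers travel with their speaker tiers, are ignored).

```python
"""Hypothetical model of the +t/-t tier-selection convention (not CLAN source code)."""


def normalize_tier_spec(spec: str) -> str:
    """Expand the text after +t/-t into a full tier pattern."""
    # Dependent tiers (%mor) and explicit speaker tiers (*PAR) are kept as written.
    if spec.startswith(("%", "*")):
        return spec
    # Shortcut: with no leading star, a speaker tier is assumed (PAR -> *PAR).
    return "*" + spec


def tier_matches(spec: str, tier_code: str) -> bool:
    """Check a tier code such as '*PAR' or '%mor' (no colon) against a spec."""
    pattern = normalize_tier_spec(spec)
    # A trailing star is a wildcard: '*PAR*' matches '*PAR', '*PAR-one', ...
    if pattern.endswith("*"):
        return tier_code.startswith(pattern[:-1])
    # Without a trailing star the code must match exactly (leading star is literal).
    return tier_code == pattern


def select_tiers(option: str, tier_codes: list[str]) -> list[str]:
    """Apply a full option such as '+tPAR*' or '-t%mor' to a list of tier codes."""
    sign, flag, spec = option[0], option[1], option[2:]
    if sign not in "+-" or flag != "t":
        raise ValueError(f"not a +t/-t option: {option!r}")
    matched = [t for t in tier_codes if tier_matches(spec, t)]
    # '+t' keeps only the matching tiers; '-t' drops them.
    if sign == "+":
        return matched
    return [t for t in tier_codes if t not in matched]


if __name__ == "__main__":
    tiers = ["*PAR", "*INV", "*PAR-one", "*PAR-two", "%mor"]
    for option in ["+tPAR", "+t*PAR", "+tPAR*", "+t*PAR*", "-t%mor"]:
        print(option, "->", select_tiers(option, tiers))
    # +tPAR -> ['*PAR']
    # +t*PAR -> ['*PAR']
    # +tPAR* -> ['*PAR', '*PAR-one', '*PAR-two']
    # +t*PAR* -> ['*PAR', '*PAR-one', '*PAR-two']
    # -t%mor -> ['*PAR', '*INV', '*PAR-one', '*PAR-two']
```

Under this model, a file containing only *INV and *PAR tiers gives the same selection for +tPAR* and +t*PAR, which would be consistent with the identical outputs reported in the quoted message below. The trailing wildcard only changes the result when a corpus has extra speaker codes such as *PAR-one and *PAR-two.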

Hope this helps,

Leonid.

> On Jul 7, 2025, at 15:58, 'Mingyu Yuan' via chibolts <chibolts at googlegroups.com> wrote:
> 
> Hi everyone, 
> 
> I have a question about extracting participants' utterances using CLAN commands and was wondering if I'm thinking along the right lines. I'd appreciate it if you could take a look. Thanks!
> 
> I'm working with DementiaBank, specifically the ADReSS dataset, a subset of the Pitt corpus. I used the following command to extract the 'flow' tier of participants' utterances: `flo +cr +tPAR*`. Here, I placed the asterisk (*) after the PAR identifier, but I noticed that in the CLAN manual the asterisk typically precedes it, as in `t*PAR`. 
> 
> I got the following output after running `t*PAR`:
> 
> flo (13-Apr-2023) is conducting analyses on:
>   ONLY speaker main tiers matching: *PAR;
> 
> And here's the output after running `tPAR*`:
> 
> flo (13-Apr-2023) is conducting analyses on:
>   ONLY speaker main tiers matching: *PAR*;
> 
> It looks like the asterisk is used to search for tier ID patterns. Since all my files contain only INV and PAR tiers, I assume tier matching would only affect the selection of the PAR tier. I also used a Python function to verify that the utterances extracted by these two commands were identical (attached below, in case it's helpful). 
> 
> Both commands appear to work, but I don't fully understand why. Please let me know your thoughts. Thank you very much!
> 
> Best,
> Mingyu
> 
> def check_clan_command(file_id, file_old, file_new):
>     # Read the .cex file created by the old command (i.e. with tPAR*)
>     with open(file_old, 'r', encoding='utf-8') as f_old:
>         lines_old = f_old.read().splitlines()
>     # Read the .cex file created by the new command (i.e. with t*PAR)
>     with open(file_new, 'r', encoding='utf-8') as f_new:
>         lines_new = f_new.read().splitlines()
>     # Report whether the two outputs are line-for-line identical
>     same = lines_old == lines_new
>     print(file_id, same)
>     return same
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com.
> To view this discussion visit https://groups.google.com/d/msgid/chibolts/d2459e0d-41c6-4707-9e00-e75f5e755c47n%40googlegroups.com.
