extracting utterances from specified tier ID
'Mingyu Yuan' via chibolts
chibolts at googlegroups.com
Mon Jul 7 21:41:35 UTC 2025
Hi Leonid,
This is helpful! Thank you for the clarification. It looks like tPAR and
t*PAR are what I intended to use. As for tPAR*, as I understand it, the
wildcard character at the end matches anything that might follow PAR. Does it
also match 'nothing', i.e. a tier name that is exactly PAR? Thank you!
Best,
Mingyu
On Monday, July 7, 2025 at 1:46:29 PM UTC-7 Leonid Spektor wrote:
> One more thing about the +/-t options: the name(s)/code(s) are not case
> sensitive.
>
> As a further shortcut, "+tpar" is the same as "+t*PAR:".
>
>
> Leonid.
>
> On Jul 7, 2025, at 16:40, Leonid Spektor <spe... at andrew.cmu.edu> wrote:
>
> Hi Mingyu,
>
> The +/-t option conventions are as follows:
>
> +/-t%mor - include or exclude all %mor utterances.
>
> The * (star) right after "+/-t" is a literal star. All speaker tiers start
> with a star character, e.g. *PAR: text.
>
> +/-tPAR - a shortcut: if the star after the "t" is missing, it is
> assumed you want speaker tiers.
>
> +/-t*PAR - no shortcut, just the explicit way to specify a particular
> speaker tier.
>
> +/-tPAR* - a star at the end is a wildcard character that matches anything
> that follows.
> Some corpora have speakers *PAR-one:, *PAR-two:, and so on.
>
> +/-t*PAR* - the same as above, just more explicit.
>
> Hope this helps,
>
> Leonid.
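As a rough illustration of the conventions listed above, here is a toy Python model of how a +/-t value might be expanded and matched against tier codes. It is not CLAN's implementation. The function names `expand_tier_option` and `tier_matches` are hypothetical. Two choices are assumptions rather than documented CLAN behavior: using `fnmatch`-style globbing for the trailing star, and treating a trailing colon as simply ending the tier code. Under these assumptions a trailing `*` also matches the empty string, so `PAR*` selects a tier named exactly `*PAR`. That is consistent with Mingyu getting identical output from `tPAR*` and `t*PAR`, but CLAN's own behavior is what counts.

```python
from fnmatch import fnmatchcase


def expand_tier_option(value: str) -> str:
    """Turn the text after +t/-t into a case-folded pattern (hypothetical model)."""
    # Names/codes are not case sensitive, so fold everything to upper case.
    pattern = value.upper()
    # Shortcut: without a leading '*' (speaker) or '%' (dependent tier),
    # assume a speaker tier, so "PAR" becomes "*PAR".
    if not pattern.startswith(("*", "%")):
        pattern = "*" + pattern
    # Assumption: a trailing ':' just ends the code, so "*PAR:" acts like "*PAR".
    return pattern.rstrip(":")


def tier_matches(value: str, tier_code: str) -> bool:
    """Check a tier code such as '*PAR' or '%mor' against a +t/-t value."""
    pattern = expand_tier_option(value)
    code = tier_code.upper()
    # The first character is the literal tier marker ('*' or '%'), not a wildcard.
    if not code or pattern[0] != code[0]:
        return False
    # The rest is glob-matched; fnmatch's '*' matches zero or more characters.
    # (fnmatch also treats '?' and '[...]' specially; no claim is made about CLAN there.)
    return fnmatchcase(code[1:], pattern[1:])


if __name__ == "__main__":
    tiers = ["*INV", "*PAR", "*PAR-one", "%mor"]
    for option in ["PAR", "*PAR", "PAR*", "*PAR*", "par", "*PAR:", "%mor"]:
        selected = [t for t in tiers if tier_matches(option, t)]
        print(f"+t{option}: {selected}")
```

With only *INV and *PAR present, as in Mingyu's files, `*PAR` and `PAR*` select the same tier in this model. The two diverge only when a corpus also has codes such as *PAR-one, which only the wildcard form picks up.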
>
> On Jul 7, 2025, at 15:58, 'Mingyu Yuan' via chibolts <
> chib... at googlegroups.com> wrote:
>
> Hi everyone,
>
> I have a question about extracting participants' utterances using CLAN
> commands and was wondering if I'm thinking along the right lines. I'd
> appreciate it if you could take a look. Thanks!
>
> I'm working with DementiaBank, specifically the ADReSS dataset, a subset
> of the Pitt corpus. I used the following command to extract the 'flow' tier
> of participants' utterances: `flo +cr +tPAR*`. Here, I have the asterisk *
> placed after the PAR identifier. But I noticed that in the CLAN manual, the
> asterisk typically precedes it, as in `t*PAR`.
>
> I got the following output after running `t*PAR`:
>
> flo (13-Apr-2023) is conducting analyses on:
> ONLY speaker main tiers matching: *PAR;
>
> And here's the output after running `tPAR*`:
>
> flo (13-Apr-2023) is conducting analyses on:
> ONLY speaker main tiers matching: *PAR*;
>
> It looks like the asterisk is used to search for tier ID patterns. Since
> all my files contain only INV and PAR tiers, I assume tier matching would
> only affect the selection of the PAR tier. I also used a Python function to
> verify that the utterances extracted by these two commands were identical
> (attached below, in case it's helpful).
>
> Both commands appear to work, but I don't fully understand why. Please let
> me know your thoughts. Thank you very much!
>
> Best,
> Mingyu
>
> def check_clan_command(file_id, file_old, file_new):
>     # Read the .cex file created by the old command (i.e. with tPAR*)
>     with open(file_old, 'r') as file_old_cmd:
>         lines_old = file_old_cmd.read().splitlines()
>     # Read the .cex file created by the new command (i.e. with t*PAR)
>     with open(file_new, 'r') as file_new_cmd:
>         lines_new = file_new_cmd.read().splitlines()
>     # True if both commands produced identical output, line for line
>     print(file_id, lines_old == lines_new)
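To run the same line-by-line check over a whole dataset rather than one file pair, a minimal sketch might loop over two output folders. It assumes the two commands wrote same-named .cex files into separate directories and that the files are UTF-8. The helper name `compare_cex_dirs` is hypothetical. On a mismatch it also prints a unified diff, so you can see where the two commands' outputs differ.

```python
import difflib
from pathlib import Path


def compare_cex_dirs(old_dir: str, new_dir: str) -> dict[str, bool]:
    """Compare same-named .cex files in two output folders (hypothetical helper).

    Only files present in old_dir are checked; a file missing from new_dir
    counts as a mismatch.
    """
    results = {}
    for old_path in sorted(Path(old_dir).glob("*.cex")):
        new_path = Path(new_dir) / old_path.name
        if not new_path.exists():
            results[old_path.name] = False  # no counterpart from the other command
            continue
        old_lines = old_path.read_text(encoding="utf-8").splitlines()
        new_lines = new_path.read_text(encoding="utf-8").splitlines()
        same = old_lines == new_lines
        results[old_path.name] = same
        if not same:
            # Show where the two commands' outputs diverge.
            for line in difflib.unified_diff(
                old_lines, new_lines, str(old_path), str(new_path), lineterm=""
            ):
                print(line)
    return results
```

Calling `all(compare_cex_dirs(old_folder, new_folder).values())` then answers the question for every file at once.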
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to chibolts+u... at googlegroups.com.
> To view this discussion visit
> https://groups.google.com/d/msgid/chibolts/d2459e0d-41c6-4707-9e00-e75f5e755c47n%40googlegroups.com
--
To view this discussion visit https://groups.google.com/d/msgid/chibolts/002d8535-e30c-4678-9f56-37f9c959afa7n%40googlegroups.com.