How to search for Chinese signs when transcribers randomly added spaces in the transcript?

Leonid Spektor spektor at andrew.cmu.edu
Thu Aug 21 13:10:41 UTC 2025


Eva,

If there are many spaces between words, you can use the "chstring -q +d *.cha" command to remove the extra spaces.

After that, if there are still unwanted space characters between signs, those spaces need to be removed too. CLAN relies completely on spaces to determine how to separate text into words. I am not familiar with the Chinese language, so I can't give you a specific suggestion. However, the CHSTRING command can find space characters and remove them if necessary. For example, to remove a space character before a particular sign, you can use the command:

chstring -w +s" s" "s" *.cha

In the above example I am using the letter 's' to represent the plural Chinese sign that you might want to join with the preceding adjacent sign, creating a plural word with no space character between the signs.
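
For readers who want to try the idea outside CLAN, the join described above can be sketched in Python. This is only an illustration under stated assumptions: the names "join_plural", "PLURAL" and "GAP" are hypothetical, the plural sign is assumed to be 們, and only main tier lines beginning with "*" are changed. It is not a CLAN feature and does not replace the CHSTRING command.

```python
import re

# Hypothetical helper, not part of CLAN.
# Assumption: the plural sign to attach is the traditional-script marker;
# change it to "们" for simplified-script transcripts.
PLURAL = "們"

# One or more spaces or tabs sitting between a CJK ideograph and the plural sign.
GAP = re.compile(r"(?<=[\u4e00-\u9fff])[ \t]+(?=" + re.escape(PLURAL) + ")")


def join_plural(line: str) -> str:
    """Remove whitespace between a Chinese sign and a following plural sign."""
    # Only lines that start a main speaker tier ('*') are changed; headers,
    # dependent tiers and tab-indented continuation lines are returned as-is.
    if not line.startswith("*"):
        return line
    return GAP.sub("", line)


if __name__ == "__main__":
    # Made-up example utterance, shown with repr() so the tab stays visible.
    print(repr(join_plural("*CHI:\t他   們 來 了 .")))
```

Any number of spaces before the plural sign is collapsed in one pass, so it does not matter whether a transcriber typed one space or several. Work on a copy of the transcripts and check the result with FREQ before relying on it.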

Hopefully someone with knowledge of the Chinese language can give you better advice.


Leonid.

> On Aug 21, 2025, at 04:52, Eva Berglund <eva.berglund at gmail.com> wrote:
> 
> Hello,
> 
> I am currently analyzing Mandarin transcripts for third-person pronouns. Some plurals, for instance, are written 他們; however, I have noticed that some transcribers have added one or more spaces between the signs, so they are not counted as they should be when I use FREQ. Is it possible to write some kind of command in CLAN that finds the instances with one or more spaces, so that the plurals are counted as they should be?
> 
> Best regards
> 
> Eva Berglund
> 
> -- 
> You received this message because you are subscribed to the Google Groups "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com.
> To view this discussion visit https://groups.google.com/d/msgid/chibolts/ABB3C942-A4CF-4345-88FE-67BC555C23CF%40gmail.com.


