Esc +L and [mor- xb] Not Picking Up Errors

Leonid Spektor spektor at andrew.cmu.edu
Thu Oct 3 11:01:42 UTC 2024


Question 1
Unfortunately, there is no way to avoid making typos.


Question 2
There are two ways to check for typos. One is with the command:

freq +s"[\* *]" +u +o3 *.cha

This command combines the error codes found on all speakers' utterances, in all transcripts matched by *.cha, into one short list. It assumes that all your transcripts are in one working directory. If your transcripts are spread over multiple sub-directories, set the working directory to the top-level directory and add the +re option to the FREQ command. You can also add the +d option to see which file and utterance a particular code is located in. The +d2 option just adds unnecessary complexity.
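For readers who want to cross-check the FREQ listing outside CLAN, here is a minimal Python sketch of the same idea: gather every bracketed error code from all .cha files under a top-level directory and record where each one occurs. It is not part of CLAN and does not reproduce FREQ's output format. The function name collect_error_codes is hypothetical, and the sketch assumes UTF-8 files in which error codes appear as "[* code]" on a single line.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches a bracketed error code such as "[* p:w]" and captures the code text.
ERROR_CODE = re.compile(r"\[\*\s*([^\]]*)\]")


def collect_error_codes(top_dir):
    """Map each error code to the (file, line number) pairs where it occurs.

    Hypothetical helper, not a CLAN function. It scans raw text lines and
    does not distinguish speakers or tiers.
    """
    locations = defaultdict(list)
    # rglob walks top_dir and all of its sub-directories.
    for path in sorted(Path(top_dir).rglob("*.cha")):
        with open(path, encoding="utf-8") as handle:
            for line_no, line in enumerate(handle, start=1):
                for match in ERROR_CODE.finditer(line):
                    code = match.group(1).strip()
                    locations[code].append((str(path), line_no))
    return dict(locations)


if __name__ == "__main__":
    # Print each code with its count, followed by every location.
    for code, places in sorted(collect_error_codes(".").items()):
        print(f"{len(places):4d}  [* {code}]")
        for file_name, line_no in places:
            print(f"      {file_name}: line {line_no}")
```

Because the list is short, a mistyped code such as [* p:nw] tends to stand out as a rare entry next to the expected ones. The sketch only sees codes that still have both square brackets, so a code with a missing bracket would not be listed.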


The other way is to list only the legal error codes in "depfile.cut", the file that the CHECK and ESC-L commands use to validate CHAT files. The "depfile.cut" file is located in the "CLAN/lib" directory of your CLAN distribution. Inside "depfile.cut", look for the [\* _*] string. It allows any error code by way of the "_*" symbols, i.e. the star (*) character matches any string. You can edit "depfile.cut" and replace the [\* _*] string with just the error codes that you want to allow, e.g. [\* p:w] [\* p:n] [\* p:m] . There are a lot of error codes in the CLAN CHAT manuals. You can list all of them or just the few that are pertinent to your corpus. After you save the "depfile.cut" file, the CHECK and ESC-L commands will report an error if any other error codes are found in your transcripts.

We do not recommend changing "depfile.cut", because it is easy to make a mistake that will allow errors in transcripts, and because your copy of "depfile.cut" will be replaced by the standard distribution copy if and when you update CLAN on your computer.
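If you prefer to leave "depfile.cut" untouched, the same allow-list idea can be sketched outside CLAN. The Python below is only an illustration under stated assumptions: the set ALLOWED_CODES holds just the three codes mentioned in this thread and would need to be filled in from the CHAT manual, and find_unlisted_codes is a hypothetical helper, not a CLAN command.

```python
import re

# Illustrative allow-list only: the three codes mentioned in this thread.
# Replace it with the codes from the CHAT manual that your corpus uses.
ALLOWED_CODES = {"p:w", "p:n", "p:m"}

# Matches a bracketed error code such as "[* p:w]" and captures the code text.
ERROR_CODE = re.compile(r"\[\*\s*([^\]]*)\]")


def find_unlisted_codes(lines, allowed=ALLOWED_CODES):
    """Return (line number, code) for every error code not in `allowed`."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        for match in ERROR_CODE.finditer(line):
            code = match.group(1).strip()
            if code not in allowed:
                problems.append((line_no, code))
    return problems


sample = [
    "*PAR:\tthe tat [* p:w] sat .",
    "*INV:\tokay [* p:nw] .",
]
print(find_unlisted_codes(sample))  # prints [(2, 'p:nw')]
```

Unlike a command restricted to one speaker, this check looks at every line it is given, so a mistyped code on any speaker's utterance is reported.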


Question 3
The difference between your command and mine is that your command has two redundant "\" characters, the ones in front of the square brackets. Also, your command will only check the *PAR speaker. It will not detect errors on other speakers' utterances.


Leonid.

> On Oct 3, 2024, at 04:20, Sophie Brook <sophiemeibrook at gmail.com> wrote:
> 
> Thank you for your swift and helpful response! 
> 
> 
> Yes you have understood correctly :) 
> In this case, we don't want to create new error codes (because any "new codes" are the result of typos as opposed to purposely wanting to create new codes). 
> 
> Question 1
> So to check I have understood correctly, the way to make sure error codes are only from the CLAN manual is by careful transcription (aka not making typos)?
> 
> Question 2
> We can catch error codes we do not want in our transcripts, but the way to check for unwanted codes is to manually look at the spreadsheet output produced by the command above? 
> 
> Question 3
> Is there a difference between the code you've typed (freq +s"[\* *]") and the one I typed freq +s"\[\* *\]" +t*PAR * +d2 ?
> 
> Thanks. 
> 
> Sophie
> (she/her/hers) 
> 
> On Wednesday, October 2, 2024 at 12:22:58 PM UTC+1 Leonid Spektor wrote:
>> Hi Sophie,
>> 
>> 	I don't understand what the problem is. Is it that the FREQ command finds error codes that you are not looking for, or is it that the data files have error codes that you consider illegal?
>> 
>> 	CLAN does not check the spelling of error codes. It assumes that if you have [* ...], then you want it to be an error code. It is up to the transcriber to follow the valid error code conventions and to use only valid error codes in transcripts.
>> 
>> This approach allows the transcriber to create new error codes that are specific to their transcription and to not be limited to just a few chosen error codes. If you run the command (freq +s"[\* *]"), similar to the one you have in your email, then you can catch any error codes that you do not want in your transcript.
>> 
>> 
>> 
>> Leonid.
>> 
>> 
>>> On Oct 2, 2024, at 06:51, Sophie Brook <sophiem... at gmail.com> wrote:
>>> 
>> 
>>> Hi there, 
>>> 
>>> What I did
>>> I used this code: 
>>> freq +s"\[\* *\]" +t*PAR * +d2
>>> 
>>> to output frequencies of different errors, e.g. phonological [* p:w] [* p:n], morphological, semantic, neologisms, etc. 
>>> 
>>> The problem
>>> 
>>> I found mistyped error codes in the output, e.g. [* p:nw] was not flagged as an error even though this error code is not a valid one recognised by CLAN. Other examples are error codes with missing square brackets or colons. 
>>> 
>>> I have been manually changing the CHAT files after combing through the columns to see which transcripts contain these "illegal error codes". 
>>> 
>>> 
>>> 
>>> Question 
>>> 
>>> Why is this happening?
>>> 
>>> Is there a way around this?
>>> 
>>> Thank you!!! :) 
>>> 
>>> Best, 
>>> 
>>> Sophie Brook 
>>> 
>>> (she/her) 
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups "chibolts" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+u... at googlegroups.com <>.
>>> To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/ae6e75f8-9e56-4e2c-9f55-82d1fc59bee3n%40googlegroups.com <https://groups.google.com/d/msgid/chibolts/ae6e75f8-9e56-4e2c-9f55-82d1fc59bee3n%40googlegroups.com?utm_medium=email&utm_source=footer>.
>> 
