Some problems with extracting error-free utterances and verbs from CHAT files

Leonid Spektor spektor at andrew.cmu.edu
Fri Jun 29 14:10:50 UTC 2018


In searches, the "*" character is a wildcard that matches zero or more characters. Because error codes contain a "*" character, you need to tell CLAN that in this case you are looking for the actual "*" character and not a wildcard. The "\" character means that the following character is a literal character and not a wildcard, so "\**" means that you want to find a "*" character followed by zero or more of any characters. This will let you match [*] and [* aux]. CLAN searches have three wildcard characters: *, % and _ .
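
To make the escaping rule concrete, here is a rough Python sketch of it. This is not how CLAN is implemented; the function names are made up for illustration, and it models only the "*" wildcard and the "\" escape (the % and _ wildcards are deliberately left out).

```python
import re

def clan_like_to_regex(pattern: str) -> str:
    """Translate a simplified CLAN-style search string into a regex.

    Hypothetical helper: only "*" (zero or more characters) and "\\"
    (next character is literal) are modeled here.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Backslash: take the next character literally.
            out.append(re.escape(pattern[i + 1]))
            i += 2
        elif ch == "*":
            # Unescaped asterisk: wildcard for zero or more characters.
            out.append(".*")
            i += 1
        else:
            # Any other character matches itself.
            out.append(re.escape(ch))
            i += 1
    return "".join(out)

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the simplified pattern."""
    return re.fullmatch(clan_like_to_regex(pattern), text) is not None

# "\**" = a literal "*" followed by anything.
print(matches(r"[\**]", "[*]"))         # literal star, nothing after it
print(matches(r"[\**]", "[* aux]"))     # literal star, then " aux"
print(matches(r"[+ \**]", "[+ *aux]"))  # utterance-level code
print(matches(r"[*]", "[aux]"))         # unescaped star is only a wildcard
```

The last call shows why the backslash is needed: without it, "[*]" would accept any bracketed code, not just the ones that contain an actual "*".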


Leonid.

> On Jun 29, 2018, at 01:06, Li Zeng <zlmailhouse at 163.com> wrote:
> 
> Leonid, thank you very much for your response. I was wondering what the "\" is for in trim -s"[+ \**]"?
> 
> Li
> 
> On Thursday, June 28, 2018 at 12:37:50 PM UTC-5, Leonid Spektor wrote:
> Li,
> 
> 1. The codes like [*] or [* aux] refer to the word before them. If you want your codes to refer to the whole utterance, then they need to start with "[+ ". You can change your codes to [+ *], [+ *aux], [+ *wh], then trim those utterances with the command: trim -s"[+ \**]"
> 
> 2. If you used the latest MOR grammar on your data, then the comprehensive command option for all verbs is: +sm|v,|cop,|aux,|mod,|mod:*,|part
> 
> 
> Leonid.
> 
>> On Jun 28, 2018, at 12:48, Li Zeng <zlmai... at 163.com> wrote:
>> 
>> Hi there, 
>> 
>> I have run into some problems with extracting utterances and verbs from CHAT files.
>> 
>> Firstly, I have tagged ungrammatical utterances of *CHI with either [*], [* aux] or [* wh]. Now I want to calculate the number of utterances without those tags ([*], [* aux], [* wh]) as well as those containing www, yyy. I tried using the following command: trim -s"[*_ ]" +1, but it did not work.
>> 
>> Secondly, I would like to extract all the verbs of *CHI (including copulas, modals, auxiliaries as well as regular verbs) in the file. I found that on the %mor tier, "walking" is coded not as a verb but as "PART|". In that case, I guess I also need to include "PART|", right? I was wondering what the comprehensive command would be to extract all the verbs mentioned above?
>> 
>> Thank you. 
>> 
>> Li 
>> 
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups "chibolts" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+u... at googlegroups.com <javascript:>.
>> To post to this group, send email to chib... at googlegroups.com <javascript:>.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/addb310b-f4ed-497a-bd48-e1f91c045f53%40googlegroups.com <https://groups.google.com/d/msgid/chibolts/addb310b-f4ed-497a-bd48-e1f91c045f53%40googlegroups.com?utm_medium=email&utm_source=footer>.
>> For more options, visit https://groups.google.com/d/optout <https://groups.google.com/d/optout>.
> 
> 
