bilingual

Leonid Spektor spektor at andrew.cmu.edu
Tue Dec 31 12:17:36 UTC 2013


Dear Ying,

	First, I would suggest that you use the +/-s"[- zho]" option with both the MOR and POST commands:

mor +s"[- zho]" sample_English.cha +1
post +s"[- zho]" sample_English.cha +1
mor -s"[- zho]" sample_English.cha +1
post -s"[- zho]" sample_English.cha +1


Here are answers to your questions:

1. You can fix errors in any order you like, as long as CHECK reports no errors in the end. If you are using ESC-L CHECK, you cannot skip the first error found, because ESC-L CHECK always starts from the top of the file. It has no option to continue from the current location.

2. If you use "park@s" instead of "park@s$n", MLU will still give the correct result.
The TTR results could be wrong, depending on your FREQ command. If you run FREQ only with the +s"[- zho]" or -s"[- zho]" option, the result will be correct with either the "park@s" or the "park@s$n" choice. If you run FREQ without the +/-s"[- zho]" option, you will force FREQ to compare words such as "park@s" and "park", which are not the same, and this will inflate the TTR result. If you choose "park@s$n" and run FREQ on the %mor tier, i.e. use the "+t%mor -t*" options, the result will be more accurate.
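To make the TTR point concrete, here is a rough Python sketch. It is not CLAN code and does not reproduce FREQ's actual counting rules; the token list and the helper names (ttr, strip_marker) are made up for illustration. It only shows that counting "park@s" and "park" as two different types raises the ratio.

```python
# Illustration only: a simplified type-token ratio, not FREQ's algorithm.
def ttr(tokens):
    """Return (number of types, number of tokens, types / tokens)."""
    types = set(tokens)
    return len(types), len(tokens), len(types) / len(tokens)


def strip_marker(token):
    """Drop everything from "@" on, so "park@s" becomes "park" (hypothetical helper)."""
    return token.split("@", 1)[0]


# Hypothetical word list: "park" occurs once with the @s marker and twice without.
tokens = ["park@s", "park", "park", "is", "big"]

raw = ttr(tokens)                                # "park@s" and "park" are separate types
merged = ttr([strip_marker(t) for t in tokens])  # both forms count as one type

print(raw)     # (4, 5, 0.8)
print(merged)  # (3, 5, 0.6)
```

With the marker left in place the sketch reports 4 types out of 5 tokens; with the two forms merged it reports 3 out of 5.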

3. Your sample file had "[*tense]" instead of "[* tense]"; note the missing space character. That may be why the search for "[* tense]" failed. After adding the space character, I got the correct result searching for "[* tense]" with this command:

freq +s"[* tense]" sample_English.cha

If you want to count the actual words associated with the code "[* tense]", then use this command:

freq +s"<* tense>" sample_English.cha
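As a side note on the missing space in point 3, the following Python sketch shows why a search for "[* tense]" finds nothing when the file contains "[*tense]". It uses plain literal string matching, which is an assumption made for illustration and not a description of how FREQ's +s option matches codes; the function name count_code is hypothetical and the two lines are adapted from the sample utterance.

```python
# Illustration only: literal substring counting, not FREQ's matching logic.
def count_code(lines, code):
    """Count exact occurrences of a code string across transcript lines."""
    return sum(line.count(code) for line in lines)


# The utterance as it appeared in the sample file (no space after "*").
original = ['*CHI:\tMy mom say [*tense] "We will come from time to time". [+ GE]']
# The same utterance after adding the space.
corrected = ['*CHI:\tMy mom say [* tense] "We will come from time to time". [+ GE]']

print(count_code(original, "[* tense]"))   # 0
print(count_code(corrected, "[* tense]"))  # 1
```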


I hope this helps and Happy New Year!


Leonid.



On Dec 30, 2013, at 18:46 , Ying <yl5834 at gmail.com> wrote:

> Dear Leonid,
> 
> I want to get MLU, the number of different words, and also TTR from some Mandarin(Putonghua)-English bilingual narrative data. Also I added some word and utterance level codes and want to summarize the codes. For example, for the following sample (I am attaching the transcript after running the commands),
> 
> *CHI: [- zho] 我 去 了 一 个 <一个> [/] park@s yesterday@s. [+ CS]
> *CHI: It is a very big one.
> *EXA: Nice.
> *CHI: My mom say [* tense] “We will come from time to time”. [+ GE]
> Note: 
> The precode [- zho] is for Mandarin/Putonghua, as [- yue] is for Cantonese
> [+ CS] is an utterance level code, indicating code-switched sentences
> [+ GE] is an utterance level code, indicating sentences with grammatical errors
> [* tense] is a word level code, indicating a tense error
> 
> Here are the commands I used:
> mor +s"[- zho]" sample_English.cha +1
> post sample_English.cha +1
> mor -s"[- zho]" sample_English.cha +1
> post sample_English.cha +1
> Esc_L
> freq +s"[% *]" *.cha
> 
> Questions I have:
> (1) May I ignore an error and move to the next one when I run CHECK?
> (2) For code-switched words within an utterance, I don't care about mor info such as noun or verb. But I do want to calculate MLU and TTR. If I go with park@s but don't bother to make it park@s$n, will CLAN give me the correct results? 
> (3) I can get codes [zho], [CS], and [GE] calculated using FREQ, but not [* tense]. How may I count the occurrences of [* tense]? Moreover, can I know whether it is the same verb (e.g., say) coded [* tense]?
> 
> Thank you very much! 
> Happy New Year!
> 
> Sincerely,
> Ying
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com.
> To post to this group, send email to chibolts at googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/6de36ec6-5c23-41d7-a0b5-06f5b35a5ce2%40googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
> <sample_English.cha>
