Frequency of verb forms by verb type

Kevin Donnelly kevin at dotmon.com
Wed Oct 9 09:17:13 UTC 2013


Hi Leonid

::::On Wednesday 09 October 2013 Leonid Spektor said::::
> You are absolutely right that CHAT is not a CLAN-specific format. But being
> a plain-text format makes it prone to carrying a lot of extraneous data in
> between. For example, if someone wants to look at a speaker tier or %mor
> tier only, they would have to filter out the rest.

The grep line I gave earlier extracts the speaker tier.  Adjust it a bit to
extract the %mor tier instead (the ^ anchors the match to the start of the line):
grep '^%mor' my_original_file.cha > mymor.cha
Again, the file should have its lines straightened first using LONGTIER or sed.
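
For the straightening step, here is a minimal sketch, assuming that continuation lines in the .cha file start with a tab (the usual CHAT convention, but check your own files).  It uses awk rather than sed because awk handles the tab escape more portably.  The function name "straighten" and the sample lines are invented for illustration, and this is not a substitute for LONGTIER:

```shell
# Join continuation lines (lines starting with a tab) onto the tier they belong to.
# Reads the named files, or standard input if none are given.
straighten() {
  awk '
    /^\t/ { sub(/^\t+/, " "); buf = buf $0; next }  # continuation: append to the buffered tier
          { if (NR > 1) print buf; buf = $0 }       # new tier: print the previous one, start a new buffer
    END   { if (NR > 0) print buf }                 # print the last buffered tier
  ' "$@"
}

# Made-up sample: one utterance wrapped over two lines, then its %mor tier.
printf '*HYW:\tfirst part of a long\n\tutterance .\n%%mor:\ttag1 tag2\n' | straighten
```

With a real file, something like "straighten my_original_file.cha | grep '^%mor' > mymor.cha" would then do both steps in one go.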

> If you are looking for
> the %mor tier of just one particular speaker, then it becomes even more
> complicated. Using CLAN to filter unneeded data is the easiest solution.
> After that, CHAT is just plain text.

Sure, the more complicated your question, the more complicated your tools may 
have to be, and CLAN may do nearly everything you want out of the box, 
provided you get the switches right.  For your use-case, I would use the
following short PHP script:
===
<?php

//Open a new file to write to.
$fp = fopen("mymor.txt", "w");

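// Read the source file into an array of lines.
$lines=file("path/to/my/file.cha");
// Start with the marker unset, so a %mor line before any speaker line is skipped.
$getmor=0;
// Read through each line.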
foreach ($lines as $line)
{
	// If it's a speaker line matching the speaker you want ...
	// (the colon stops longer codes that merely start with HYW from matching) ...
	if (preg_match("/^\*HYW:/", $line))
	{
		$getmor=1;  //  ... set a marker.
	}
	// If it's any other speaker line (the branch above already caught HYW) ...
	elseif (preg_match("/^\*/", $line))
	{
		$getmor=0;  // ... revert the marker.
	}
	
	// If it's a %mor line and the marker is set ...
	// (ie the last speaker was HYW) ...
	if (preg_match("/^%mor/", $line) and $getmor==1)
	{
		echo $line;  // ... show the line ...
		fwrite($fp, $line);  // ... write it to the new file ...
		$getmor=0;  // ... revert the marker.
	}
}

// Close the new file.
fclose($fp);

?>
===
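
For anyone who would rather stay on the command line, the same marker logic can be sketched in awk.  This is only a rough equivalent, assuming speaker lines look like "*HYW:" followed by the utterance and that the file has already been straightened.  The function name "speaker_mor" and the sample data are made up for illustration:

```shell
# Print the %mor tier that follows each utterance by one speaker.
# Usage: speaker_mor SPEAKER [file ...]   (reads standard input if no file is given)
speaker_mor() {
  spk=$1; shift
  awk -v spk="$spk" '
    /^\*/   { want = (index($0, "*" spk ":") == 1) }  # any speaker line sets or clears the marker
    /^%mor/ { if (want) print; want = 0 }             # print only if the last speaker matched, then reset
  ' "$@"
}

# Made-up sample with two speakers; only the %mor lines after *HYW are printed.
printf '*HYW:\thello .\n%%mor:\ttag1\n*ABC:\thi .\n%%mor:\ttag2\n*HYW:\tbye .\n%%mor:\ttag3\n' | speaker_mor HYW
```

On the sample above this prints the tag1 and tag3 lines and skips tag2.  Redirect the output with "> mymor.txt" to get a file like the one the PHP script writes.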

Not everyone will want to do this, I know, but the benefit of using PHP (or
grep, or Python, or R, or whatever) is that researchers learn something they
can re-use in other contexts: where other tools fit in better with their
workflow, or where CLAN, for all its versatility, can't produce what they
need.

>         For those who do not want to use CLAN at all but still want an easy
> way to parse the data, we have XML-CHAT on our server. Just look for "XML"
> in the "Database" section on our web server's home page.

Hmm - I've always found XML harder to parse than text, but maybe that's just 
me! :-)

-- 
Pob hwyl / Best wishes

Kevin Donnelly
kevindonnelly.org.uk
bangortalk.org.uk
