Freq on French MOR tiers

Leonid Spektor spektor at andrew.cmu.edu
Tue Jun 22 14:13:57 UTC 2010


Florence,

	I already sent this answer to your account directly, but in case it did not reach you, I am posting it on chibolts as well.

	The '-' and '+' do not mean exclude and include, respectively. The '-' was originally a separator between a mor element marker, such as '|' for part of speech, and the string pattern, such as "aux". For example, to search for part of speech "det", you would specify "|-det". Later it became clear that the separator is not necessary, so it became optional. That is why '-' is specified in some examples and omitted in others. I was trying to use this difference to indicate that the '-' character is optional. I guess I did not do a very good job.

The '+' between a mor element and the search pattern means: search for that pattern in addition to other matches. For example, if you have this input:

*MOT:	oh (.) that's some toys .
%mor:	co|oh pro:dem|that~v:cop|be&3S qn|some n|toy-PL . 
*MOT:	that's a toy .
%mor:	pro:dem|that~v:cop|be&3S det|a n|toy . 

and if you want to find the words "toy" and "toys" at the same time, you would specify "+s@|n,-+*" or, to be more specific, "+s@|n,-+PL". In this case the output would be:

 1 n|toy
 1 n|toy-PL

But, if you specify "+s@|n,-*", "+s@|n,--*" or "+s@|n,-PL", then you'll get only:

 1 n|toy-PL
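
As a rough illustration of the rule above, here is a minimal Python sketch. It is a toy model, not CLAN's actual implementation: the names parse_element and freq_mor are hypothetical, only the '|' (part of speech) and '-' (suffix) elements are handled, only the "*" wildcard is modeled, and '+' is read as "the suffix may also be absent". The search string is the part that follows "+s@".

```python
from collections import Counter
from fnmatch import fnmatchcase

# The two %mor lines from the example above.
MOR_LINES = [
    "co|oh pro:dem|that~v:cop|be&3S qn|some n|toy-PL .",
    "pro:dem|that~v:cop|be&3S det|a n|toy .",
]


def parse_element(spec):
    """Split one search element such as '|-n', '|n', '-+PL' or '--*'.

    Returns (marker, optional, pattern). The first character is the
    element marker. A following '-' is the optional separator and is
    dropped; a following '+' marks the pattern as "in addition to
    other matches".
    """
    marker, rest = spec[0], spec[1:]
    optional = rest.startswith("+")
    if rest[:1] in ("-", "+"):
        rest = rest[1:]
    return marker, optional, rest


def freq_mor(search, mor_lines):
    """Count %mor items that match a search such as '|n,-+PL' (toy model)."""
    elements = {}
    for spec in search.split(","):
        marker, optional, pattern = parse_element(spec)
        elements[marker] = (optional, pattern)

    counts = Counter()
    for line in mor_lines:
        for token in line.split():
            for item in token.split("~"):  # '~' joins clitics
                if "|" not in item:
                    continue  # punctuation such as "."
                pos, _, rest = item.partition("|")
                _, _, suffix = rest.partition("-")
                if "|" in elements and not fnmatchcase(pos, elements["|"][1]):
                    continue
                if "-" in elements:
                    optional, pattern = elements["-"]
                    if suffix:
                        if not fnmatchcase(suffix, pattern):
                            continue
                    elif not optional:
                        continue  # a suffix is required without '+'
                counts[item] += 1
    return counts


for search in ("|n,-+PL", "|n,-PL"):
    print(search, dict(freq_mor(search, MOR_LINES)))
```

Under these assumptions, "|n,-+PL" counts both n|toy and n|toy-PL, while "|n,-PL", "|n,-*" and "|n,--*" count only n|toy-PL, which matches the two outputs shown above.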


The answer to your question 2) is that "word" in this case means the search pattern. Again, this probably could have been worded better. The idea is that the search pattern can be either a wildcard, such as "*" or "%", or a string, such as "PL" in the example above.


	The example "v|verb@replaced_word" that you give below exists because some people wanted to see whether a word on the mor tier is a replacement word or the original word spoken by the speaker, and what that original word was. CLAN does not provide an easy link between the mor tier and the speaker tier when searching for items on the mor tier, so "@replaced_word(s)" is used to provide that information. If you don't want to see this in your output, you can use the "o%" or "o-%" element, as in the search option +s"@r*,|*,o%". If you want to exclude only the "@replaced_word" element, you can specify the option +s"@r*,|*,-+*,&+*,#+*,=+*,o%".
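
To make the difference between the two options concrete, here is a small Python sketch of the intended effect on the output. It is only an approximation under stated assumptions, not CLAN's code: split_item and render are hypothetical helper names, "@replaced_word" is the literal placeholder used above, and prefixes ('#') are not modeled.

```python
import re


def split_item(item):
    """Split 'pos|stem<affixes>@replaced' into its four parts (toy model)."""
    body, _, replaced = item.partition("@")
    pos, _, rest = body.partition("|")
    # The stem runs up to the first '-', '&' or '=' marker, if any.
    stem = re.match(r"[^&=\-]*", rest).group(0)
    affixes = rest[len(stem):]
    return pos, stem, affixes, replaced


def render(item, keep_affixes):
    """Print form of an item once the '@replaced_word' part is dropped.

    keep_affixes=False approximates the part-of-speech and stem search
    that ends in o%: only 'pos|stem' is shown.
    keep_affixes=True approximates the longer option that also lists
    -+*, &+*, #+* and =+*: everything except '@replaced_word' is shown.
    """
    pos, stem, affixes, _replaced = split_item(item)
    return f"{pos}|{stem}{affixes if keep_affixes else ''}"


item = "v|voir&INF@replaced_word"
print(render(item, keep_affixes=False))
print(render(item, keep_affixes=True))
```

In this sketch the first call prints only the part of speech and stem, and the second keeps the "&INF" marker while still dropping "@replaced_word".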

I hope this provides a better understanding of this feature.

Leonid.





On Jun 21, 2010, at 06:02, Florence Chenu wrote:

> Dear Leonid,
> 
> I am beginning to have fun with the new feature, but I can't figure out how
> I can get rid of the replacement word preceding the [: ...] code marker (in
> my results, I sometimes have v|verb@replaced_word).
> 
> And your help text is sometimes puzzling to me:
> 
> 1) what does a + or a - mean? ("followed by - or + and/or the following...")
> 
> 2) what does " word -find "word"" mean ?
> 
> 3) in your example :
>   " +t%mor -t* +s"@r*,|adv,o%"
>  find all stems of all "adv" and erase all other markers"
> 
> how come you don't use the "-" as in the first example (+t%mor -t*
> +s"@r-*,|adv,o-%") ???
> 
> Thanks,
> Florence.
> 
> 
> -----Original Message-----
> From: chibolts at googlegroups.com [mailto:chibolts at googlegroups.com] On behalf of
> Leonid Spektor
> Sent: Saturday, June 19, 2010 16:07
> To: chibolts at googlegroups.com
> Subject: Re: Freq on French MOR tiers
> 
> Florence,
> 
> 	I think you are using an older version of CLAN. I just tried your
> command below with the latest version of CLAN on the data that has both
> v|...-... and v|...&... elements and I did not get any results with "&" in
> them.
> 
> 	But, I would recommend that you switch to the new way of searching for
> items on the %mor tier. The new method is specifically designed for searching
> the %mor tier and provides a more precise match. You can type "freq +s@" in
> the Commands window to get more information and a few examples of this new
> feature. For example, your command below would look like this:
> 
> freq +s"@|-v*,r-*,o-%" +t*sbj *.cha
> 
> This command looks for items that have part of speech "v*", indicated by
> "|-v*", and any stem, indicated by "r-*". The "o-%" part instructs the program
> to exclude all other parts of each item from the output. "o-%" acts the same
> way as "-%%" and "&%%" in the command below.
> 
> 	You will need to get the latest version of CLAN to try this new
> command.
> 
> Leonid.
> 
> 
> 
> 
> 
> On Jun 18, 2010, at 08:55, Florence Chenu wrote:
> 
>> Hi Leonid,
>> 
>> I would like to get a list of verbs in a series of files. I tried that
>> command:
>> 
>> freq +t%mor -t* +s"v*|*-%%"  +s"v*|*&%%" *.cha +t*sbj +u
>> 
>> 
>> In the result window, I get things like:
>> 
>> 67 v|taper
>> 1 v|taquiner
>> 7 v|tenir
>> 8 v|terminer
>> 5 v|tirer
>> 36 v|tomber
>> 1 v|tondre
>> 12 v|toucher
>> 
>> Which I totally expect
>> 
>> but I also get things like :
>> 
>> 1 v|voir&COND&3S
>> 2 v|voir&FUT&2S
>> 3 v|voir&IMPF&12S
>> 9 v|voir&IMPF&3S
>> 73 v|voir&INF
>> 34 v|voir&PRES&12S
>> 10 v|voir&PRES&3P
>> 31 v|voir&PRES&3S
>> 1 v|voir&SUBJV:PRES&3S
>> 
>> and I wonder why ????
>> 
>> Any tips ?
>> 
>> Thanks,
>> Florence.
>> 
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups
> "chibolts" group.
>> To post to this group, send email to chibolts at googlegroups.com.
>> To unsubscribe from this group, send email to
> chibolts+unsubscribe at googlegroups.com.
>> For more options, visit this group at
> http://groups.google.com/group/chibolts?hl=en.
>> 
>> 
> 
> 
