Lemmatized types and tokens for specific tags?

LaTreese Hall lhall046 at fiu.edu
Fri Apr 30 19:27:52 UTC 2021


Hello Leonid,

Thank you for responding! I had already done all of this, but it was still counting the singular and plural forms of the same word as two different types.

This is what the output looks like when I run freq +d7 +s*@z:* +sm;*,o%:

  1 circle
      1 circle@z:shp
  1 circles
      1 circles@z:shp
  1 square
      1 square@z:shp
  1 squares
      1 squares@z:shp
------------------------------
   4  Total number of different item types used
   4  Total number of items (tokens)
1.000  Type/Token ratio
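
For reference, the arithmetic behind the summary above can be sketched in a few lines of Python. This is only an illustration of how type and token counts change when items are grouped by lemma; it is not how CLAN computes them. The LEMMAS table and the function name type_token_summary are hypothetical, written by hand for these four words, whereas CLAN takes its stems from the %mor tier.

```python
from collections import Counter

# Items matched by the search, as listed in the output above.
items = ["circle", "circles", "square", "squares"]

# Hypothetical hand-written lemma table, for illustration only.
# CLAN derives stems from the %mor tier instead.
LEMMAS = {"circles": "circle", "squares": "square"}


def type_token_summary(words):
    """Return (types, tokens, type/token ratio) for a list of words."""
    counts = Counter(words)
    tokens = sum(counts.values())
    types = len(counts)
    ratio = types / tokens if tokens else 0.0
    return types, tokens, ratio


# Counting surface forms: every distinct spelling is its own type.
surface = type_token_summary(items)

# Counting lemmas: map each word to its stem before counting.
by_lemma = type_token_summary([LEMMAS.get(w, w) for w in items])

print("surface forms: %d types, %d tokens, TTR %.3f" % surface)
print("lemmas:        %d types, %d tokens, TTR %.3f" % by_lemma)
```

Under these assumptions the surface-form count matches the summary above (4 types, 4 tokens), while grouping by lemma reduces the types to 2. The token count is not affected by the grouping, since each occurrence is still counted once.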

Any idea how to get type and token info for the lemmas, so that the above output indicates 2 types and 2 tokens?

Thank you!
LaTreese


> On Apr 30, 2021, at 2:08 PM, Leonid Spektor <spektor at andrew.cmu.edu> wrote:
> 
> Hi,
> 
> The plain command freq +s*@z:* will not work, because the words square and squares are spelled differently. If you want both of those words counted as one type, then you need to run a lemma search, i.e. a search on stems only, on the %mor tier. To create the %mor tier in your data files, you first need to get the appropriate language grammar from the web. It looks like you are working with English-language data files, so to get the English grammar, start CLAN and select the menu "File->Get MOR Grammar->English - eng". This will download the grammar to your computer. After that, you need to run the MOR command on your data files. In the "Commands" window, type the command mor *.cha. This assumes your data filenames end with the .cha file extension. Once the MOR command has finished and has not found any words that it cannot identify, you can use the following command to find what you want:
> 
> freq +d7 +s*@z:* +sm;*,o%
> 
> 
> Leonid.
> 
>> On Apr 30, 2021, at 12:12, LaTreese <lhall046 at fiu.edu> wrote:
>> 
>> Hello there!
>> 
>> This may be an easy issue to solve, but I cannot figure it out. I have relatively little experience with CLAN, so please be gentle.
>> 
>> I have several different tags in my transcripts (e.g., @z:shp to denote shape words). So, in the transcript, "square" and "squares" would be coded as square@z:shp and squares@z:shp, respectively. However, when I try to analyze them for types and tokens, different forms of the same stem are counted as two different types (e.g., square and squares are counted as 2 types).
>> 
>> I am currently using this command to get types and tokens of all of my different categories:
>> freq +s*@z:* 
>> 
>> I read in the manual that I should create a MOR line to be able to run types and tokens on lemmas so that "square" and "squares" are counted as one type. I did this, but now my tags are not available on the MOR line.
>> 
>> Is there a way to get the lemmatized type and token counts for specific tags?
>> 
>> Thank you so much!
>> 
>> LaTreese Hall
>> Florida International University
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups "chibolts" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/628a167c-ada8-4df7-8c91-19b0933f0743n%40googlegroups.com.
