<p class="p1"><span class="s1">Dear Mr. MacWhinney,</span></p>
<p class="p2">Thank you so much for your quick response and the information. What I am trying to do is build word lists based on part of speech for a collocation study. For example, I would extract all the adjectives and adverbs from two text corpora (say C1 and C2), match adj+adv collocations in a third corpus (C3), and measure how often they occur; or, vice versa, extract adj+adv collocations from C3 and see how frequently those adjectives and adverbs occur in C1 and C2.<br><span class="s1"></span></p>
<p class="p2">I use CLAN to run MOR on the text corpus, and then I use regular expression software to match the words on the %mor lines. For example, </p>
<p class="p1"><span class="s1">I use the following two regular expressions to match and extract all the adjectives in a MOR-tagged corpus:</span></p>
<p class="p1"><span class="s1">adj\:\w+\|[\w\&\-]+ </span></p>
<p class="p1"><span class="s1">and</span></p>
<p class="p1"><span class="s1">adj\|[\w\&\-]+ </span></p>
<p class="p2"><span class="s1"></span><br></p>
<p class="p1"><span class="s1">The result of the extraction looks like this:</span></p>
<p class="p1"><span class="s1">adj|thirst&dn-Y</span></p>
<p class="p1"><span class="s1">adj|sun&dn-Y</span></p>
<p class="p1"><span class="s1">adj|snow&dn-Y </span></p>
<p class="p2"><span class="s1"></span><br></p>
<p class="p1"><span class="s1">adj:part|marry-PERF</span></p>
<p class="p1"><span class="s1">adj:part|mean-PROG</span></p>
<p class="p1"><span class="s1">adj:part|tire-PERF</span></p>
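<p class="p1"><span class="s1">For reference, the extraction step described above could be scripted roughly as follows. This is only a minimal sketch: the function name, the sample utterance, and its %mor tokens are made up for illustration, and it assumes each %mor tier sits on a single line starting with "%mor:" (continuation lines are not handled).</span></p>

```python
import re

# One pattern covering both expressions above:
# "adj|..." and "adj:subtype|...".
ADJ_PATTERN = re.compile(r"adj(?::\w+)?\|[\w&-]+")


def extract_adjectives(chat_text):
    """Return every adjective token found on %mor lines."""
    tokens = []
    for line in chat_text.splitlines():
        # Only look at the morphology tier, not the main line.
        if line.startswith("%mor:"):
            tokens.extend(ADJ_PATTERN.findall(line))
    return tokens


# Hypothetical two-line sample in CHAT layout (illustrative only).
sample = (
    "*CHI:\tI am thirsty and tired .\n"
    "%mor:\tpro:sub|I cop|be&1S adj|thirst&dn-Y coord|and adj:part|tire-PERF .\n"
)

print(extract_adjectives(sample))
# → ['adj|thirst&dn-Y', 'adj:part|tire-PERF']
```

<p class="p1"><span class="s1">Merging the two expressions into one with an optional ":subtype" group keeps the matches in their original order within each line.</span></p>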
<p class="p2"><span class="s1"></span><br></p>
<p class="p1"><span class="s1">Then I use a regular expression again to remove the leading 'adj\|' and 'adj\:\w+\|' in one pass: </span></p>
<p class="p1"><span class="s1"><br></span></p><p class="p1"><span class="s1">thirst&dn-Y</span></p>
<p class="p1"><span class="s1">sun&dn-Y</span></p>
<p class="p1"><span class="s1">snow&dn-Y </span></p>
<p class="p2"><span class="s1"></span><br></p>
<p class="p1"><span class="s1">marry-PERF</span></p>
<p class="p1"><span class="s1">mean-PROG</span></p>
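<p class="p1"><span class="s1">The prefix removal can be sketched the same way. Again, this is an illustration under my own assumptions (the helper name is hypothetical), not a CLAN feature; it only strips the part-of-speech label and leaves the stem and suffix codes untouched.</span></p>

```python
import re

# Matches the POS label up to and including the "|" separator,
# e.g. "adj|" or "adj:part|", only at the start of the token.
POS_PREFIX = re.compile(r"^adj(?::\w+)?\|")


def strip_pos_prefix(token):
    """Drop the leading adjective label from a %mor token."""
    return POS_PREFIX.sub("", token)


tokens = ["adj|thirst&dn-Y", "adj|sun&dn-Y", "adj:part|marry-PERF"]
print([strip_pos_prefix(t) for t in tokens])
# → ['thirst&dn-Y', 'sun&dn-Y', 'marry-PERF']
```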
<p class="p2"><span class="s1"></span><br></p>
<p class="p1"><span class="s1">I then restore them manually, as follows: </span></p>
<p class="p2"><span class="s1"></span><br></p>
<p class="p1"><span class="s1">Thirsty</span></p>
<p class="p1"><span class="s1">Sunny</span></p>
<p class="p1"><span class="s1">Snowy</span></p>
<p class="p1"><span class="s1">Married</span></p>
<p class="p1"><span class="s1">Meaning</span></p>
<p class="p2"><span class="s1"></span><br></p>
<p class="p1"><span class="s1">The problem is that when the list becomes large, it is almost impossible to check the entries by hand. That is why I was wondering whether there is a better way of doing this, i.e. to restore the original word form, or to extract words or phrases from the text by part of speech. </span></p>Many thanks.<br><br>On Thursday, December 20, 2012 at 6:15:04 AM UTC+8, Brian MacWhinney wrote:<blockquote class="gmail_quote" style="margin: 0;margin-left: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;"><div style="word-wrap:break-word">Dear YuJen,<div> MOR is designed to change the forms on the main line to analyzed forms on the %mor line. CLAN is not really designed to give you access to both the main line form and the %mor line form at the same time. It is not clear from your message exactly when and how you need to have access to the main line forms. There are some limited abilities to retrieve corresponding forms that can work in COMBO or KWAL, but the best way to approach this systematically would be through a search of the XML version of the corpus where the %mor forms and the main line forms are together in a single bundle. You can create XML versions of CHAT corpora using the Chatter program. Then you would use your regular expression matching software to process the XML.</div><div><br></div><div>--Brian MacWhinney</div><div><br><div><div>On Dec 19, 2012, at 11:46 AM, "Huang, YuJen" <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="Y92gpPESo-sJ">ihappyl...@gmail.com</a>> wrote:</div><br><blockquote type="cite">Dear all,<div><br>I have a question about how to restore the word form from a MOR-tagged corpus. </div><div>I am making word lists according to part of speech. I used MOR to tag a text corpus, and then I used regular expression software to extract the words by matching the part-of-speech tags created by MOR. The extracted word list seems fine, but I found that some of the lexical forms have been changed by MOR in the %mor tier. 
For example, the plural form of a noun, the tenses of a verb... </div><div><br></div><div>I was wondering whether there is any way to restore them to the original form in CLAN, or any other efficient method that does not require checking and modifying them one by one.<br><br>Thank you.<br>Huang, YuJen</div><div><br></div>
-- <br>
You received this message because you are subscribed to the Google Groups "chibolts" group.<br>
To post to this group, send email to <a href="javascript:" target="_blank" gdf-obfuscated-mailto="Y92gpPESo-sJ">chib...@googlegroups.com</a>.<br>
To unsubscribe from this group, send email to <a href="javascript:" target="_blank" gdf-obfuscated-mailto="Y92gpPESo-sJ">chibolts+u...@<wbr>googlegroups.com</a>.<br>
To view this discussion on the web visit <a href="https://groups.google.com/d/msg/chibolts/-/D2_Rh6VlEDcJ" target="_blank">https://groups.google.com/d/<wbr>msg/chibolts/-/D2_Rh6VlEDcJ</a>.<br>
For more options, visit <a href="https://groups.google.com/groups/opt_out" target="_blank">https://groups.google.com/<wbr>groups/opt_out</a>.<br>
<br>
<br>
</blockquote></div><br></div></div></blockquote>
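<p class="p1"><span class="s1">As a rough illustration of the XML route suggested in the quoted reply, the sketch below reads the main-line form and the %mor category from the same word element. The element layout (a "w" element whose text is the main-line form, with a nested "mor/mw/pos/c" category) is a simplified assumption about what the XML looks like, and the sample fragment and function name are invented; real files may use an XML namespace and additional elements, so the paths would need adjusting.</span></p>

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: each <w> holds the main-line
# form as text and its morphological analysis as a nested <mor>.
SAMPLE_XML = """
<u who="CHI">
  <w>thirsty<mor type="mor"><mw><pos><c>adj</c></pos><stem>thirst</stem></mw></mor></w>
  <w>dogs<mor type="mor"><mw><pos><c>n</c></pos><stem>dog</stem></mw></mor></w>
  <w>married<mor type="mor"><mw><pos><c>adj</c></pos><stem>marry</stem></mw></mor></w>
</u>
"""


def surface_forms_by_pos(xml_text, pos):
    """Return main-line word forms whose %mor category equals pos."""
    root = ET.fromstring(xml_text)
    forms = []
    for w in root.iter("w"):
        category = w.find("./mor/mw/pos/c")
        if category is not None and category.text == pos:
            # w.text is the text before the first child element,
            # i.e. the word as it appears on the main line.
            forms.append((w.text or "").strip())
    return forms


print(surface_forms_by_pos(SAMPLE_XML, "adj"))
# → ['thirsty', 'married']
```

<p class="p1"><span class="s1">Because the original form and the analysis travel together in one element, no manual restoration step is needed in this approach.</span></p>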
<p></p>