[Corpora-List] Tags in Word Smith

Lee, David dvdlee at umich.edu
Mon Feb 17 19:15:41 UTC 2003


If your POS tags are separated from the word by the underscore character (which is more readable than using "%"), you can simply make the underscore an acceptable part of a 'word' by going to:

Settings > Adjust Settings. Go to the "Text" tab and add "_" alongside the apostrophe which is already there.

An important *second* step, *if your POS tag set includes numbers (e.g. NN1, NN2)*, is to then go to the "WordList" tab and activate the checkbox for "numbers included". Otherwise you will find yourself generating a wordlist without any singular (NN1) or plural (NN2) nouns... You may also want to increase the "word length" setting at the same time, since all 'words' are now longer than before because of the attached tag.
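
To make the effect of the two settings concrete, here is a rough Python sketch of the behaviour described above. It is only a toy model, not how WordSmith works internally: the function name build_wordlist, its parameters extra_chars and include_numbers, and the sample tags are all made up for illustration, and I am assuming that a 'word' containing a digit is simply dropped when numbers are not included.

```python
import re
from collections import Counter

def build_wordlist(text, extra_chars="'", include_numbers=False):
    """Toy word-list builder (hypothetical; not WordSmith's actual algorithm).

    extra_chars: characters accepted inside a 'word' besides letters/digits.
    include_numbers: if False, any 'word' containing a digit is dropped
    (an assumption made for this illustration).
    """
    # [^\W_] matches a letter or digit, but not the underscore.
    word_char = r"[^\W_]"
    if extra_chars:
        word_char = r"(?:[^\W_]|[" + re.escape(extra_chars) + r"])"
    tokens = re.findall(word_char + "+", text)
    if not include_numbers:
        tokens = [t for t in tokens if not any(c.isdigit() for c in t)]
    return Counter(tokens)

# Illustrative tagged text; the tag names are examples only.
sample = "sein_VAFIN sein_PPOSAT da_ADV da_KOUS dog_NN1 dogs_NN2"

# Default: "_" splits word from tag, so both uses of "sein" are merged.
default_list = build_wordlist(sample)

# Step 1: accept "_" inside a word, so word_TAG stays together.
tagged_list = build_wordlist(sample, extra_chars="'_")

# Step 2: also include numbers, so NN1/NN2 entries are kept.
full_list = build_wordlist(sample, extra_chars="'_", include_numbers=True)
```

In this toy model, default_list merges the two uses of "sein" into one entry, tagged_list separates them but has lost the NN1/NN2 nouns, and full_list keeps all six word_TAG entries.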



Dave.

___________________________________________________
David YW Lee
dvdlee at umich.edu
Research Fellow, MICASE project
English Language Institute, University of Michigan
TCF Building, 401 E. Liberty, Suite 350, Rm 3140
Ann Arbor, Michigan 48104-2298, USA. Tel: +1 734-615-9638 (O)

MICASE web site: http://www.lsa.umich.edu/eli/micase/micase.htm
Corpus-based Linguistics web site: http://devoted.to/corpora
___________________________________________________


> -----Original Message-----
> From: Randall Jones [mailto:randall_jones at byu.edu]
> Sent: Mon, February 17, 2003 12:37 PM
> To: CORPORA
> Subject: [Corpora-List] Tags in Word Smith
> 
> 
> I have a question that I hesitate to ask because I'm sure the answer is
> obvious.  I have a tagged German text.  I want to run WordList in Word
> Smith Tools in a way that the tags will differentiate homographs, e.g. sein
> (verb and pronoun), da (adverb and conjunction), etc.  I would think that
> because the words have different tags that they appear differently in the
> list.  However, thus far I have been successful in ignoring the tags or
> having them treated as separate words.  In both cases the different uses of
> sein etc. are grouped together.
> 
> What am I doing wrong?
> 
> 
> Randall L. Jones
> Department of Germanic & Slavic Languages
> Brigham Young University
> Provo, Utah 84604  USA
> randall_jones at byu.edu
> http://humanities.byu.edu/faculty/JonesR.html


