[Corpora-List] Tags in Word Smith

Mike Scott mike at lexically.net
Mon Feb 17 18:11:10 UTC 2003


Randall Jones wrote:
I have a question that I hesitate to ask because I'm sure the answer is
obvious. I have a tagged German text. I want to run WordList in WordSmith
Tools in such a way that the tags will differentiate homographs, e.g. sein
(verb and pronoun), da (adverb and conjunction), etc. I would think that,
because the words have different tags, they would appear separately in the
list. However, so far I have only managed either to have the tags ignored or
to have them treated as separate words. In both cases the different uses of
sein etc. are grouped together.

What am I doing wrong?

***********************

There should be an obvious solution but there isn't, I'm afraid.
In WordSmith 3.0, one way to solve this problem is to make sure your tags are
treated as part of the "word". As you will know, for English the apostrophe is
included in a word by default as an "acceptable mid-word character", so to
speak. If your text were tagged like this, you'd get the results you want:

John'PROPERNOUN is'VERB on'PREP the'DET john'NOUN
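To make the mechanism concrete, here is a minimal Python sketch of how a word-list tokeniser that honours "acceptable mid-word characters" would count such a text. It is an illustration under assumptions only: the function `word_list` and its `mid_word_chars` parameter are hypothetical, and WordSmith's own tokenising rules are not reproduced here. A word is taken to be a run of letters, optionally joined to further runs of letters by one of the mid-word characters; everything else separates words.

```python
import re
from collections import Counter


def word_list(text: str, mid_word_chars: str = "'") -> Counter:
    """Count word types in text (hypothetical helper, not WordSmith code).

    A word is a run of letters, optionally joined to further runs of
    letters by one of the characters in mid_word_chars (e.g. "'" or "%").
    Words are lower-cased, roughly as a case-insensitive word list would be.
    """
    letters = r"[^\W\d_]+"  # one or more Unicode letters (covers ä, ö, ü, ß)
    if mid_word_chars:
        joiner = "[" + re.escape(mid_word_chars) + "]"
        pattern = re.compile(f"{letters}(?:{joiner}{letters})*")
    else:
        pattern = re.compile(letters)
    return Counter(match.lower() for match in pattern.findall(text))


tagged = "John'PROPERNOUN is'VERB on'PREP the'DET john'NOUN"
print(word_list(tagged))       # apostrophe allowed: John and john stay apart
print(word_list(tagged, "%"))  # apostrophe not allowed: tags split off
```

With the apostrophe allowed, John'PROPERNOUN and john'NOUN are counted as two separate types. If the character joining word and tag is not a mid-word character, the tags fall out as separate "words" and the two johns merge, which is the behaviour described in the question.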

You could also set another symbol, say %, as an acceptable mid-word character:

John%PROPERNOUN is%VERB on%PREP the%DET john%NOUN

(I haven't tested this but it *should* work. Test on a small text first,
then if OK, you could make a copy of your corpus and use Text Converter to
make the changes.)
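If the German text is in some other tagged format, the conversion to word%TAG tokens is mechanical. The Python sketch below assumes, hypothetically (the question does not say what the tagger produced), one token per line in the form word<TAB>TAG<TAB>lemma, and joins each word to its tag with %. The function name `vertical_to_inline` and the STTS-style tags in the sample are illustrative only; this is not a description of what Text Converter does, and it should likewise be tried on a copy of the corpus.

```python
def vertical_to_inline(lines, sep="%"):
    """Turn one-token-per-line tagger output into running word%TAG text.

    Assumed (hypothetical) input format: word<TAB>TAG[<TAB>lemma].
    Only purely alphabetic tags are attached; tags containing non-letters,
    such as "$.", are dropped on the assumption that they would not
    tokenise cleanly as part of a word, so the bare token is written.
    """
    tokens = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines (e.g. sentence breaks)
        fields = line.split("\t")
        word = fields[0]
        tag = fields[1] if len(fields) > 1 else ""
        tokens.append(f"{word}{sep}{tag}" if tag.isalpha() else word)
    return " ".join(tokens)


sample = [
    "Es\tPPER\tes",
    "muss\tVMFIN\tmüssen",
    "sein\tVAINF\tsein",
    ".\t$.\t.",
    "Das\tPDS\tdas",
    "ist\tVAFIN\tsein",
    "sein\tPPOSAT\tsein",
    "Haus\tNN\tHaus",
    ".\t$.\t.",
]
print(vertical_to_inline(sample))
```

Fed through a tokeniser that accepts % as a mid-word character, the verb sein%VAINF and the pronoun sein%PPOSAT then end up as separate entries in the word list.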

In WS4 (emerging blinking into the daylight from a long dark tunnel) I will
think of a neater way of working than this! I am still refining tag treatment,
so this query came at a good moment.



Mike Scott

Applied English Language Studies Unit
University of Liverpool
Liverpool L69 3BX, UK.

Mike.Scott at liv.ac.uk
http://www.lexically.net
http://www.liv.ac.uk/~ms2928


