[Corpora-List] A few questions concerning WordSmith 4.0

Mon Nov 27 19:27:13 UTC 2006

Dear all,

I have some questions about working with Concord in WordSmith 4.0 
(please excuse my incompetence if I have overlooked something obvious).

In version 3.0 I used the Collocation function to look for words 
with a particular suffix or ending (whether or not it corresponds exactly 
to a real morpheme). For this purpose I used a truncated search, e.g. 
"*ism", and then set the collocation horizon to 0/0. Strictly speaking, 
the programme then did not calculate collocations, i.e. words 
appearing to the left or the right of the search string, but just 
produced a list of the centre words. As this list covered all 
-ism words ("intellectualism", "capitalism", "occultism", etc.), it 
was exactly what I wanted.
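Outside WordSmith, the list of centre words that the 0/0 horizon produced is simply a frequency list of every word form matching the truncated pattern. The following Python sketch illustrates that idea only; the letters-only tokenizer and the function name `suffix_frequencies` are hypothetical, and nothing here reflects how Concord itself works.

```python
import re
from collections import Counter

def suffix_frequencies(text, suffix):
    """Count every word form ending in `suffix`, case-insensitively.

    This mirrors a truncated search such as "*ism" with a 0/0 horizon:
    only the matching (centre) words are counted, not their neighbours.
    """
    # Hypothetical tokenizer: runs of letters, folded to lower case.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w.endswith(suffix))

sample = ("Capitalism and intellectualism were discussed; "
          "occultism too. Capitalism again.")
for word, freq in suffix_frequencies(sample, "ism").most_common():
    print(word, freq)
```

Listing the results with `most_common()` roughly corresponds to sorting on the centre column, and the full word form is always kept.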

In version 4.0, I can no longer choose a zero horizon, either 
to the left or to the right. This can be worked around by 
clicking on the centre column in the Collocation display, which sorts the 
words by the frequency with which they appear as the centre 
word and so gives me the same results as the procedure just described 
for 3.0. The problem is that the Collocation function does 
not show the full form of the centre word, only its first two 
letters (e.g. "in", "ca", "oc"). If there are only a few, I may be 
able to guess the words, but in other cases this is impossible. As I am 
not the most intelligent or sophisticated user of WordSmith, I doubt 
that this problem stems from unusually demanding requirements; it is more 
likely that I am missing some setting or option. But I cannot 
work out what it is.

A second, probably equally simple problem is that I seem to be unable 
to exclude words when working with a truncated search word. E.g. when 
looking for synthetic comparatives in English, using "*er" as my target, 
I would like the programme to ignore obvious high-frequency words such 
as "ever", "never", "her", "after", etc. This was possible in WordSmith 
3.0, but I cannot find the equivalent function in version 4.0.
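What the exclusion step amounts to is a stop list applied to the wildcard matches. A minimal Python sketch, assuming a plain-text corpus and a hand-made stop list; `EXCLUDE` and `filtered_suffix_matches` are hypothetical names, not WordSmith settings:

```python
import re
from collections import Counter

# Hypothetical stop list: frequent words ending in "-er" that are not
# synthetic comparatives. Extend as needed.
EXCLUDE = {"ever", "never", "her", "after", "other", "under", "over"}

def filtered_suffix_matches(text, suffix, exclude):
    """Count words ending in `suffix`, skipping any word in `exclude`."""
    # Same crude tokenizer as before: lower-cased runs of letters.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words
                   if w.endswith(suffix) and w not in exclude)

sample = "She never felt better or healthier than after her longer walks."
print(filtered_suffix_matches(sample, "er", EXCLUDE))
```

Anything the stop list misses (e.g. "water", "number") still has to be weeded out by hand, since "-er" alone is not a reliable marker of comparatives.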

The third problem concerns the use of search files. As I examine corpora 
representing particular discourses (e.g. women's magazines, wellness 
brochures, popular books promoting lifestyle changes, etc.), I have 
started to use files containing fairly exhaustive lists of terms from particular 
lexical fields, e.g. nutrients, social relations, diseases, etc. This 
allows me to compare the extent to which a specific discourse focuses, 
for instance, on the nutritional aspects of food or takes a rather 
pathological view of life (at least on a superficial level). I have now 
compiled a long list of pathological terminology from 
internet resources and some initial searches; it covers some 4,000 
expressions. I was not really surprised that WordSmith could not finish 
checking for occurrences of these expressions in a 600,000-word corpus, 
considering that I do not have an ultrafast computer. I was just 
wondering whether there is a practical size limit for a search file (say, 
500 lines or so) up to which such searches can be completed successfully 
even on a moderately fast computer.
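On the question of size limits: in principle, matching a long term list need not be slow if the terms are held in a hash set, because each corpus position is then checked with a few constant-time lookups, however many terms there are. The following Python sketch illustrates this; it says nothing about how WordSmith processes search files, and `count_terms`, the letters-only tokenizer and the sample terms are assumptions for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into runs of letters (an assumption)."""
    return re.findall(r"[a-z]+", text.lower())

def count_terms(text, terms):
    """Count occurrences of single- or multi-word terms in `text`."""
    # Store each term as a tuple of tokens so that multi-word terms
    # ("heart disease") can be matched as n-grams.
    term_set = {tuple(tokenize(t)) for t in terms}
    term_set.discard(())  # drop entries that contain no letters
    if not term_set:
        return Counter()
    max_len = max(len(t) for t in term_set)
    tokens = tokenize(text)
    counts = Counter()
    for i in range(len(tokens)):
        # Try every n-gram starting at token i, up to the longest term.
        for n in range(1, max_len + 1):
            if i + n > len(tokens):
                break
            gram = tuple(tokens[i:i + n])
            if gram in term_set:  # constant-time set lookup
                counts[" ".join(gram)] += 1
    return counts

terms = ["diabetes", "heart disease", "stress"]
text = "Stress raises the risk of heart disease and diabetes. Heart disease is common."
print(count_terms(text, terms).most_common())
```

Here the work grows with the corpus length times the number of tokens in the longest term, not with the number of entries in the list. Wildcard entries (e.g. "*itis") would need separate handling.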

Any help would be highly appreciated :-)

Georg

-- 
*******************************************************************************
* Mag. Dr. Georg Marko, M.A., Vertragsassistent
* Institut fuer Anglistik (Department of English Studies)
* Karl-Franzens-Universitaet Graz
* Heinrichstrasse 36, A-8010 Graz
* tel.: +43/316/380-2474
* e-mail: georg.marko at kfunigraz.ac.at
*******************************************************************************

                                 "I drew a treasure map on your hand"
                                                            Ani diFranco
