15.981, Disc: Reply to review of Corpus Presenter, 15-681

Wed Mar 24 02:45:18 UTC 2004

LINGUIST List:  Vol-15-981. Tue Mar 23 2004. ISSN: 1068-4875.

Subject: 15.981, Disc: Reply to review of Corpus Presenter, 15-681

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Sheila Collberg, U. of Arizona
	Terence Langendoen, U. of Arizona

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Naomi Ogasawara <naomi at linguistlist.org>
 ==========================================================================
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.
=================================Directory=================================

1)
Date:  23 Mar 2004 20:35:40 -0000
From:  Raymond Hickey <r.hickey at uni-bonn.de>
Subject:  Reply to Stefan Gries's review of my book

-------------------------------- Message 1 -------------------------------

Date:  23 Mar 2004 20:35:40 -0000
From:  Raymond Hickey <r.hickey at uni-bonn.de>
Subject:  Reply to Stefan Gries's review of my book

Reply to the review of Raymond Hickey, Corpus Presenter (Amsterdam:
John Benjamins, 2003), posted to the Linguist List on 24 February 2004:
http://linguistlist.org/issues/15/15-681.html

It was drawn to my attention that a review of my book and software was
posted to the Linguist List by one Stefan Gries, a German academic working
at the University of Southern Denmark. I must confess I have not been
following all the reviews on the Linguist List, which is why this one
escaped my usually watchful internet eye.

I read the review carefully and thought about it for some days. Given
the negative and one-sided nature of the review I naturally
contemplated writing a rebuff. But quite obviously no one is
interested in an altercation between a reviewer and the author of a
book. However, because Gries's review is skewed, not to say on many
points downright wrong, I felt that it really is necessary to try and
put straight some of the more serious misrepresentations contained in
his review and - importantly - to mention some of the features of the
software which he does not discuss. I will keep my reply to an
absolute minimum and therefore not take him to task for each and every
inaccuracy in his review. I should add that I have never heard of
Stefan Gries before so there is no personal motivation in providing
this reply.

Gries concentrates on a small selection of features in his review. In
his discussion of tagging and of concordance building with large
corpora he provides sensible criticism which I have already reacted to
by improving the two programmes which come in for particular criticism
(see "Getting updates" below). But the main drawback of his review is
that he only mentions those aspects of the software which interest
him. The entire thrust of Corpus Presenter, namely the presentation of
corpora in intuitive tree form and the sophisticated retrieval
techniques for finding syntactic and morphological constructions in
such corpora, is not even mentioned by him. A Corpus of Irish
English which is supplied with the book and which can be used as
instructional software for teaching about corpus linguistics, for
example, is not discussed by Gries at all. Gries also complains about
the structure of the book: this was discussed with the publishers and
vetted by their reviewers. If he doesn't like it I can't help it but
he can rest assured that there were good reasons for presenting it the
way it is.

Gries compares Corpus Presenter with other corpus processing software
and finds it wanting. My software is not an attempt to replicate
WordSmith or any other software tool which is already
available. Instead, it offers a differently structured suite with
different aims and organisation. I am not interested in reinventing
the wheel and if Gries prefers WordSmith then he should stick to
it. Many of the retrieval options, particularly for syntactic frames,
are only available in Corpus Presenter and the retrieval options show
a flexibility which is not found in other software. It is true that
for huge corpora, Corpus Presenter is slower than WordSmith (though I
doubt Gries's statistics). More flexibility does have a price, but
speed is not everything: functionality and range of options are
equally important and, in any case, with the increasing speed of
computers this question is not the burning issue it was years ago.

I take grave exception to Gries talking of "bugs" in his final
remarks.  Corpus Presenter works properly and fulfils the functions
which it claims to perform (Gries acknowledges this, if only
begrudgingly). I have over 20 years of experience in computing and am
a committed programmer who spent several years and had many, many
discussions with colleagues around the world in the preparation of
this software. Gries does not appear to like the interface of the
programmes (there is no accounting for taste), but it is quite
unjustified of him to go on and on for several pages complaining about
the features the software suite contains. He seems to find too many:
all the utilities supplied on the CD are the result of suggestions by
colleagues in the field who felt the need for the functions they
fulfil.  Now if Stefan Gries does not like these utilities he can just
ignore them.  It is stated explicitly that they are not necessary for
the functioning of the main programme and he can delete them if he
wishes. Furthermore, there are clear instructions in the main
programme about how to remove all the software and auxiliary
files. His frenzy of petty-mindedness in discussing the installation
procedure is patently absurd. I might add that the setup is determined
by Windows and not by me. If he does not like it, then I suggest he
direct his comments to Microsoft and see if he can find anyone to
listen to him.

There are a number of factual misrepresentations in Gries's review. To
keep things short I will only mention two serious ones. (1) It
certainly is possible to sort concordance returns on words to the left
and right of the keyword (this can be done for up to 8 words each side
of the keyword).  Gries does not like my terminology - "restructure
return lines" - but as a native speaker of English I beg to maintain
that this is an acceptable description of this function. (2) Gries
thinks that the analysis of style is not treated in Corpus Presenter,
but the special text editor, CP Text Tool, has a function for Lexical
Clustering analysis which does precisely that. It will allow users to
determine the occurrence of stylistic features in a flexible manner
and so help them answer such questions as text authorship. Lexical
Clustering is mentioned on several occasions, including in the various
guides available within the Launcher, so Gries should have seen this
if he had looked at the material properly.
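
For readers unfamiliar with the operation referred to in point (1),
sorting concordance lines on a context word can be sketched as
follows. This is a generic Python illustration and not the code used in
Corpus Presenter; the function names (concordance, sort_hits), the
simple tokenisation and the parameters are assumptions made only for
this example.

```python
import re


def concordance(text, keyword, span=8):
    """Collect (left, keyword, right) context triples for each hit.

    `span` is the number of context words kept on each side
    (8 here, matching the limit mentioned in the text above).
    """
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            hits.append((left, tok, right))
    return hits


def sort_hits(hits, side="right", position=1):
    """Sort hits on the word `position` places left or right of the keyword."""
    def key(hit):
        left, _, right = hit
        # On the left side, count outwards from the keyword.
        context = left[::-1] if side == "left" else right
        # Hits with too little context sort first (empty string).
        return context[position - 1] if len(context) >= position else ""
    return sorted(hits, key=key)


if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog saw the cat run"
    for left, kw, right in sort_hits(concordance(sample, "the", span=2)):
        print(f"{' '.join(left):>12} [{kw}] {' '.join(right)}")
```

Calling sort_hits(hits, side="left", position=2) would instead order
the lines by the second word to the left of the keyword.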

Getting updates:

The updates for the Corpus Presenter software can be downloaded at my
homepage "www.uni-essen.de/~lan300/HICKEY.htm". Here you click on
"Computer Projects" and then on "Corpus Presenter" to get to the page
you need (unfortunately, this cannot be done directly). Then click on
"Updates" and download the files you require. There are updates for
CP Flash Processor (greatly increased speed of directory listings) and
CP Text Tool (better interface for tagging texts), as well as a text
on tagging and a list processing utility.

Raymond Hickey
Tuesday, 23 March 2004

---------------------------------------------------------------------------
LINGUIST List: Vol-15-981