Corpora: KWiCFinder, a free Web Concordancer

William H. Fletcher fletcher at usna.edu
Mon Apr 23 16:46:06 UTC 2001


For several years I have been developing KWiCFinder, a PC-based concordancer
for the Web which conducts a user's search and produces a KWiC concordance
of the search terms. This program was conceived by a linguist for linguists,
but it is a powerful research tool for any field, whether one is interested
in form or content. A stable but incomplete (especially the documentation!)
preliminary release of this free program is now available for download at

   http://miniappolis.com/KWiCFinder/

I would appreciate feedback from colleagues in the corpus community on the
program's usefulness, along with suggestions for potential improvements.

KWiCFinder uses the AltaVista search engine. It helps the user formulate a
query, then downloads documents matching the query and displays Key Word in
Context excerpts in a variety of formats and languages. Downloaded documents
can be saved in HTML and/or text formats, so they'll still be there when you
need them.
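
For readers unfamiliar with the format, the idea of a Key Word in Context
display can be sketched in a few lines of Python. This is only an
illustration of the concept, not KWiCFinder's own code: the function name
`kwic` and the `width` parameter are invented for the example, and the input
is a plain string rather than a downloaded Web page.

```python
import re


def kwic(text, term, width=30):
    """Return one aligned line per occurrence of `term` in `text`.

    Hypothetical helper: each line shows up to `width` characters of
    context on either side of the key word, which is set off in brackets.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the key words line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


if __name__ == "__main__":
    sample = "The ship will sink. Sink or swim, they said."
    for line in kwic(sample, "sink", width=12):
        print(line)
```

Aligning the key word in a fixed column is what makes the surrounding
patterns easy to scan by eye.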

KWiCFinder also offers refinements to narrow the search even further than
AltaVista's complex Boolean criteria normally allow. It introduces wildcards
to match a single character (versus AV's *, which matches 0-5 characters),
and the "sic" option to block lower-case or "plain" characters from matching
upper-case or accented ones (without the "sic" option, German "wurde" also
matches "würde", or Spanish "continuo" matches "continúo" and "continuó" as
well).
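
The matching rule described above can be sketched roughly as follows. This
is an assumption-laden illustration, not the program's actual logic: the
names `fold` and `term_matches` are made up, and the sketch assumes that
only plain lower-case characters in the query are allowed to match their
upper-case or accented variants, while any other query character must match
exactly.

```python
import unicodedata


def fold(ch):
    """Strip accents and case from a character, e.g. 'Ü' becomes 'u'."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def term_matches(term, word, sic=False):
    """Compare a query term to a word, character by character.

    Assumed behavior: with sic=False a plain lower-case query character also
    matches upper-case and accented variants; with sic=True only an exact
    match is accepted.
    """
    term = unicodedata.normalize("NFC", term)
    word = unicodedata.normalize("NFC", word)
    if len(term) != len(word):
        return False
    for t, w in zip(term, word):
        if t == w:
            continue
        # A loose match needs a plain lower-case query character (fold(t) == t)
        # whose folded counterpart in the word is the same letter.
        if sic or fold(t) != t or fold(w) != t:
            return False
    return True


if __name__ == "__main__":
    print(term_matches("wurde", "würde"))            # loose match
    print(term_matches("wurde", "würde", sic=True))  # strict match
```

Under these assumptions "wurde" matches "würde" only when the "sic" option is
off, while a query for "würde" never matches "wurde".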

KWiCFinder's "Tamecards" provide a shortcut method of specifying variants
without matching as many undesired forms as wildcards would, e.g.
   sink[,s,ing]
expands to
   sink sinks sinking
and
   s[iau]nk[,s,ing]
expands to all possible forms of the verb "to sink" (as well as to nonsense
forms such as "sanks" and "sunking"). Similarly, "on-line", with the
implicit tamecard "-", also matches "online" and "on line".
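
The expansion above can be reproduced with a short Python sketch. The bracket
semantics are inferred only from the examples in this message (a
comma-separated group lists alternative strings, a group without commas lists
alternative single characters, and a hyphen stands for hyphen, nothing, or a
space); the function name `expand_tamecard` is invented, and the program's
real parser may well behave differently.

```python
import itertools
import re


def expand_tamecard(pattern):
    """Expand a tamecard pattern into the list of literal forms it covers.

    Assumed syntax, inferred from the examples in the text:
      [,s,ing]  alternatives separated by commas (the empty one is allowed)
      [iau]     alternative single characters
      -         implicit tamecard: hyphen, nothing, or a space
    """
    # re.split with one capture group alternates literal text and group bodies.
    parts = re.split(r"\[([^\]]*)\]", pattern)
    choices = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Inside brackets: comma list or set of single characters.
            choices.append(part.split(",") if "," in part else list(part))
        else:
            # Literal text: every hyphen becomes a three-way choice.
            for n, piece in enumerate(part.split("-")):
                if n > 0:
                    choices.append(["-", "", " "])
                choices.append([piece])
    return ["".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    print(expand_tamecard("sink[,s,ing]"))
    print(expand_tamecard("s[iau]nk[,s,ing]"))
    print(expand_tamecard("on-line"))
```

Because the result is a Cartesian product of the choices, the second pattern
yields nine forms, which is why the nonsense forms appear alongside the real
ones.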

KWiCFinder distinguishes between "search terms," which appear in the report,
and "selection criteria," which narrow the search but are not reported on.

Once a search has been launched, KWiCFinder works in the background, without
user intervention. It can download and analyze a virtually unlimited number
of documents sequentially at the rate of 5-20 documents per minute. By
launching additional instances of the program, one can conduct a number of
searches simultaneously.

Search reports are encoded in XML and transformed to HTML for display.
Consequently the language and format options can be changed after the
search, and the end-user can even modify and extend them by editing the XSLT
stylesheets. The XML-based approach also permits documents and citations to
be annotated, categorized or deleted, and reports from different searches
can be merged.

KWiCFinder is still under development.  Your observations and suggestions
will be received with enthusiasm!

Bill Fletcher

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

  William H. Fletcher                         (410) 293-6362 [voice]
  Associate Professor of German and Spanish   (410) 293-2729 [fax]
  Language Studies Department                 (DSN 281-xxxx)
  US Naval Academy
  589 McNair Road
  Annapolis, MD 21402 - 5030

  fletcher at usna.edu
  http://www.usna.edu/LangStudy/

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


