8.1822, FYI: Software for Corpus Searches

The LINGUIST List linguist at linguistlist.org
Sun Dec 21 16:35:21 UTC 1997


LINGUIST List:  Vol-8-1822. Sun Dec 21 1997. ISSN: 1068-4875.

Subject: 8.1822, FYI: Software for Corpus Searches

Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
            T. Daniel Seely: Eastern Michigan U. <seely at linguistlist.org>

Review Editor:     Andrew Carnie <carnie at linguistlist.org>

Associate Editor: Ljuba Veselinova <ljuba at linguistlist.org>

Assistant Editors:  Martin Jacobsen <marty at linguistlist.org>
                    Brett Churchill <brett at linguistlist.org>
                    Anita Huang <anita at linguistlist.org>
                    Julie Wilson <julie at linguistlist.org>
                    Elaine Halleck <elaine at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Zhiping Zheng <zzheng at online.emich.edu>

Home Page:  http://linguistlist.org/


Editor for this issue: Anita Huang <anita at linguistlist.org>

=================================Directory=================================

1)
Date:  Wed, 17 Dec 1997 09:30:11 -0600 (CST)
From:  lhartman at siu.edu (Lee Hartman)
Subject:  Software for Corpus Searches

-------------------------------- Message 1 -------------------------------

Date:  Wed, 17 Dec 1997 09:30:11 -0600 (CST)
From:  lhartman at siu.edu (Lee Hartman)
Subject:  Software for Corpus Searches

Software for corpus searches

I'm announcing the release of a software program named
"Busca:  A Searcher for word patterns in texts" (Version 3 -- December 1997).

Busca is a DOS-based program that searches a set of text
files for a specified pattern of words or for a string of
characters.  When searching for a word pattern, Busca uses the
punctuation of the text to search sentence by sentence.  The
word pattern is defined in terms of a focus word, with
possibilities for specifying the first, second, and/or third
neighboring word before and/or after it, as well as a "floating"
word located anywhere in the sentence.  Words in the search
template can be defined in terms of their beginning (xxx-),
their ending (-xxx), a contained string (-xxx-), or their
entirety (xxx).  Each word position in the template may contain
up to ten alternative forms.
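
For readers who want a concrete picture of the search template described above, here is a minimal sketch in Python. It is not Busca's code (Busca is a DOS program whose internals are not shown in this announcement); the function names, the dictionary of neighbor offsets, and the case-insensitive matching are all assumptions made for illustration. It splits a text at ".", "?" and "!", then tests each sentence against a focus word, optional neighbors at fixed offsets, and an optional floating word, with each position given as a list of alternatives.

```python
import re

# Illustrative sketch only: all names below are hypothetical, not Busca's own.

def split_sentences(text):
    """Split text into sentences at '.', '?' and '!' followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]

def word_matches(word, spec):
    """Test one word against one spec: xxx- (beginning), -xxx (ending),
    -xxx- (contained string), or xxx (whole word). Case is ignored here,
    which is an assumption of this sketch."""
    word, spec = word.lower(), spec.lower()
    if len(spec) > 2 and spec.startswith("-") and spec.endswith("-"):
        return spec[1:-1] in word
    if spec.endswith("-"):
        return word.startswith(spec[:-1])
    if spec.startswith("-"):
        return word.endswith(spec[1:])
    return word == spec

def matches_any(word, alternatives):
    """A template position matches if any of its alternative forms matches."""
    return any(word_matches(word, spec) for spec in alternatives)

def search_sentence(sentence, focus, neighbors=None, floating=None):
    """neighbors maps an offset (-3..-1 or 1..3) to a list of alternatives;
    floating is a list of alternatives that may match any other word."""
    neighbors = neighbors or {}
    words = re.findall(r"[^\W\d_]+", sentence)  # runs of letters only
    for i, word in enumerate(words):
        if not matches_any(word, focus):
            continue
        ok = True
        for offset, alternatives in neighbors.items():
            j = i + offset
            if j < 0 or j >= len(words) or not matches_any(words[j], alternatives):
                ok = False
                break
        if ok and floating is not None:
            ok = any(matches_any(w, floating)
                     for k, w in enumerate(words) if k != i)
        if ok:
            return True
    return False

def search_text(text, focus, neighbors=None, floating=None):
    """Return the sentences of text that match the template."""
    return [s for s in split_sentences(text)
            if search_sentence(s, focus, neighbors, floating)]
```

With this sketch, a call such as search_text(text, focus=["com-"], neighbors={-1: ["perro", "gato"]}) would return the sentences in which a word beginning with "com" is immediately preceded by "perro" or "gato". The limit of ten alternatives per position is not enforced here.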

Busca can be directed to search a set of texts that are
contained in a large number of files, and these files may reside
in different DOS directories.
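
As a rough illustration of searching files spread over several directories for a string of characters, the following Python sketch walks a list of directories and reports each line that contains the string. The function name, the "*.txt" file pattern, and the Latin-1 encoding are assumptions of the sketch, not details of Busca.

```python
from pathlib import Path

# Illustrative sketch only: find_string is a hypothetical name, not part of Busca.

def find_string(directories, needle, pattern="*.txt", encoding="latin-1"):
    """Yield (path, line_number, line) for every line containing needle
    in the files matching pattern under each of the given directories."""
    for directory in directories:
        for path in sorted(Path(directory).glob(pattern)):
            with open(path, encoding=encoding) as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        yield path, number, line.rstrip("\n")
```

A caller might loop over find_string(["texts/argentina", "texts/chile"], "pero") and print each hit; those directory names are made up for the example.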

Busca was originally designed to be used with a corpus in
Spanish -- the Argentine and Chilean texts of the "Corpus de
Referencia de la Lengua Española Contemporánea" (CRLEC),
accessible at http://lola.lllf.uam.es -- but it can be used with
any set of ASCII text files that use conventional sentence
punctuation ("." and "?" and "!").  The program is available
both in English (busc3eng.zip) and in Spanish (busc3esp.zip).

Busca is intended for free, non-profit distribution.  Users
are requested to acknowledge Busca in publication of any
research that benefits from use of the program.

Here is the address from which to download Busca:

        http://www.siu.edu/~nmc/busca.html


--------------------------------------------------------------------
Lee Hartman
Dept. of Foreign Languages
Southern Illinois University
Carbondale, IL 62901-4521
U.S.A.

---------------------------------------------------------------------------
LINGUIST List: Vol-8-1822


