Corpora: Re: HTML Concordancing

Andrew Kehoe andrew at rdues.liv.ac.uk
Tue May 9 15:28:29 UTC 2000


Maritza,

It seems that most of the technology you require is already present in our
prototype WebCorp web concordancing software. We have modified the existing tool
to produce word (frequency) lists for web pages. A demonstrator can be found
at http://webcorp.connect.org.uk/wordlist.html; it constructs a word list
for an individual target page.
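
To illustrate the general idea of building a word list straight from HTML,
without first saving the page as plain text, here is a minimal sketch using
only the Python standard library. It is not the WebCorp implementation; the
names TextExtractor and word_frequencies, the tokenising pattern, and the
sample markup are assumptions made for this example.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping the bodies of <script> and <style>."""

    SKIPPED = {"script", "style"}  # hypothetical choice of tags to ignore

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # The parser passes tag names in lower case.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def word_frequencies(html):
    """Return a Counter of lower-cased word forms found in the HTML text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with spaces keeps words in adjacent elements apart.
    text = " ".join(parser.chunks).lower()
    # Assumed tokenisation: runs of ASCII letters, with an optional
    # internal apostrophe (e.g. "don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return Counter(words)


sample = (
    "<html><body><h1>Speech recognition</h1>"
    "<p>A lexicon for speech.</p>"
    "<script>var x = 1;</script></body></html>"
)
freqs = word_frequencies(sample)
for word, count in freqs.most_common():
    print(word, count)
```

Because text chunks are joined with spaces, a word split by inline markup
(for example a single bold letter inside a word) would be counted as two
tokens; a real tool would need to treat inline and block elements differently.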

Regards,

Mike Pacey,
R&D Unit for English Studies,
University of Liverpool


> From owner-corpora at lists.uib.no Tue May  9 12:10 BST 2000
> From: "Maritza vd Heuvel" <MVDH at AKAD.SUN.AC.ZA>
> To: Corpora at hd.uib.no
> Date: Tue, 9 May 2000 12:46:32 +0200
> MIME-Version: 1.0
> Content-transfer-encoding: 7BIT
> Subject: Corpora: Html Concordancing?
>
> Hi
>
> Let me start off by introducing myself. I'm a postgrad researcher
> working on a lexicon for the speech recognition component of a
> spoken dialogue system. The electronic material available for use
> in corpora and for concordancing purposes is very limited and one
> of our options is using web sites containing relevant information to
> generate word lists. Does anyone know of a concordancing tool
> that allows concordancing of files that contain html tags without
> first requiring conversion of the html into a text format?
>
> Thanks!
> Maritza van den Heuvel


