[Corpora-List] qu: collecting and automatically classifying data from the web

William Fletcher fletcher at usna.edu
Tue Nov 15 19:06:39 UTC 2005


Florian,

My free KWiCFinder application (Windows)
http://kwicfinder.com 
does support date ranges and permits restriction of searches to specific websites. On the other hand, it requires searching for specific words or phrases, and it is hampered by changes to the AltaVista search engine (no wildcards; inconsistent support for stopwords, capitals and diacritics). Downloaded webpages can be saved automatically in either text or HTML format for further analysis.

For further details see my paper
"Concordancing the Web: Promise and Problems, Tools and Techniques"  
http://www.kwicfinder.com/FletcherConcordancingWeb2005.pdf 

Good luck,
Bill Fletcher


>>> "T. Florian Jaeger" <tiflo at csli.stanford.edu> 11/15/05 12:45 PM >>>
Hello,

I am forwarding this for a friend who wants to collect data from
specific web sites and automatically organize it according to the date
of the website. Are you aware of any such tool? I remember there was a
KWIC-like search interface for the web, but I can't remember its name,
and I also don't know whether it allows you to specify date ranges for
the search.

thanks for your help,

florian

--
T. Florian Jaeger
Ph.D. student
Linguistics Department,
P: +1 (650) 725 2323
F: +1 (650) 723 5666
U: http://www.stanford.edu/~tiflo/ 
