[Corpora-List] qu: collecting and automatically classifying data from the web
William Fletcher
fletcher at usna.edu
Tue Nov 15 19:06:39 UTC 2005
Florian,
My free KWiCFinder application (Windows)
http://kwicfinder.com
does support date ranges and permits restricting searches to specific websites. On the other hand, it requires searching for specific words or phrases, and it is hampered by changes to the AltaVista search engine (no wildcards; inconsistent support for stopwords, capitals, and diacritics). Downloaded webpages can be saved automatically in either text or HTML format for further analysis.
For further details see my paper
"Concordancing the Web: Promise and Problems, Tools and Techniques"
http://www.kwicfinder.com/FletcherConcordancingWeb2005.pdf
Good luck,
Bill Fletcher
>>> "T. Florian Jaeger" <tiflo at csli.stanford.edu> 11/15/05 12:45 PM >>>
Hello,
I am forwarding this for a friend who wants to collect data from
specific websites and automatically organize it according to the date
of the website. Are you aware of any such tool? I remember there was a
KWIC-like search interface for the web, but I can't remember its name,
and I also don't know whether it allows you to specify date ranges for
the search.
thanks for your help,
florian
--
T. Florian Jaeger
Ph.D. student
Linguistics Department,
P: +1 (650) 725 2323
F: +1 (650) 723 5666
U: http://www.stanford.edu/~tiflo/