[Corpora-List] Boot Camp (Continued...)
Linas Vepstas
linasvepstas at gmail.com
Mon Aug 18 23:54:10 UTC 2008
2008/8/18 Mark Davies <Mark_Davies at byu.edu>:
> At any rate, if you want to have access to a large, current corpus -- with complete and total and thoroughly satisfying full-text access -- then why not just create your own corpus, and then keep it updated?
Heh. We actually tried that: we got a GSoC summer student who,
we'd hoped, would work with both the WaCky group and the
Wikimedia/Nutch/Lucene projects to build a distributed web crawler
that would keep a corpus up to date (and serve as input fodder
to improve search, thus the connection to the search engine
folks). Unfortunately, he wasn't closely supervised and wandered
off in a less useful direction.
--linas