[Corpora-List] Boot Camp (Continued...)

Linas Vepstas linasvepstas at gmail.com
Mon Aug 18 23:54:10 UTC 2008


2008/8/18 Mark Davies <Mark_Davies at byu.edu>:

> At any rate, if you want to have access to a large, current corpus -- with complete and total and thoroughly satisfying full-text access -- then why not just create your own corpus, and then keep it updated?

Heh. We actually tried that: we got a GSoC summer student who,
we'd hoped, would work with both the WaCky group and the
Wikimedia/Nutch/Lucene people to build a distributed web crawler
that would keep a corpus up to date (and serve as input fodder
to improve search, thus the connection to the search engine
folks). Unfortunately, he wasn't closely supervised and wandered
off in a less useful direction.

--linas

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


