[Corpora-List] SCANNED TEXTS ARE VALID FOR CORPORA PURPOSES?
Angus Grieve-Smith
grvsmth at panix.com
Fri Aug 1 15:43:11 UTC 2008
On Thu, 31 Jul 2008, J.L. DeLucca wrote:
> In the digital world there are digital libraries like Gallica, the
> Bibliothèque nationale de France digital library, that work with
> scanned texts with no OCR treatment, and ebook projects that work with
> full texts. I want to know whether you would consider scanned texts with
> no OCR treatment to be digital corpora, especially the oldest texts.
I think that the term "digital corpus" implies searchability and
taggability. In that sense, these texts are not digital corpora. That
said, they are certainly rich sources of texts that can often be OCRed
into digital corpora. I have used many of the Gallica texts in this way
for my current project.
-Angus B. Grieve-Smith
Linguistics Department
University of New Mexico
grvsmth at unm.edu
grvsmth at panix.com
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora