[Corpora-List] SCANNED TEXTS ARE VALID FOR CORPORA PURPOSES?

Angus Grieve-Smith grvsmth at panix.com
Fri Aug 1 15:43:11 UTC 2008


On Thu, 31 Jul 2008, J.L. DeLucca wrote:

> In the digital world there are digital libraries like Gallica, the
> Bibliothèque nationale de France digital library, that work with
> scanned texts with no OCR treatment, and ebook projects that work with
> full texts. I want to know whether you would consider scanned texts with
> no OCR treatment to be digital corpora, especially the oldest texts.

 	I think that the term "digital corpus" implies searchability and 
taggability.  In that sense, these texts are not digital corpora.  That 
said, they are certainly rich sources of texts that can often be OCRed 
into digital corpora.  I have used many of the Gallica texts in this way 
for my current project.

 					-Angus B. Grieve-Smith
 					Linguistics Department
 					University of New Mexico
 					grvsmth at unm.edu
 					grvsmth at panix.com