[Corpora-List] "Multi-encoded" corpora

Martin Wynne martin.wynne at oucs.ox.ac.uk
Wed Oct 8 11:44:27 UTC 2008


Albretch Mueller wrote:
> ~
>  I was browsing around the BAWE corpus info previously posted here and
> when I noticed all texts are in PDF format (!), it made me wonder...

Oh no, they're not! The corpus is composed of text files, with a choice of
text encodings. None of it is in PDF files. There is some prose 
documentation in PDF files to accompany the corpus in the package of 
files which can be downloaded from the OTA.

Martin

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


