batch conversion to pdf
Jane Simpson
jane.simpson at SYDNEY.EDU.AU
Wed Oct 27 21:23:46 UTC 2010
Ideally you want two things for archiving: text, because it is so much more editable and searchable, and something like pdf to give an idea of what it should look like.
-----Original Message-----
From: Ken Manson [mailto:ken.grammar at gmail.com]
Sent: Wed 27/10/2010 8:19 PM
To: 'Rik'; 'Aidan Wilson'; 'Andrew Cunningham'
Cc: 'Gary Holton'; 'resource-network-linguistic-diversity'
Subject: RE: batch conversion to pdf
Hi All,
I like ODT format - even though I use Word 2007. (Have some macros I am in
love with for aligning examples.) I have used ODT to pass on documents for
editing which have Thai and Burmese (Unicode) fonts.
However, for archiving, pdf is "less" editable and preferable.
Ken
-----Original Message-----
From: Rik [mailto:rdbusser at gmail.com]
Sent: Wednesday, 27 October 2010 9:18 AM
To: 'Aidan Wilson'; 'Andrew Cunningham'
Cc: 'Gary Holton'; 'resource-network-linguistic-diversity'
Subject: RE: batch conversion to pdf
What about the ODT format that OpenOffice uses? It is part of the Open
Document Standard (open and ISO-compliant) and it is transparent (basically
a zip-container with XML files in it). Not really sure how well it deals
with complex scripts, though.
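
To illustrate the "zip container with XML files" point, here is a minimal
Python sketch. It builds a tiny stand-in archive in memory rather than opening
a real document: the member names (mimetype, content.xml) and the namespaces
follow the OpenDocument packaging convention, but the archive is not a
complete, valid ODT, the sample paragraphs are invented, and make_demo_odt,
list_members and extract_text are hypothetical helper names.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by OpenDocument content.xml.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Invented sample paragraphs (Thai and Burmese), for illustration only.
DEMO_PARAGRAPHS = ["สวัสดี", "မင်္ဂလာပါ"]


def make_demo_odt() -> bytes:
    """Build a tiny ODT-like zip in memory (not a complete ODT)."""
    paragraphs = "".join(f"<text:p>{p}</text:p>" for p in DEMO_PARAGRAPHS)
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" '
        f'xmlns:text="{TEXT_NS}"><office:body><office:text>'
        f"{paragraphs}"
        "</office:text></office:body></office:document-content>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", content.encode("utf-8"))
    return buf.getvalue()


def list_members(odt_bytes: bytes) -> list:
    """Return the file names stored inside the zip container."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as zf:
        return zf.namelist()


def extract_text(odt_bytes: bytes) -> list:
    """Return the text of every <text:p> paragraph in content.xml."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


if __name__ == "__main__":
    demo = make_demo_odt()
    print(list_members(demo))
    print(extract_text(demo))
```

The paragraphs come back as ordinary Unicode strings, which is what makes the
format transparent. Whether a given application lays out complex scripts
correctly is a separate question that this sketch does not address.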
Rik
------------------------------
Rik De Busser
Research Centre for Linguistic Typology
La Trobe University, Bundoora 3086 VIC, Australia
www.rdbusser.com
-----Original Message-----
From: Aidan Wilson [mailto:a.wilson at pgrad.unimelb.edu.au]
Sent: 27 October 2010 11:20
To: Andrew Cunningham
Cc: Gary Holton; resource-network-linguistic-diversity
Subject: Re: batch conversion to pdf
All true. But it's better than archiving in .doc format. Take the current
situation with .docx as an example: Microsoft no longer support their own
proprietary formats (.doc, .ppt, .xls, .mdb, etc.) and to read them in the
newest Office suite, you must download the 'compatibility pack'. The reason
is, of course, that other software engineers and manufacturers like Sun
Microsystems have reverse engineered these formats and make software that can
read and write to them easily. So Microsoft, understandably, is oriented
towards control of their formats - an aim that is largely incompatible with
the aims of the archivist.
Adobe, by contrast, have released .pdf as an open standard format, making it
quite reliable for archiving. To respond to your concerns about indexing and
searchability, most pdf files (and pdf creation tools, printers, etc.) encode
character information in a text layer. It's not perfect (try to copy/paste
the text from a pdf and you'll quickly see why), but it will eventually
improve to the point where merely printing to pdf will encode a text-only
version as a sublayer, making it just as searchable as .doc.
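
One way to see the copy/paste problem: text pulled out of a pdf sometimes
comes back as presentation forms (for example the single ligature character
U+FB01) rather than the underlying letters, so a plain search for the word
fails. A minimal Python sketch, assuming the extracted string contains such
ligatures; the sample string is invented and searchable is a hypothetical
helper name.

```python
import unicodedata

# Invented sample of text as it might come out of a pdf copy/paste:
# U+FB01 is the "fi" ligature, U+FB02 is the "fl" ligature.
extracted = "the \ufb01rst \ufb02oor"


def searchable(text: str) -> str:
    """Fold compatibility characters (such as ligatures) back to plain letters."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    print("first" in extracted)              # the ligature hides the word
    print("first" in searchable(extracted))  # found after normalisation
```

This only repairs compatibility characters that Unicode can decompose. It
does not help when the pdf carries glyphs with no usable mapping back to
characters, which is the harder case for complex scripts.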
Alternatively, you could copy/paste the contents out of a Word doc and
archive them as a raw text file (in addition to pdf). It'd consume negligible
storage space.
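
A rough sketch of how the pdf-plus-text pairing could be batch produced,
assuming LibreOffice is installed and its soffice binary is on the PATH. The
--headless, --convert-to and --outdir options are LibreOffice command-line
switches; the "txt:Text" filter name and the helper names build_commands and
convert_all are assumptions for the example, and conversion fidelity for
complex scripts should be checked by eye on a sample first.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(src_dir, out_dir, soffice="soffice"):
    """Return one soffice command per (.doc file, target format) pair."""
    commands = []
    for doc in sorted(Path(src_dir).glob("*.doc")):
        # "pdf" keeps the look of the page; "txt:Text" (assumed filter
        # name) keeps a plain-text copy for searching and editing.
        for target in ("pdf", "txt:Text"):
            commands.append([
                soffice, "--headless",
                "--convert-to", target,
                "--outdir", str(out_dir),
                str(doc),
            ])
    return commands


def convert_all(src_dir, out_dir):
    """Run every conversion, stopping at the first failure."""
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(src_dir, out_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical directory names; prints the commands without running them.
    for cmd in build_commands("incoming", "archive"):
        print(" ".join(cmd))
```

Keeping the command building separate from the running means the list can be
inspected (or logged for the archive record) before anything is converted.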
--
Aidan Wilson
PhD Candidate
Dept of Linguistics and Applied Linguistics
The University of Melbourne
+61428 458 969
a.wilson at pgrad.unimelb.edu.au
On Wed, 27 Oct 2010, Andrew Cunningham wrote:
> I'm just wondering if PDF files are suitable as an archival format,
> since it is in essence a preprint format rather than an archival
> format
>
> This may be more of a concern with languages written in complex
> scripts (including Latin and Cyrillic script languages that need to be
> treated as complex scripts), where a PDF document will be
> glyph-centric rather than character-centric; affecting searchability,
> indexing and text extraction.
>
> Andrew
>
> On 27 October 2010 04:36, Gary Holton <gmholton at alaska.edu> wrote:
>> Here at ANLA we are often faced with the problem of archiving vast
>> numbers of digital files in proprietary formats, especially MS Word.
>> Does anyone know of a good method for batch converting from, say, .doc
>> to .pdf ?
>
>
>
>