batch conversion to pdf

Andrew Cunningham at GMAIL.COM
Wed Oct 27 01:35:54 UTC 2010


On 27 October 2010 11:20, Aidan Wilson <a.wilson at> wrote:
> Adobe, by contrast, have released .pdf as an open standard format, making it
> quite reliable for archive. To respond to your concerns about indexing and
> searchability, most pdf files (and pdf creation tools, printers, etc) encode
> character information in a text file layer. It's not perfect (try to
> copy/paste the text from a pdf and you'll quickly see why), but it will
> eventually improve to the point where merely by printing to pdf, it will
> encode a text only version as a sublayer, making it just as searchable as
> .doc.

That said, for some complex scripts, the test results I've seen suggest
that the most reliable way to get Word content into a PDF file is to
import an XPS file into Acrobat, and even that isn't perfect.

Ultimately, three factors determine the result:

1) the software or intermediate file format used to produce the file
2) the font used (the font itself can affect the results)
3) the software used to generate the PDF

For some languages, there is no known combination of 1) to 3) that
produces good results with current tools.
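To compare combinations of 1) to 3) for a given language, one rough approach is to convert a known sample text, copy or extract the text back out of the resulting PDF, and score how closely it matches the original. The sketch below uses only the Python standard library and assumes you already have the extracted text as a string (how you obtain it is left open). The function names `text_layer_fidelity` and `private_use_chars` are hypothetical, made up for this illustration. Both strings are normalized to NFC so that canonically equivalent sequences are not counted as errors. Private Use Area code points in the extracted text can be a sign that glyph IDs, not real Unicode characters, ended up in the text layer.

```python
import difflib
import unicodedata


def text_layer_fidelity(source: str, extracted: str) -> float:
    """Hypothetical helper: return a 0.0-1.0 similarity score between the
    original text and the text recovered from a PDF's text layer."""
    # NFC normalization so precomposed vs. combining-mark forms compare equal.
    a = unicodedata.normalize("NFC", source)
    b = unicodedata.normalize("NFC", extracted)
    # autojunk=False keeps difflib from ignoring frequent characters in long texts.
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def private_use_chars(extracted: str) -> list[str]:
    """Hypothetical helper: list Private Use Area code points (category 'Co'),
    which can indicate glyph IDs leaking into the text layer."""
    return [c for c in extracted if unicodedata.category(c) == "Co"]


if __name__ == "__main__":
    # Same word, decomposed form: counts as a perfect match after NFC.
    print(text_layer_fidelity("café", "cafe\u0301"))  # 1.0
    # One character of four lost in extraction.
    print(text_layer_fidelity("abcd", "abxd"))  # 0.75
    print(private_use_chars("ab\ue000"))  # ['\ue000']
```

A score like this is only a crude proxy: it won't tell you *why* a combination failed (reordered glyphs, missing characters, or font-specific mappings), but it makes it easier to rank combinations on the same sample text.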

Hopefully things will improve in the future.


Andrew Cunningham
Senior Project Manager, Research and Development
State Library of Victoria

andrewc at at

More information about the Resource-network-linguistic-diversity mailing list