Tracking changes made during PDF/A conversion

Q: We need to capture PDFs, make them PDF/A compliant, and save them
as Fast Web View in v1.4. To that end, I have tested the PDFTron
software against many sample PDFs our government agency deals with.

Most of the PDFs I have tested PDFTron’s software on seem to come out
identical to the original. But I know everyone will ask what changes
are being made and what types of documents are likely to give us
trouble. Most of our documents are scans of paper documents with
OCR. Fairly straightforward stuff. But these would be changes made
to official agency documents, and even though we would only run the
PDFTron PDF/A software against non-compliant documents (which they
should not be submitting anyway), it would be great if we had some
idea of what to expect.


A: You can find the list of changes to the document by comparing the
list of compliance violations before and after the conversion. Any
violation that is reported before conversion but no longer appears
afterwards indicates that the document was changed to fix it. PDFNet
SDK (as well as PDF/A Manager) provides a list of the PDF object
numbers that are modified during the conversion. You can use a tool
such as CosEdit (http://www.pdftron.com/pdfcosedit/index.html) to
inspect the modified objects in the source and destination documents.
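
The before/after comparison can be sketched roughly as follows. This is
not PDFNet code: it assumes you have already collected each validation
report as a list of (error code, object number) pairs, and it assumes
object numbers stay the same across the conversion. The function names
and the error codes in the comments are hypothetical.

```python
def resolved_violations(before, after):
    """Return violations reported before conversion but not afterwards.

    Each violation is a (error_code, object_number) tuple. The error
    codes are whatever your validator reports; none are assumed here.
    """
    return sorted(set(before) - set(after))


def touched_objects(before, after):
    """Return the object numbers tied to violations that were fixed.

    These are the objects worth opening in an inspection tool.
    """
    return sorted({obj for _, obj in resolved_violations(before, after)})


if __name__ == "__main__":
    # Hypothetical reports, for illustration only.
    before = [("FONT_NOT_EMBEDDED", 12), ("JAVASCRIPT_ACTION", 31)]
    after = []
    for code, obj in resolved_violations(before, after):
        print(f"object {obj}: {code} was fixed during conversion")
```

Violations that are still present after conversion are ignored here;
only the ones that disappeared point at objects the converter changed.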

Since you are dealing mainly with scanned documents, you are quite
safe and should not expect any significant loss of information. The
most vulnerable documents are files that use PDF features such as
transparency or JavaScript, which are not allowed in PDF/A. To produce
a fully PDF/A compliant document, a converter may strip this
information, which can change the visual appearance of the document.
Using PDFNet SDK (http://www.pdftron.com/pdfnet) you can detect
whether a document uses JavaScript or transparency, and use this
information to skip PDF/A conversion or to flag the file for visual
inspection.
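
As a rough pre-screen, the flagging idea can be sketched without any
SDK by scanning the raw file bytes for name tokens that usually
accompany these features. This is only a heuristic and an assumption on
my part, not how PDFNet detects them: it misses anything stored inside
compressed object streams, and it can raise false alarms (for example
on "/SMask /None"). The token list and function names are illustrative.

```python
import re

# Name tokens that hint at risky features. Illustrative, not exhaustive.
RISKY_NAMES = {
    "javascript": (b"/JavaScript", b"/JS"),
    "transparency": (b"/SMask", b"/Transparency"),
}


def _has_name(data, name):
    # Match the name only when a PDF delimiter, whitespace, or end of
    # data follows, so b"/JS" does not match a longer name like b"/JSFoo".
    pattern = re.escape(name) + rb"(?=[\s/<>\[\]()]|$)"
    return re.search(pattern, data) is not None


def risky_features(data):
    """Return the sorted labels of risky features hinted at in data."""
    return sorted(
        label
        for label, names in RISKY_NAMES.items()
        if any(_has_name(data, n) for n in names)
    )


def triage(data):
    """Return ("inspect", features) if anything risky was seen,
    otherwise ("convert", [])."""
    found = risky_features(data)
    return ("inspect", found) if found else ("convert", found)
```

To use it, read a file with open(path, "rb").read() and pass the bytes
to triage(). Files that come back as "inspect" can be held for a visual
check, while the rest go straight to conversion.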