Is it possible to have completely automated, trouble-free PDF/A-1A conversion?

Ivanho · July 15, 2014, 6:02pm

Q:

We have a problem converting one PDF file (see attachment) to PDF/A conformance level 1a. Before conversion, the PDFNet validator reports several problems. After conversion, all of them are gone except one: e_PDFA381: The font dictionary is missing ‘ToUnicode’ entry. Could you tell me whether this is a bug in your library, or is it simply not possible to repair this kind of problem?

A:

This is intended behavior. The issue is that the file contains a font (object # 13) representing some symbols (EOJLIO+Wingdings3: trianglert, triangleleft) that are not mapped to Unicode. Since PDF/A-1A requires every character to be mapped to Unicode, the program reports a warning/error. In general, it may not be possible to fully normalize files to PDF/A-1A without human intervention. You could use OCR, but that is not error-free either (particularly for symbolic fonts like the one you are dealing with). By the way, the error will not appear if you convert to PDF/A-1B (or 2B), because that conformance level does not require all characters to have a Unicode mapping.
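To illustrate why a converter cannot repair this on its own, here is a minimal sketch in plain Python. It is not PDFNet code: the `KNOWN_GLYPHS` table and the `unmapped_glyphs` helper are hypothetical, and the table is a tiny hand-picked stand-in for whatever glyph-name-to-Unicode mapping a real converter would consult. The point is only that a glyph name with no entry leaves nothing to write into a ToUnicode map.

```python
# Illustrative only: a tiny hand-picked set of glyph-name-to-Unicode
# mappings, standing in for the table a real converter might consult.
KNOWN_GLYPHS = {
    "A": "\u0041",
    "space": "\u0020",
    "bullet": "\u2022",
}


def unmapped_glyphs(glyph_names, table=KNOWN_GLYPHS):
    """Return the glyph names that have no Unicode mapping in `table`."""
    return [name for name in glyph_names if name not in table]


# Glyph names reported for the Wingdings3 font in this thread.
font_glyphs = ["trianglert", "triangleleft"]
missing = unmapped_glyphs(font_glyphs)
print(missing)
```

Any name that ends up in `missing` needs a person (or an error-prone guess such as OCR) to decide which Unicode character it should stand for.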

Ivanho · July 16, 2014, 3:16pm

Do I understand correctly that in this case it is not possible to convert the PDF file to PDF/A-1A automatically?
A: Correct. It is not possible to convert a generic PDF to Level A PDF/A completely automatically and error-free, because some documents are simply missing the semantic information required by Level A. Even OCR is prone to error (especially for symbolic fonts such as the one in the provided document). It is certainly possible to use tricks to massage a PDF into a form that PDF/A Level A validators won’t complain about, but this goes against the spirit of Level A (and you may be better off with Level B).
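If you want an automated pipeline anyway, one option is to try Level A first and fall back to Level B when errors remain. The sketch below shows only that decision logic, under stated assumptions: `convert` is a hypothetical callable that takes a level string and returns the validation error codes left after conversion, and `fake_convert` is a stand-in that mimics the file from this thread. Neither is part of PDFNet; you would wrap your actual conversion call to fit.

```python
def choose_conformance(convert, levels=("1A", "1B")):
    """Try each level in order and return (level, errors).

    `convert` is a hypothetical callable: it takes a level string and
    returns the list of validation error codes left after conversion.
    The first level that converts with no errors wins; if none does,
    the result of the last attempt is returned.
    """
    last = None
    for level in levels:
        errors = convert(level)
        if not errors:
            return level, []
        last = (level, errors)
    return last


# Stand-in converter mimicking the file in this thread: Level A leaves
# the ToUnicode error behind, Level B converts cleanly.
def fake_convert(level):
    return ["e_PDFA381"] if level.endswith("A") else []


print(choose_conformance(fake_convert))
```

Files that fall back to Level B can be logged and reviewed by hand later if Level A is really required.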
