Product: PDFTron SDK for Python
Product Version: PDFNetPython3 9.3.0
Please describe your issue and provide steps to reproduce it:
Using the sample code provided for Python, I was testing PDF files for PDF/A conversion and validation. The conversion was successful, but for some files the validation results contained the error below and stated that they weren't valid PDF/A files.
e_PDFA 1126: The number of nested q/Q operators is greater than 28.
Objects: 49
The Compliance level used was the same as in the sample code “PDFACompliance.e_Level2B”
Unfortunately, I can't share the file that generated the error.
Could you please let me know what this means and possible causes for this? Are there any tools available in the pdftron sdk to help resolve this prior to converting the file to PDFA?
Thank you for contacting us about this.
This error relates to the PDF content stream and how its graphics operators are written. The PDF/A compliance level limits how deeply the q/Q (save/restore graphics state) operators can be nested. We do not modify the document to get under this limit during our PDF/A conversion because doing so may affect how the content is displayed.
If you control how this file is created, and PDF/A conversion and validation are important for your workflow, perhaps you could modify the generation step so that the output is a bit simpler.
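To make the error message concrete, here is a minimal sketch of what "nested q/Q operators" means. It is not part of the PDFTron SDK; the function name `max_q_nesting` is hypothetical, and the input is assumed to be an already-decoded content stream given as text. The tokenizer is deliberately naive (it splits on whitespace and does not skip string literals, comments, or inline image data), so treat it as an illustration rather than a validator.

```python
# Limit quoted in the validation error above (e_PDFA 1126).
PDFA_MAX_Q_NESTING = 28


def max_q_nesting(content_stream: str) -> int:
    """Return the deepest q/Q nesting level seen in a decoded content stream.

    'q' saves the graphics state (one level deeper), 'Q' restores it
    (one level shallower). Naive whitespace tokenizing is an assumption
    made to keep the sketch short.
    """
    depth = 0
    deepest = 0
    for token in content_stream.split():
        if token == "q":
            depth += 1
            deepest = max(deepest, depth)
        elif token == "Q":
            # Guard against unbalanced streams so depth never goes negative.
            depth = max(depth - 1, 0)
    return deepest


# A synthetic stream with 30 nested saves around a single stroked line.
sample = "q " * 30 + "0 0 m 10 10 l S " + "Q " * 30
print(max_q_nesting(sample) > PDFA_MAX_Q_NESTING)
```

A stream like `sample` would trip the check because its deepest nesting level exceeds the limit of 28, even though every `q` is correctly matched by a `Q`.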
Thanks for getting back. The PDF is generated using pdftron’s converter, which is fully licensed. Could you please let me know what specific changes are needed to the conversion process below to resolve this issue?
doc = PDFDoc()
doc.Save(outputPath , SDFDoc.e_compatibility)
Thank you for your response. The output file converted via our SDK is highly dependent on the input. In other words, if the input is complicated, the output PDF can also be complicated as a result.
In addition, as I mentioned before, this issue is related to graphics operators and how the content is displayed on the page. As such there might not be a way to get this file into compliance with the PDF/A level without potentially modifying how the content is displayed. Are you able to modify the input file on your end?
The input file is an email (.msg) file - I don't think I can modify it.
From what you've stated, should I take it that this input file cannot be converted to a PDF/A-compliant PDF using pdftron?
For MSG files our ToPDF API will first try to use MS Outlook, and then if that is not installed, will try the Windows print verbs. So the biggest part of the complexity (e.g. the q/Q operators) is the complexity of the MSG file, and how this 3rd party application handles the MSG.
There may be something we could do on our end, but it would require the exact MSG in question, and knowing the 3rd party application.
Note that we do not guarantee that our PDFACompliance class can convert every PDF, as we avoid doing anything destructive, and we do not add dummy/useless data, all of which would defeat the purpose of PDF/A.
If getting 100% PDF/A conversion is important for you, you could convert to an image-based PDF (no text, no vector, no interactive elements) and then do PDF/A conversion, which will always pass.
The other (better) option is to target PDF/A-3, which does not have this operator limit.
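As a rough sketch of the PDF/A-3 suggestion: in the Python PDF/A sample, the target level is the conformance argument passed to PDFACompliance, so the change would be to swap `PDFACompliance.e_Level2B` for `PDFACompliance.e_Level3B`. The function name `convert_to_pdfa3b` and its parameters are hypothetical, and the calls below follow the shape of the SDK's PDF/A sample as best I recall it, so please check them against the sample shipped with your PDFNetPython3 version.

```python
def convert_to_pdfa3b(input_path, output_path, license_key):
    """Convert a PDF to PDF/A-3b, then re-validate the result.

    Returns True when the re-validation reports no errors.
    Hypothetical helper; argument values mirror the SDK's PDF/A sample.
    """
    # Imported here so the sketch can be loaded without the SDK installed.
    from PDFNetPython3.PDFNetPython import PDFNet, PDFACompliance

    PDFNet.Init(license_key)
    PDFNet.SetColorManagement(PDFNet.e_lcms)

    # First argument True = convert; e_Level3B replaces the sample's e_Level2B.
    pdf_a = PDFACompliance(True, input_path, None,
                           PDFACompliance.e_Level3B, 0, 0, 10)
    pdf_a.SaveAs(output_path, False)
    pdf_a.Destroy()

    # First argument False = validate only, against the same level.
    check = PDFACompliance(False, output_path, None,
                           PDFACompliance.e_Level3B, 0, 0, 10)
    error_count = check.GetErrorCount()
    check.Destroy()
    return error_count == 0
```

The boolean result could feed the same compliant/non-compliant flag that the validation step produces today.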
Thanks Ryan, those are very valuable insights.
The application doing the conversion is Outlook.
Would converting to an image and then to PDF/A cause it to lose the ability to select text?
I will discuss the PDF/A-3 option with the stakeholders and ask them if it's OK. For now we are just flagging the file as PDF/A compliant or not based on the validation result.
Yes, exactly, it is a destructive operation. Essentially the PDF just becomes a container for the images.