Find whitespace in PDF

Product: All
Product Version: All

Please give a brief summary of your issue: Finding Whitespace in a PDF

Please describe your issue and provide steps to reproduce it: Does PDFTron have the ability to detect whitespace in a PDF to place a stamp? I am familiar with the process of converting a PDF to an image and then binary text, but is it built in to PDFTron?

Please provide a link to a minimal sample where the issue is reproducible: N/A

Are you looking for a general solution, that would work with any PDF file? Or just a solution that works with your specific PDF files?

Note that if nothing is drawn on a PDF page, the entire page can be considered transparent. Most PDF viewers paint the page white by default, and some let you change the page color.

This makes sense if you think of PDF viewing as a print preview, and of the graphics commands as commands to a physical printer. A blank PDF page, when printed, would simply be the color of the physical paper, and neither the PDF file nor the PDF viewer has any idea what that color will be.

The issue then is that some PDF files actually do define the background color, and this color could be white.

So no, PDFTron does not have an automatic way to detect “whitespace”.

The simplest way is to start from our ElementReader sample and track the bounding box of every element on the page. Compare those boxes to the Page’s CropBox and you can find areas that definitely have no graphics drawn in them.
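As a rough illustration of the comparison step, here is a minimal sketch in plain Python. It assumes the element bounding boxes have already been collected (for example while walking the page with ElementReader) and are given as (x1, y1, x2, y2) tuples in the same coordinates as the CropBox. The names `find_free_spot`, `overlaps`, and the `step` parameter are made up for this example and are not part of the PDFTron API.

```python
from typing import List, Optional, Tuple

# (x1, y1, x2, y2) in PDF user space, with y increasing upwards
Rect = Tuple[float, float, float, float]


def overlaps(a: Rect, b: Rect) -> bool:
    """True if rectangles a and b share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def find_free_spot(crop_box: Rect, occupied: List[Rect],
                   stamp_w: float, stamp_h: float,
                   step: float = 10.0) -> Optional[Rect]:
    """Scan the crop box on a coarse grid, starting at the top-left corner,
    and return the first stamp-sized rectangle that overlaps none of the
    occupied boxes. Returns None if no such position is found."""
    x1, y1, x2, y2 = crop_box
    bottom = y2 - stamp_h  # bottom edge of the candidate rectangle
    while bottom >= y1:
        left = x1
        while left + stamp_w <= x2:
            candidate = (left, bottom, left + stamp_w, bottom + stamp_h)
            if not any(overlaps(candidate, box) for box in occupied):
                return candidate
            left += step
        bottom -= step
    return None


# Hypothetical input: a US Letter page with a header band across the top.
page = (0, 0, 612, 792)
header = (0, 700, 612, 792)
spot = find_free_spot(page, [header], stamp_w=100, stamp_h=50)
```

The grid scan is slow for pages with thousands of elements and only finds positions aligned to `step`, but it is easy to reason about. A smaller `step` finds tighter fits at the cost of more overlap checks.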

But if you also want to exclude white content, such as a white image or a path/rectangle filled with white, then it gets more complicated. Note that PDF supports many color spaces, such as CMYK and Spot colors, so it can be unclear what even counts as “white”.
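To show why “white” depends on the color space, here is a small sketch that only handles the three device color spaces and deliberately gives up on everything else. The function name `is_white` and the `tolerance` parameter are assumptions for this example. How you obtain the color space name and the component values from an element is left out.

```python
from typing import Optional, Sequence


def is_white(color_space: str, components: Sequence[float],
             tolerance: float = 0.01) -> Optional[bool]:
    """Return True or False for the device color spaces, or None when the
    answer depends on information this sketch does not have (ICC profiles,
    spot inks, patterns, and so on)."""
    if color_space == "DeviceGray" and len(components) == 1:
        # Gray is additive: 1.0 is full brightness.
        return components[0] >= 1.0 - tolerance
    if color_space == "DeviceRGB" and len(components) == 3:
        return all(c >= 1.0 - tolerance for c in components)
    if color_space == "DeviceCMYK" and len(components) == 4:
        # CMYK is subtractive: zero ink on every channel leaves bare paper.
        return all(c <= tolerance for c in components)
    return None  # unknown here, treat as "not provably white"
```

Returning None rather than guessing keeps the caller honest: anything that is not provably white can be treated as occupied.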

Therefore, the best general solution is to rasterize the page to an image and run an image analysis on the result. See this forum post on how to translate between PDF and image coordinates.
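For a sense of what the image side involves, here is a minimal sketch. It is not taken from the linked post. It assumes the page was rendered from its CropBox at a known DPI, with no page rotation and the default 72 units per inch, and that the image is available as rows of 8-bit grayscale values. The names `blank_rows` and `pixel_rect_to_pdf` are hypothetical.

```python
from typing import List, Tuple

POINTS_PER_INCH = 72.0  # default PDF user space unit

Rect = Tuple[float, float, float, float]


def blank_rows(pixels: List[List[int]], threshold: int = 250) -> List[int]:
    """Indices of image rows in which every grayscale pixel is at or above
    the threshold, i.e. rows that look blank."""
    return [i for i, row in enumerate(pixels)
            if all(p >= threshold for p in row)]


def pixel_rect_to_pdf(px_rect: Rect, crop_box: Rect, dpi: float) -> Rect:
    """Map (left, top, right, bottom) in pixels (origin top-left, y down)
    to (x1, y1, x2, y2) in PDF points (origin bottom-left, y up)."""
    left, top, right, bottom = px_rect
    scale = POINTS_PER_INCH / dpi  # points per pixel
    crop_x1, _, _, crop_y2 = crop_box
    return (crop_x1 + left * scale,
            crop_y2 - bottom * scale,  # image bottom is the lower PDF y
            crop_x1 + right * scale,
            crop_y2 - top * scale)


# Tiny made-up image: rows 0 and 2 are blank, row 1 has a dark pixel.
image = [[255, 255], [255, 0], [251, 250]]
rows = blank_rows(image)

# A 144x72 pixel region at the top-left of a Letter page rendered at 144 DPI.
pdf_rect = pixel_rect_to_pdf((0, 0, 144, 72), (0, 0, 612, 792), dpi=144)
```

The vertical flip is the part that is easiest to get wrong: image rows count down from the top, while PDF y coordinates count up from the bottom. A rotated page or a non-default user unit would need extra handling that this sketch leaves out.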