How to extract images from a PDF and preserve transparency?

Aaron_Gravesdale · June 29, 2011, 1:47am

Q: We are currently trying out the demo version of the PDFNet SDK as
we consider a purchase. I'd like to know how to extract PNGs and
TIFFs from a PDF and preserve the transparency. I've tried the
sample project, but the generated images do not preserve
transparency.
--------------------

A: As a starting point, you may want to take a look at the ImageExtract
sample project:
http://www.pdftron.com/pdfnet/samplecode.html#ImageExtract

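Please keep in mind that PDFs do not contain embedded PNG or TIFF
files. Image data is stored in PDF's own image objects, so any PNG or
TIFF you get is produced at export time.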

PDF images can be associated with a soft mask, which need not have the
same dimensions as the base image; in other cases the soft mask may be
vector based or derived from a PDF 'page'. In PDF, a soft mask (or an
image mask) is used to compute the alpha value of each pixel of a
bitmap.

You can check whether an image has a soft mask using
image.GetSoftMask(), or a binary mask using image.GetMask(). For
example:

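// Soft mask: supplies a per-pixel alpha value.
if (image.GetSoftMask() != null) {
    Image soft_mask = new Image(image.GetSoftMask());
    soft_mask.Export(...);
}

// Binary mask: marks each pixel as either painted or masked out.
if (image.GetMask() != null) {
    Image bin_mask = new Image(image.GetMask());
    bin_mask.Export(...);
}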
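
Once the base image and its soft mask have been exported and decoded,
they can be recombined into a single bitmap with an alpha channel. The
following is only a minimal sketch of that idea, not part of PDFNet:
SoftMaskMerge and ApplySoftMask are hypothetical names, and the code
assumes both inputs are already available as raw 8-bit samples (RGB for
the base image, grayscale for the mask). Because the mask may have
different dimensions, it is sampled with nearest-neighbor scaling. Any
/Matte or /Decode entries on the mask are ignored here.

```csharp
using System;

public static class SoftMaskMerge
{
    // Combines 8-bit RGB samples with an 8-bit grayscale soft mask into
    // RGBA samples. Hypothetical helper; not a PDFNet API.
    public static byte[] ApplySoftMask(byte[] rgb, int width, int height,
                                       byte[] mask, int maskWidth, int maskHeight)
    {
        if (rgb.Length != width * height * 3)
            throw new ArgumentException("rgb must hold width * height * 3 bytes");
        if (mask.Length != maskWidth * maskHeight)
            throw new ArgumentException("mask must hold maskWidth * maskHeight bytes");

        byte[] rgba = new byte[width * height * 4];
        for (int y = 0; y < height; y++)
        {
            // Map the base-image row to the nearest mask row.
            int my = y * maskHeight / height;
            for (int x = 0; x < width; x++)
            {
                // Map the base-image column to the nearest mask column.
                int mx = x * maskWidth / width;
                int src = (y * width + x) * 3;
                int dst = (y * width + x) * 4;
                rgba[dst] = rgb[src];
                rgba[dst + 1] = rgb[src + 1];
                rgba[dst + 2] = rgb[src + 2];
                // The mask sample becomes the alpha value of this pixel.
                rgba[dst + 3] = mask[my * maskWidth + mx];
            }
        }
        return rgba;
    }
}
```

The resulting RGBA buffer can then be written out with whatever PNG or
TIFF encoder you already use. Note that this only covers the soft mask
attached to the image itself.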

Unfortunately, there may also be a soft mask in the graphics state, so
the alpha value may be influenced by content that lies outside the
image XObject itself.

One way to extract images exactly as they are shown in the PDF is to
use PDFDraw to rasterize the page, but first call
page.SetCropBox(bbox) with the bounding box of the given image element
(obtained with element.GetBBox()).
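
The crop-box approach could look roughly like the sketch below. It is
only an illustration under several assumptions: "input.pdf" and the
output file names are placeholders, the DPI value is arbitrary, your
PDFNet version is assumed to provide PDFDraw.SetPageTransparent(), and
images nested inside form XObjects are not visited. Bounding boxes are
collected first and the page is rendered afterwards, so the crop box is
not changed while the page is still being read.

```csharp
using System;
using System.Collections.Generic;
using pdftron;
using pdftron.PDF;

class ExtractImagesAsRendered
{
    static void Main(string[] args)
    {
        PDFNet.Initialize();

        using (PDFDoc doc = new PDFDoc("input.pdf"))   // placeholder path
        using (PDFDraw draw = new PDFDraw())
        using (ElementReader reader = new ElementReader())
        {
            doc.InitSecurityHandler();
            draw.SetDPI(150);               // arbitrary resolution
            draw.SetPageTransparent(true);  // keep the page background transparent

            int pageNum = 0;
            for (PageIterator itr = doc.GetPageIterator(); itr.HasNext(); itr.Next())
            {
                Page page = itr.Current();
                pageNum++;

                // Pass 1: collect the bounding box of each image on the page.
                List<Rect> boxes = new List<Rect>();
                reader.Begin(page);
                Element element;
                while ((element = reader.Next()) != null)
                {
                    Element.Type type = element.GetType();
                    if (type == Element.Type.e_image ||
                        type == Element.Type.e_inline_image)
                    {
                        Rect bbox = new Rect();
                        if (element.GetBBox(bbox)) boxes.Add(bbox);
                    }
                }
                reader.End();

                // Pass 2: render the page once per image, cropped to its box.
                Rect original = page.GetCropBox();
                for (int i = 0; i < boxes.Count; i++)
                {
                    page.SetCropBox(boxes[i]);
                    string name = string.Format("page{0}_image{1}.png", pageNum, i);
                    draw.Export(page, name, "PNG");
                }
                page.SetCropBox(original);  // restore the original crop box
            }
        }
    }
}
```

Because each output is a rendering of the page region, anything else
drawn inside the same bounding box (overlapping text or graphics) will
also appear in the exported file.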