PDF Image Export

Aaron_Gravesdale · January 4, 2011, 7:56pm

Q: We are using PDFTron PDFNet API to extract images from PDF. When we
extract images using PDFTron, the image resolution is not the same as
when we use Illustrator. Also, transparencies are not fully maintained
during the export. Is there something wrong with our code?

Function ProcessImages(ByVal page As Page, ByRef dPage As dsPage) As String
    Dim reader As ElementReader = New ElementReader
    reader.Begin(page)
    Dim element As Element = reader.Next()
    Dim xml As String = ""

    While Not IsNothing(element)
        If element.GetType() = element.Type.e_image Or element.GetType() = element.Type.e_inline_image Then

            Dim ctm As Matrix2D = element.GetCTM()
            Dim x2 As Double = 1
            Dim y2 As Double = 1
            ctm.Mult(x2, y2)

            If element.GetType() = element.Type.e_image Then

                Dim dImage As New dsImage()
                dImage.ImageGUID = Guid.NewGuid()

                Dim fname As String = output_path + "image_extract1_" + imageCounter.ToString() + "_" + dImage.ImageGUID.ToString() + ".png"
                Dim image As pdftron.PDF.Image = New pdftron.PDF.Image(element.GetXObject())
                Dim bmp As System.Drawing.Bitmap = image.GetBitmap()
                bmp.MakeTransparent()
                bmp.Save(fname)

            End If
        End If
        element = reader.Next()
    End While

    Return ""
End Function
----------------------

A: The PDFNet function you used extracts the image without any changes
to its native resolution (so any resolution difference is due to
Illustrator settings).
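
To illustrate what "native resolution" means here: an embedded image has a fixed pixel size, and any DPI figure only follows from how large the image is drawn on the page. The sketch below is an illustration, not part of PDFNet. The helper name EffectiveDpi is hypothetical, and it assumes the vector passed in comes from the image's CTM (the first two entries for the horizontal axis, the next two for the vertical axis), measured in points.

```csharp
using System;

static class ImageResolution
{
    // Hypothetical helper: effective DPI of an image along one axis.
    // pixels   - number of image samples along that axis
    // (vx, vy) - the CTM vector for that axis, in points (1 pt = 1/72 inch)
    public static double EffectiveDpi(int pixels, double vx, double vy)
    {
        // Length of the image edge as placed on the page, in points.
        double lengthPt = Math.Sqrt(vx * vx + vy * vy);
        return pixels * 72.0 / lengthPt;
    }
}
```

For example, a 600-pixel-wide image placed 144 points (2 inches) wide has an effective horizontal resolution of 300 DPI, while the extracted bitmap is still 600 pixels wide.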

As for transparency: in PDF, an image can have another image (or even
vector artwork) associated with it as a soft mask (i.e. an alpha
channel).

You can use the API to extract all the information required to recreate
the soft mask for the embedded image, but doing so is fairly involved.

Do you need to extract the embedded images as they are stored/embedded
in the PDF, or would a rendered PDF page be enough? In the latter
case you could use ‘pdftron.PDF.PDFDraw’ (as shown in the PDFDraw sample
project - http://www.pdftron.com/pdfnet/samplecode.html#PDFDraw) to
generate an image from a PDF page. If you only need to render the
area covered by an image, you could crop the page to the image's
bounding box before using PDFDraw.
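
A minimal sketch of the cropping approach, assuming PDFNet has already been initialized and that Element.GetBBox, Page.SetCropBox, PDFDraw.SetDPI and PDFDraw.Export behave as in the PDFDraw and ElementReader samples. The helper name RenderImageRegion and the choice of DPI are illustrative, not part of the API:

```csharp
using pdftron.PDF;

static class ImageRegionRenderer
{
    // Hypothetical helper: renders only the page area covered by 'element'
    // (an image element obtained from ElementReader) to a PNG file.
    public static void RenderImageRegion(Page page, Element element,
                                         string outputFile, double dpi)
    {
        // Bounding box of the image as placed on the page.
        Rect bbox = new Rect();
        element.GetBBox(bbox);

        // Temporarily shrink the crop box so PDFDraw rasterizes only that area.
        Rect originalCrop = page.GetCropBox();
        page.SetCropBox(bbox);
        try
        {
            using (PDFDraw draw = new PDFDraw())
            {
                draw.SetDPI(dpi);
                draw.Export(page, outputFile, "PNG");
            }
        }
        finally
        {
            // Restore the page so later processing sees the original crop box.
            page.SetCropBox(originalCrop);
        }
    }
}
```

Because this renders the page rather than extracting the image, the output includes anything drawn over or under the image in that area, and its pixel size depends on the DPI you pick rather than on the image's native resolution.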

Aaron_Gravesdale · January 6, 2011, 10:22pm

Q: Thank you for your response. I am trying to extract the images from
the PDF as separate images. Is there documentation on how to use the
API to extract the required information to recreate the soft channel
for the embedded image? Where can I find this?
---------------
A: You could use image.GetSoftMask() and image.GetMask() to extract the
associated soft mask or image (monochrome) mask. For example:

Obj sobj = image.GetSoftMask();
if (sobj != null) { // Extract soft mask
   Image soft_mask = new Image(sobj);
   soft_mask.Export(...);
   System.Drawing.Bitmap bmp = image.GetBitmap();
   ...
}
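
Once the image and its soft mask are available as raw samples, the mask can be applied as an alpha channel. The following is a sketch under stated assumptions, not PDFNet code: the helper ApplyAlpha is hypothetical, both buffers are assumed to be 8 bits per component, and the mask is assumed to have the same pixel dimensions as the image (in a real PDF the soft mask may differ in size and would need resampling first).

```csharp
using System;

static class SoftMaskCompositor
{
    // Hypothetical helper: merges 8-bit RGB samples with 8-bit gray soft-mask
    // samples into BGRA bytes (the in-memory byte order of a 32bpp ARGB bitmap).
    public static byte[] ApplyAlpha(byte[] rgb, byte[] mask, int width, int height)
    {
        int pixels = width * height;
        if (rgb.Length != pixels * 3 || mask.Length != pixels)
            throw new ArgumentException("Sample buffers do not match the image size.");

        byte[] bgra = new byte[pixels * 4];
        for (int i = 0; i < pixels; i++)
        {
            bgra[4 * i + 0] = rgb[3 * i + 2]; // blue
            bgra[4 * i + 1] = rgb[3 * i + 1]; // green
            bgra[4 * i + 2] = rgb[3 * i + 0]; // red
            bgra[4 * i + 3] = mask[i];        // alpha: 0 = transparent, 255 = opaque
        }
        return bgra;
    }
}
```

The resulting buffer can be copied into a 32-bit ARGB bitmap and saved as PNG. This differs from calling MakeTransparent() on the extracted bitmap, which only turns a single color fully transparent and cannot reproduce partial transparency.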

You can also have a soft mask in the graphics state (as a form
XObject). You can use PDFDraw to rasterize this form XObject to a
bitmap, but in that case you may as well rasterize the whole page (or
just the image's bounding box). Please keep in mind that an image in a
PDF can also be clipped with a vector path or text.