Extracting inline images from PDF

Aaron_Gravesdale · October 15, 2009, 5:31pm

Q: I used Image2RGB to extract an inline image from a PDF (along the
lines of your ElementReaderAdv sample:
http://www.pdftron.com/pdfnet/samplecode.html#ElementReaderAdv). The
resulting System.Drawing.Bitmap is skewed 45 degrees to the right. Do
I need to somehow include the image.GetCTM() matrix to get the correct
image?

Attached is a zip file of a simple .NET project that extracts the
inline image and draws it on the form. The zip file also includes the
PDF, Q7_Page3.pdf, which has a single page whose only element is the
inline image. The key part of the code is ProcessInlineImage():

private void Main()
{
PDFNet.Initialize();
try
{
PDFDoc doc = new PDFDoc("Q7_Page3.pdf");
doc.InitSecurityHandler();

int pgnum = doc.GetPageCount();
PageIterator itr;

ElementReader page_reader = new ElementReader();
for (itr = doc.GetPageIterator(); itr.HasNext(); itr.Next()) // Read every page
{

page_reader.Begin(itr.Current());
ProcessElements(page_reader);
page_reader.End();
}

// Calling Dispose() on ElementReader/Writer/Builder can result in
// increased performance and lower memory consumption.
page_reader.Dispose();

doc.Close();
}
catch (PDFNetException ex)
{
MessageBox.Show(ex.Message);
}
}

private void ProcessElements(ElementReader reader)
{
Element element;
while ((element = reader.Next()) != null) // Read page contents
{
if (element.GetType() != Element.Type.e_inline_image)
continue;

ProcessInlineImage(element);
}
}

private void ProcessInlineImage(Element image)
{
int width = image.GetImageWidth();
int height = image.GetImageHeight();
int out_data_sz = width * height * 3;

Image2RGB img_conv = new Image2RGB(image); // Extract and convert image to RGB 8-bpc format
FilterReader reader = new FilterReader(img_conv);
byte[] image_data_out = new byte[out_data_sz]; // A buffer used to keep image data.
reader.Read(image_data_out); // image_data_out contains RGB image data.

bmp = BytesToBmp(image_data_out, width, height);

}

private unsafe System.Drawing.Bitmap BytesToBmp(byte[] bmpBytes, int width, int height)
{
System.Drawing.Bitmap bmp = new System.Drawing.Bitmap(width,
height, System.Drawing.Imaging.PixelFormat.Format24bppRgb);

System.Drawing.Imaging.BitmapData bData = bmp.LockBits(new
System.Drawing.Rectangle(new System.Drawing.Point(), bmp.Size),
System.Drawing.Imaging.ImageLockMode.WriteOnly,
System.Drawing.Imaging.PixelFormat.Format24bppRgb);

// Copy the bytes to the bitmap object

System.Runtime.InteropServices.Marshal.Copy(bmpBytes, 0,
bData.Scan0, bmpBytes.Length);

bmp.UnlockBits(bData);
return bmp;

}

private void Form1_Paint(object sender, PaintEventArgs e)
{
e.Graphics.DrawImage(bmp, new System.Drawing.Point(100, 100));
}

A: The most likely reason the image is skewed is that each row of a
.NET Bitmap is padded to a 4-byte boundary, whereas Image2RGB returns
unpadded image data, i.e. rows of exactly width*3 bytes. The padded
row length is reported by bData.Stride. When width*3 is not a multiple
of 4 and the whole buffer is copied in one shot, each row lands a few
bytes further off than the row before it, and this accumulating offset
shows up as a diagonal skew. To fix this, copy the image data line by
line (e.g. in a for loop), advancing the destination pointer by
bData.Stride after each row so that every new line starts on a 4-byte
boundary. Something like this:

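// Assumes: using System; using System.Drawing; using System.Drawing.Imaging;
// 'reader' is the FilterReader over Image2RGB from ProcessInlineImage().
Bitmap bmp = new Bitmap(width, height, PixelFormat.Format24bppRgb);
Rectangle rect = new Rectangle(0, 0, width, height);
BitmapData bmpData =
bmp.LockBits(rect, ImageLockMode.WriteOnly, PixelFormat.Format24bppRgb);

IntPtr ptr = bmpData.Scan0;
int bytes_per_row = width*3; // Image2RGB rows are packed, with no padding
byte[] image_data_row = new byte[bytes_per_row];
for (int i=0; i<height; ++i) {
reader.Read(image_data_row); // read one packed row of RGB data
System.Runtime.InteropServices.Marshal.Copy(image_data_row, 0, ptr,
bytes_per_row);
// Advance by Stride (the padded row length), not by bytes_per_row
ptr = new IntPtr(ptr.ToInt64() + bmpData.Stride);
}
bmp.UnlockBits(bmpData);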
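The row padding logic can also be checked without GDI+ or PDFNet. The sketch below is a hypothetical helper (RowPadding is not part of PDFNet or .NET) that takes the packed RGB buffer produced by Image2RGB and lays it out the way a Format24bppRgb bitmap expects in memory: each row starts at a multiple of the stride. It also swaps red and blue, because GDI+ stores 24bpp pixels in B, G, R byte order, so copying RGB bytes unchanged tends to show up as swapped colors. It assumes a top-down bitmap with a positive stride.

```csharp
using System;

// Hypothetical helper, not a PDFNet or GDI+ API.
public static class RowPadding
{
    // GDI+ rounds each 24bpp scanline up to a multiple of 4 bytes.
    // Real code should read BitmapData.Stride instead of recomputing it.
    public static int StrideFor24bpp(int width)
    {
        int bytesPerRow = width * 3;
        return (bytesPerRow + 3) / 4 * 4;
    }

    // Copies 'height' packed rows of width*3 RGB bytes into a buffer of
    // height*stride bytes. Each row starts at y*stride, and R and B are
    // swapped because Format24bppRgb is stored as B, G, R in memory.
    public static byte[] PackedRgbToPaddedBgr(byte[] rgb, int width, int height, int stride)
    {
        int bytesPerRow = width * 3;
        if (stride < bytesPerRow)
            throw new ArgumentException("stride is smaller than one row of pixels");
        if (rgb.Length < bytesPerRow * height)
            throw new ArgumentException("source buffer is too small");

        byte[] dst = new byte[stride * height]; // padding bytes stay zero
        for (int y = 0; y < height; ++y)
        {
            int srcRow = y * bytesPerRow; // packed: rows are back to back
            int dstRow = y * stride;      // padded: rows start every 'stride' bytes
            for (int x = 0; x < width; ++x)
            {
                int s = srcRow + x * 3;
                int d = dstRow + x * 3;
                dst[d]     = rgb[s + 2]; // B
                dst[d + 1] = rgb[s + 1]; // G
                dst[d + 2] = rgb[s];     // R
            }
        }
        return dst;
    }
}
```

For a 10-pixel-wide image, StrideFor24bpp(10) is 32 while a packed row is only 30 bytes, so a one-shot copy drifts by 2 more bytes on every row; that accumulating drift is the skew. When the image width makes width*3 a multiple of 4 (e.g. width 4, stride 12), there is no padding and the one-shot copy happens to work. In real code the stride comes from bmpData.Stride, and because the returned buffer already has the padded layout, it can be copied to bmpData.Scan0 with a single Marshal.Copy.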

You can probably find more information on this topic in GDI+ developer
forums.