Extract the largest image from a PDF page


We are trying to identify the main image on a PDF page and extract it. The main image is defined as the image with the largest displayed dimensions and more than a minimum number of colours. Is there any way for us to detect this?

We are able to find the image with the largest pixel dimensions, but this often picks up large images that are scaled down and in fact display as small images.


As a starting point, you may want to take a look at the ImageExtract sample project.

If you mainly care about the physical (displayed) dimensions, you can use ‘element.GetBBox()’, along the lines of the ElementReaderAdv sample.
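Once you have a bounding box and a colour count for each image, the selection itself could look roughly like the sketch below. It is plain Python and does not call the PDF library. ImageInfo, displayed_area and pick_main_image are hypothetical names made up for illustration. It assumes you have already collected each bounding box as (x1, y1, x2, y2) in page units and counted the distinct colours yourself.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class ImageInfo:
    """Hypothetical record for one image found on the page."""
    name: str
    bbox: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in page units
    colour_count: int                        # distinct colours, counted by you


def displayed_area(bbox: Tuple[float, float, float, float]) -> float:
    """Area of the bounding box as displayed on the page."""
    x1, y1, x2, y2 = bbox
    return abs(x2 - x1) * abs(y2 - y1)


def pick_main_image(images: Sequence[ImageInfo],
                    min_colours: int) -> Optional[ImageInfo]:
    """Return the image with the largest displayed area among those with
    more than min_colours colours, or None if no image qualifies."""
    candidates = [img for img in images if img.colour_count > min_colours]
    if not candidates:
        return None
    return max(candidates, key=lambda img: displayed_area(img.bbox))


if __name__ == "__main__":
    page_images = [
        ImageInfo("logo", (0, 0, 50, 50), 4),
        ImageInfo("photo", (100, 100, 400, 300), 5000),
        ImageInfo("banner", (0, 700, 600, 780), 2),
    ]
    main = pick_main_image(page_images, min_colours=16)
    print(main.name if main else "no main image")
```

Because the comparison uses the displayed bounding box rather than the pixel dimensions, a high-resolution image that is scaled down on the page no longer wins by default.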

You can also obtain the specific matrix that positions and transforms (e.g. with rotation, scale, skew) the image with image.GetCTM().
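If you go the matrix route, the displayed size can be worked out from the matrix alone, because PDF draws every image into a unit square that the current transformation matrix maps onto the page. The sketch below is plain Python and assumes you have already read the six matrix values (a, b, c, d, e, f) out of the matrix object. displayed_size and bbox_from_ctm are hypothetical helper names, not library calls.

```python
import math
from typing import Tuple

Matrix = Tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)


def displayed_size(ctm: Matrix) -> Tuple[float, float]:
    """Displayed width and height of an image, in page units.

    The image occupies the unit square, so its sides map to the vectors
    (a, b) and (c, d). Their lengths are the displayed width and height,
    and stay correct when the image is rotated.
    """
    a, b, c, d, _e, _f = ctm
    return math.hypot(a, b), math.hypot(c, d)


def bbox_from_ctm(ctm: Matrix) -> Tuple[float, float, float, float]:
    """Axis-aligned bounding box (x1, y1, x2, y2) of the transformed unit square."""
    a, b, c, d, e, f = ctm
    # A point (x, y) maps to (a*x + c*y + e, b*x + d*y + f).
    corners = [(e, f), (a + e, b + f), (c + e, d + f), (a + c + e, b + d + f)]
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # An image drawn 200 units wide and 100 units high at (50, 50).
    print(displayed_size((200, 0, 0, 100, 50, 50)))
    print(bbox_from_ctm((200, 0, 0, 100, 50, 50)))
```

Comparing these displayed sizes with the image's pixel width and height also tells you how much it has been scaled down, which is the case that trips up a pure pixel-dimension comparison.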