We are developing an import module for PDF and would like to extract
all PDF data (x/y locations etc.). I have had some success getting
the data out for text and images, but I am having problems when it comes
to strokes and rectangles.
I also want to be able to convert anything we do not support into an
image/raster format. Let's say the element is a chart or some other
object we do not support: we do not want to throw this element away, we
want to import it as an image instead. So we would need the original
x,y location of the element, but also be able to get the element as
an image. Is there a way to do this using PDFNet SDK?
A good starting point for your project would be the following samples:
(and specifically the ProcessPath function that shows how to extract
(you may also want to get the latest PDFNet Preview -
www.pdftron.com/downloads/PDFNetPreviewDemo.zip that includes a new
utility class (called TextExtractor) that can be used to reconstruct
words and logical structure).
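The stroke/rectangle part of the question can be sketched in plain Python. This is a minimal sketch, not PDFNet code: it assumes the path has already been read out of the element as a list of segment operators plus a flat list of coordinates, together with the element's current transformation matrix (CTM) as six numbers (a, b, c, d, h, v). The operator codes and helper names below are hypothetical; check the path and matrix calls of your PDFNet version for the real ones. Path coordinates are in the element's local space, so every point is mapped through the CTM to get page coordinates (PDF points, origin at the bottom-left of the page). The resulting bounding box is the x/y location you would keep when an unsupported element is imported as an image.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical segment codes: verify against the path operator
# enumeration of the PDFNet version you use.
MOVETO, LINETO, CUBICTO, CONICTO, RECT, CLOSEPATH = 1, 2, 3, 4, 5, 6

# Number of coordinates each operator consumes from the flat point list.
ARGS: Dict[int, int] = {MOVETO: 2, LINETO: 2, CUBICTO: 6, CONICTO: 4, RECT: 4, CLOSEPATH: 0}

Matrix = Tuple[float, float, float, float, float, float]  # (a, b, c, d, h, v)
IDENTITY: Matrix = (1, 0, 0, 1, 0, 0)


def transform(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map a local point to page space: x' = a*x + c*y + h, y' = b*x + d*y + v."""
    a, b, c, d, h, v = m
    return a * x + c * y + h, b * x + d * y + v


def path_points(ops: Sequence[int], pts: Sequence[float]) -> List[Tuple[float, float]]:
    """Collect every point of the path in local coordinates.

    Rectangles (x, y, width, height) are expanded into their four corners so
    that rotation in the CTM is handled correctly.
    """
    out: List[Tuple[float, float]] = []
    i = 0
    for op in ops:
        n = ARGS[op]
        args = pts[i:i + n]
        i += n
        if op == RECT:
            x, y, w, hgt = args
            out += [(x, y), (x + w, y), (x + w, y + hgt), (x, y + hgt)]
        else:
            # Curve segments contribute their control points as well as end points.
            out += [(args[k], args[k + 1]) for k in range(0, n, 2)]
    return out


def page_bbox(ops: Sequence[int], pts: Sequence[float],
              ctm: Matrix = IDENTITY) -> Optional[Tuple[float, float, float, float]]:
    """Conservative page-space bounding box (x1, y1, x2, y2) of a path.

    Control points are included, so the box may be slightly larger than a
    curve's true extent; the stroke width is not added. Returns None for an
    empty path.
    """
    xy = [transform(ctm, x, y) for x, y in path_points(ops, pts)]
    if not xy:
        return None
    xs = [p[0] for p in xy]
    ys = [p[1] for p in xy]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # A 100 x 50 rectangle at (10, 20), translated by (72, 72) via the CTM.
    print(page_bbox([RECT], [10, 20, 100, 50], (1, 0, 0, 1, 72, 72)))
```

Running the usage line prints `(82, 92, 182, 142)`. Using corners and control points rather than exact curve extrema keeps the sketch short; for placing a rasterized replacement image, a slightly generous box is usually acceptable, and you may want to pad it by half the stroke width.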
Regarding conversion of individual graphical elements (e.g. paths) to
raster images, there are a couple of options:
- You could draw graphical objects on a GDI+ surface (e.g. a bitmap)
- You could copy the element onto a new page (similar to the ElementEdit
sample project: www.pdftron.com/net/samplecode.html#ElementEdit) and
then pass the temporary page to the PDFNet rasterizer, which will rasterize
the element for you. You don't need to add the page to the document's page
sequence. To get the initial page boundary/media box you can use