How to split existing pages based on their content?

Aaron_Gravesdale · April 27, 2007, 7:59pm

Q:
Can you give me some starting pointers to perform the following:

The flow is as follows:

1.LOAD/OPEN PDF
2.GET PAGE COUNT
While count <= page count
  3.READ TOP REGION OF PAGE TO TEXT
     when I find the text that indicates new page
        4.SPLIT OUT NEW PDF (SOURCE, NEW, FROM PAGE, TO PAGE)
        5.CONVERT NEW.PDF TO NEW.TIFF
   6.MOVE TO NEXT PAGE
End While
---
A:

1.LOAD/OPEN PDF
2.GET PAGE COUNT

// Initialize PDFNet once, before any other PDFNet call.
PDFNet.Initialize();
// See http://pdftron.com/net/faq.html#pdfnet_res
PDFNet.SetResourcesPath("c:/myapp/pdfnet.res");

// Open the source document and read its page count.
PDFDoc doc = new PDFDoc(input_path + "newsletter.pdf");
doc.InitSecurityHandler();
int pgnum = doc.GetPagesCount();

3.READ TOP REGION OF PAGE TO TEXT
when I find the text that indicates new page

As a starting point, you may want to take a look at the TextExtract
sample project (http://pdftron.com/net/samplecode.html#TextExtract).

You could then determine the splitting point by reading the text on
each page using the ElementReader and Element interfaces.
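
The "top region" check itself does not depend on the PDF library. Below
is a minimal sketch, assuming the text runs and their bounding boxes have
already been extracted. TextRun, TopRegion and ContainsMarker are
hypothetical names used for illustration only and are not part of PDFNet.
It also assumes an unrotated page whose origin is at the bottom-left
corner, which is the default PDF coordinate system.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical holder for one extracted text run and the vertical extent
// of its bounding box, in PDF units (origin at the bottom-left of the page).
public sealed class TextRun
{
    public string Text;
    public double YBottom; // lower edge of the bounding box
    public double YTop;    // upper edge of the bounding box
}

public static class TopRegion
{
    // Returns true if any run lying entirely within the top `bandHeight`
    // units of the page contains `marker`.
    public static bool ContainsMarker(
        IEnumerable<TextRun> runs, double pageHeight,
        double bandHeight, string marker)
    {
        // In PDF coordinates y grows upward, so the top band starts here.
        double bandBottom = pageHeight - bandHeight;
        foreach (TextRun run in runs)
        {
            bool inBand = run.YBottom >= bandBottom && run.YTop <= pageHeight;
            if (inBand && run.Text.IndexOf(marker, StringComparison.Ordinal) >= 0)
                return true;
        }
        return false;
    }
}
```

Note that a marker may be spread over several text runs in the PDF, in
which case the runs in the band would need to be concatenated before
searching.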

To split the content of an existing page across multiple pages, you can
use the same approach illustrated in the ElementEdit sample:
http://pdftron.com/net/samplecode.html#ElementEdit.

The ElementEdit sample modifies the content of an existing page, but you
can simply copy page elements to several output pages without making any
changes to their graphics state.

If you don't need to split the content within a page and simply want to
split pages at the document level, you can follow the approach used in
the PDFPage sample project
(http://pdftron.com/net/samplecode.html#PDFPage and
http://pdftron.com/net/usermanual.html#page_manip).
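
For the document-level case, the FROM PAGE / TO PAGE values can be worked
out before any pages are copied. Here is a minimal sketch, assuming a
caller-supplied delegate that reports whether a given page carries the
"new document" text. PageSplitter, ComputeRanges and startsNewDocument are
hypothetical names, not part of PDFNet.

```csharp
using System;
using System.Collections.Generic;

public static class PageSplitter
{
    // Groups pages 1..pageCount into inclusive (From, To) ranges.
    // Page 1 always opens the first range; after that, a new range starts
    // at every page for which startsNewDocument returns true.
    public static List<(int From, int To)> ComputeRanges(
        int pageCount, Func<int, bool> startsNewDocument)
    {
        var ranges = new List<(int From, int To)>();
        if (pageCount < 1) return ranges;

        int from = 1;
        for (int page = 2; page <= pageCount; page++)
        {
            if (startsNewDocument(page))
            {
                // Close the range that ended on the previous page.
                ranges.Add((from, page - 1));
                from = page;
            }
        }
        // Close the final range, which runs to the last page.
        ranges.Add((from, pageCount));
        return ranges;
    }
}
```

For a 7-page document whose marker appears on pages 1, 4 and 6, this
yields the ranges 1-3, 4-5 and 6-7. Each range can then be copied into
its own output document using the page manipulation approach from the
PDFPage sample.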

5.CONVERT NEW.PDF TO NEW.TIFF

You can convert the newly created pages to TIFF and other image formats
using the PDFDraw class (please see
http://pdftron.com/net/usermanual.html#PDFDraw and the PDFDraw sample
project). All of this can be done without creating any intermediate or
temporary files.