Efficient merging of large PDF documents.

Q: I need to verify that large PDFs can be created without choking the
system with CPU/RAM consumption. This test reads pages from existing
PDFs and adds them to a final PDF. With the straightforward code, the
document is kept in memory during creation/update and saved at the
end with the "Save" function.

How can I create large PDF files without consuming lots of memory? Is
there a way to connect a document to a file stream right at the
creation stage so that all writes to it go straight to disk? Any other
suggestions?
-------------------------

A: By default, PDFNet caches most of the written document to disk
(unless you explicitly disable disk caching using
PDFNet.SetDiskCaching(false)). As a result, PDFNet can be used to
efficiently create very large documents without much memory usage.
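
For example, disk caching stays on unless you turn it off yourself. A
minimal sketch (the SetDiskCaching name follows the call mentioned above;
the file name is just a placeholder):

using pdftron;
using pdftron.PDF;
using pdftron.SDF;

class LargeDocTest {
  static void Main() {
    PDFNet.Initialize();
    // Disk caching is on by default; the line below would disable it
    // and force the whole document to be kept in RAM:
    // PDFNet.SetDiskCaching(false);
    using (PDFDoc doc = new PDFDoc()) {
      // ... add many pages here; most of the written data is cached to disk ...
      doc.Save("large_output.pdf", SDFDoc.SaveOptions.e_remove_unused);
    }
  }
}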

How are you creating the document? A code snippet would help us
better understand the problem. Are you explicitly disposing large
objects such as PDFDoc, ElementBuilder/ElementWriter, etc.?

Q: I am attaching the code snippet for your review. I noticed a
dramatic improvement in memory usage by using the "ImportPages"
function to get all pages at once, rather than reading one page at a
time and inserting it into the new document. The one thing I am still
confused about is why we still have to insert pages into the new
document once "ImportPages" has already been called on the new
document. Your code samples do the same thing as well.

I tried disabling disk caching and saw that memory usage increased,
which is to be expected. Good thing that caching is enabled by
default.
---------------

A: When page(s) are imported into the target document using
"ImportPages", they are not automatically inserted into the document's
page sequence, since a user may have other plans (e.g. placing the
imported pages on a target layout, as shown in the ImpositionTest
sample).
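
To illustrate why the explicit insert step is needed, here is a minimal
sketch (it assumes the ArrayList-based ImportPages overload; file names
are placeholders and may differ from your code):

// Copy all pages from src_doc into new_doc in one ImportPages call,
// then explicitly add each imported page to the page sequence.
using (PDFDoc src_doc = new PDFDoc("input.pdf"))
using (PDFDoc new_doc = new PDFDoc()) {
  src_doc.InitSecurityHandler();

  ArrayList pages_to_copy = new ArrayList();
  for (PageIterator itr = src_doc.GetPageIterator(); itr.HasNext(); itr.Next())
    pages_to_copy.Add(itr.Current());

  // ImportPages deep-copies the pages (and their shared resources) into new_doc...
  ArrayList imported = new_doc.ImportPages(pages_to_copy);

  // ...but they only show up once they are pushed into the page sequence.
  foreach (Page p in imported)
    new_doc.PagePushBack(p);

  new_doc.Save("output.pdf", SDFDoc.SaveOptions.e_remove_unused);
}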

If you simply want to merge/split PDF documents, you can use the
doc.InsertPages()/MovePages() methods. For example:

// Sample 1 - Split a PDF document into multiple pages
using (PDFDoc in_doc = new PDFDoc("newsletter.pdf")) {
  in_doc.InitSecurityHandler();
  int page_num = in_doc.GetPageCount();
  for (int i = 1; i <= page_num; ++i) {
    using (PDFDoc new_doc = new PDFDoc()) {
      new_doc.InsertPages(0, in_doc, i, i, PDFDoc.InsertFlag.e_none);
      new_doc.Save("newsletter_split_page_" + i + ".pdf",
SDFDoc.SaveOptions.e_remove_unused);
    }
  }
}

// Sample 2 - Merge several PDF documents into one
using (PDFDoc new_doc = new PDFDoc()) {
  new_doc.InitSecurityHandler();
  int page_num = 15;
  for (int i = 1; i <= page_num; ++i) {
    using (PDFDoc in_doc = new PDFDoc(output_path +
        "newsletter_split_page_" + i + ".pdf")) {
      new_doc.InsertPages(i, in_doc, 1, in_doc.GetPageCount(),
          PDFDoc.InsertFlag.e_none);
    }
  }
  new_doc.Save("newsletter_merge_pages.pdf",
SDFDoc.SaveOptions.e_remove_unused);
}

I noticed that in your code you are using ElementBuilder/ElementWriter
to stamp PDF pages. You can alternatively use pdftron.PDF.Stamper (as
shown in the Stamper sample). In either case you should call Dispose()
[or use the C# 'using' keyword] to release memory as soon as possible:

var writer = new ElementWriter();
var builder = new ElementBuilder();
...
writer.Dispose();
builder.Dispose();

or

using (var writer = new ElementWriter()) {
  ...
}
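
For reference, a minimal Stamper sketch (the stamp text, placement, and
file names are illustrative only):

using (PDFDoc doc = new PDFDoc("newsletter.pdf")) {
  doc.InitSecurityHandler();

  // Stamp "DRAFT" across every page of the document.
  Stamper stamper = new Stamper(Stamper.SizeType.e_relative_scale, 0.5, 0.5);
  stamper.SetAlignment(Stamper.HorizontalAlignment.e_horizontal_center,
                       Stamper.VerticalAlignment.e_vertical_center);
  stamper.StampText(doc, "DRAFT", new PageSet(1, doc.GetPageCount()));

  doc.Save("newsletter_stamped.pdf", SDFDoc.SaveOptions.e_remove_unused);
}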

There are some other tricks that can be used to keep the memory
requirements even lower (e.g. closing and reopening the target
document between merge operations); however, I am not sure it is
worth the trouble.
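
If you do want to try it, the idea would be something along these lines
(a rough sketch only; the per-iteration save is deliberately naive and
the file names are placeholders):

string target = "newsletter_merge_pages.pdf";
int page_num = 15;
for (int i = 1; i <= page_num; ++i) {
  // Reopen the partially merged target on every pass (new empty doc on
  // the first one), so memory used by already merged pages is released
  // between iterations.
  using (PDFDoc new_doc = (i == 1) ? new PDFDoc() : new PDFDoc(target))
  using (PDFDoc in_doc = new PDFDoc("newsletter_split_page_" + i + ".pdf")) {
    new_doc.InitSecurityHandler();
    in_doc.InitSecurityHandler();
    new_doc.InsertPages(new_doc.GetPageCount() + 1, in_doc, 1,
        in_doc.GetPageCount(), PDFDoc.InsertFlag.e_none);
    new_doc.Save(target, SDFDoc.SaveOptions.e_remove_unused);
  } // disposing new_doc here frees its memory before the next pass
}

In practice you would only save and reopen every N documents rather than
after each one, since the repeated full saves add I/O overhead.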
