How do I prevent an out-of-memory condition in Java/.NET when using PDFDraw?

Q: I am using the following Java snippet to convert PDF pages to
raster images. PDFNet has been fantastic so far, but recently I ran
into an out-of-memory condition (a bad allocation or similar).

It is hard to tell what exactly happened from the exception alone -
can you tell me what went wrong?

import java.awt.Dimension;
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;
import java.awt.print.PageFormat;
import java.awt.print.Paper;
import java.awt.print.Printable;
import java.awt.print.PrinterException;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;

import javax.imageio.ImageIO;

import pdftron.Common.PDFNetException;
import pdftron.PDF.PDFDoc;
import pdftron.PDF.PDFDraw;
import pdftron.PDF.PDFNet;
import pdftron.PDF.Page;
import pdftron.PDF.PageIterator;
import pdftron.SDF.Obj;
import pdftron.SDF.ObjSet;

public class AssemblePdfFromImages implements Printable {
  static private String workDir = "temp";
  static private String pageFileName = "page";
  static private String workPdf = "pdf.pdf";
  private File outputPathFile;
  ArrayList<String> pagesImages = null;

  public boolean flattenPDF(File folder, Job job, String fileNameToFlatten) {
    boolean allDone = false;

    // create JPG images from the PDF file pages
    String input_path = folder.getAbsolutePath();
    String output_path = folder.getAbsolutePath() + "\\" + workDir + "\\";

    try
    {
      outputPathFile = new File(output_path); // need to create a folder
      if (outputPathFile.exists()) {
        if (!outputPathFile.isFile()) {
          File files[] = outputPathFile.listFiles();
          for (int i = 0; i < files.length; i++)
            files[i].delete();
        }
        outputPathFile.delete();
      }
      outputPathFile.mkdir();

      // The first step in every application using PDFNet is to initialize the
      // library and set the path to common PDF resources. The library is usually
      // initialized only once, but calling Initialize() multiple times is also fine.
      PDFNet.initialize();
      PDFNet.setResourcesPath("resources");

      PDFDraw draw = new PDFDraw(); // PDFDraw is used to rasterize PDF pages.
      ObjSet hint_set=new ObjSet();

      //--------------------------------------------------------------------------------
      PDFDoc doc = new PDFDoc(folder.getAbsolutePath() + "\\" + fileNameToFlatten);
      // Initialize the security handler, in case the PDF is encrypted.
      doc.initSecurityHandler();

      draw.setDPI(400); // Set the output resolution to 400 DPI.

      // Use optional encoder parameter to specify JPEG quality.
      Obj encoder_param=hint_set.createDict();
      encoder_param.putNumber("Quality", 100);

      pagesImages = new ArrayList<String>();

      // Traverse all pages in the document.
      Debug.debug("Converting pages to jpg-s:");
      for (PageIterator itr = doc.getPageIterator(); itr.hasNext();) {
        Page current = (Page) itr.next();
        String pageName = output_path + pageFileName + current.getIndex() + ".jpg";
        Debug.debug(pageName);
        draw.export(current, pageName, "JPEG", encoder_param);
        pagesImages.add(pageName);
      }

      Debug.debug("Done.");
      draw.destroy(); // < Added
      // hint_set.destroy(); // < Added
      doc.close();

      //// more processing

    } catch(Exception e) {
      Debug.debug("failed to decode/assemble PDF for flattening!", e);
    }

    return allDone;
  }
------------
A: Because you are repeatedly allocating new PDFDraw objects, you
need to call pdfdraw.destroy() to release the memory promptly (the
Java garbage collector may be sluggish about releasing it in time).
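
For example, here is a minimal sketch of that pattern, reusing the
calls from your snippet (the input path and output prefix are
placeholders, and PDFNet.initialize() is assumed to have been called
already):

import pdftron.PDF.*;
import pdftron.SDF.Obj;
import pdftron.SDF.ObjSet;

public class RasterizeAndRelease {
  public static void rasterize(String inputPdf, String outputPrefix) throws Exception {
    PDFDraw draw = new PDFDraw(); // holds native rendering buffers
    try {
      draw.setDPI(150);
      ObjSet hint_set = new ObjSet();
      Obj encoder_param = hint_set.createDict();
      encoder_param.putNumber("Quality", 80);

      PDFDoc doc = new PDFDoc(inputPdf);
      try {
        doc.initSecurityHandler();
        for (PageIterator itr = doc.getPageIterator(); itr.hasNext();) {
          Page page = (Page) itr.next();
          draw.export(page, outputPrefix + page.getIndex() + ".jpg", "JPEG", encoder_param);
        }
      } finally {
        doc.close();
      }
    } finally {
      draw.destroy(); // release native memory now rather than when the GC runs
    }
  }
}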

The same applies to .NET (PDFDraw implements the IDisposable
interface). You can use the 'using' keyword or explicitly call
pdfdraw.Dispose() to release the memory immediately.

Another factor influencing memory usage is the resolution/DPI at
which you are rasterizing the page. If the DPI is very large, you
could potentially run out of memory. In this case you may need to
rasterize the PDF page in tiles, or increase the available memory.
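
To illustrate the cost: at 400 DPI a US-letter page is about 3400 x
4400 pixels, i.e. roughly 60 MB of raw RGBA per page. A hedged sketch
of two ways to bound the bitmap size follows; setImageSize is an
assumption about your PDFNet build, and page/encoder_param are as in
the snippet above:

PDFDraw draw = new PDFDraw();
try {
  // Option 1: a moderate fixed resolution.
  draw.setDPI(150);

  // Option 2 (assumed API): cap the output bitmap dimensions directly,
  // regardless of the physical page size.
  // draw.setImageSize(1500, 1500);

  draw.export(page, "page.jpg", "JPEG", encoder_param);
} finally {
  draw.destroy();
}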

Q: I have been merging a large number of PDF's.
I have been using code along the lines of the snippet from the answer
here:
http://groups.google.com/group/pdfnet-sdk/browse_thread/thread/ae67034c64dad7f8

I have also been writing some text to the page whilst processing each
page.

I've been profiling the memory and CPU usage and can see a large
amount of memory being used up to when the file gets written to disk
and after it's written a large amount still allocated.

I suspect there should be another way of doing what I'm doing in order
to make more use of the disk and less use of the memory which is more
suitable in situations where a large numbers of files being merged?

Please find below my code.

DateTime start = DateTime.Now;
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();

PDFNet.Initialize();

string[] listOfFiles = Directory.GetFiles(@"TestData\Lots");

using (PDFDoc doc = new PDFDoc())
{
    doc.InitSecurityHandler();

    Font font = Font.CreateTrueTypeFont(doc, @"C:\WINDOWS\Fonts\ARIAL.TTF");

    foreach (string inputFile in listOfFiles)
    {
        using (PDFDoc in_doc = new PDFDoc(inputFile))
        {
            in_doc.InitSecurityHandler();

            ArrayList copy_pages = new ArrayList();
            for (PageIterator itr = in_doc.GetPageIterator(); itr.HasNext(); itr.Next())
            {
                copy_pages.Add(itr.Current());
            }

            ArrayList imported_pages = doc.ImportPages(copy_pages);
            foreach (Page importedPage in imported_pages)
            {
                doc.PagePushBack(importedPage);

                AddTextPage(importedPage, 100, 100, font, "Hello World");
            }

            in_doc.Close();
        }
    }

    doc.Save(DateTime.Now.ToString("HHmmss") + "-merged.pdf", pdftron.SDF.SDFDoc.SaveOptions.e_linearized);
    doc.Close();
}

stopwatch.Stop();
DateTime end = DateTime.Now;
Console.WriteLine("TestMergeAndWriteDocument Start: " + start);
Console.WriteLine("End: " + end);
Console.WriteLine((end - start).ToString());
Console.WriteLine("Time to run Merge: " +
stopwatch.ElapsedMilliseconds);

public void AddTextPage(Page page, double xPos, double yPos, Font font, string text)
{
    ElementBuilder inkwell = new ElementBuilder();
    ElementWriter quill = new ElementWriter();

    // Select page
    quill.Begin(page);

    // set font
    quill.WriteElement(inkwell.CreateTextBegin());

    // set text
    Element textBlock = inkwell.CreateTextRun(text, font, 10);

    // correct position for shift
    // xPos -= XShift;
    // yPos -= YShift;
    // position text
    textBlock.SetTextMatrix(1, 0, 0, 1, xPos, yPos);

    // write text
    quill.WriteElement(textBlock);
    quill.WriteElement(inkwell.CreateTextEnd());
    quill.End();

    quill.Dispose();   // < Added
    inkwell.Dispose(); // < Added
}
-----------
A: If you would like to keep memory consumption under control in
managed languages (C#, Java, VB, etc.), you need to call Dispose()
[.NET] or destroy() [Java] on ElementBuilder, ElementWriter, PDFDraw,
ElementReader, etc. when they are no longer in use. Leaving the
memory management of these objects to the garbage collector can lead
to sub-optimal performance.

In .NET you can also use the IDisposable pattern (i.e. the 'using'
keyword) to automatically clean up resources when they are no longer
in use.
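
As a rough Java sketch of that advice (assuming page and font are an
existing pdftron.PDF.Page and Font; in .NET the same shape works with
'using' blocks instead of try/finally):

ElementBuilder builder = new ElementBuilder();
ElementWriter writer = new ElementWriter();
try {
  writer.begin(page);                       // attach the writer to an existing page
  writer.writeElement(builder.createTextBegin());
  Element text = builder.createTextRun("Hello World", font, 10);
  text.setTextMatrix(1, 0, 0, 1, 100, 100); // position the text run
  writer.writeElement(text);
  writer.writeElement(builder.createTextEnd());
  writer.end();
} finally {
  writer.destroy();   // the .NET equivalent is Dispose()
  builder.destroy();
}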

Q: Many thanks, that has certainly made the memory consumption look
a lot healthier.

I have a couple of follow up questions:

* What is the recommended life cycle for ElementBuilder and
ElementWriter? I am currently batch processing, and have constructed
one of each at the beginning of the batch and disposed of them at the
end. Is there any reason I should instead be, for example, creating
and disposing an ElementBuilder and ElementWriter on every page?

* I have profiled the memory usage (having fixed the memory leak) and
am seeing roughly linear growth while merging the PDFs. Is there any
way, having processed a number of the PDFs, to save the output to
file and then continue appending to that file, restricting the memory
usage?

Incidentally I'm using C#.
----------
A:

"I have profiled the memory usage (having fixed the memory leak) and
am seeing roughly linear growth while merging the PDFs. Is there any
way, having processed a number of the PDFs, to save the output to
file and then continue appending to that file, restricting the memory
usage?"

Yes, you could periodically save the file and then continue to add
pages from other PDF documents. Usually this is not required, since
memory usage is rarely a problem during PDF merging.
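
For example, here is a hedged Java sketch of that approach
(SDFDoc.e_incremental is the Java counterpart of the SaveOptions
flags in your C# snippet; the batch size of 50 is arbitrary, and
whether the very first incremental save of a brand-new document needs
to be a full save is worth verifying against your build):

import java.util.ArrayList;
import pdftron.PDF.*;
import pdftron.SDF.SDFDoc;

public static void mergeInBatches(String[] listOfFiles, String outputPath) throws Exception {
  final int SAVE_EVERY = 50; // arbitrary batch size
  PDFDoc merged = new PDFDoc();
  try {
    int processed = 0;
    for (String inputFile : listOfFiles) {
      PDFDoc in_doc = new PDFDoc(inputFile);
      try {
        in_doc.initSecurityHandler();
        ArrayList copy_pages = new ArrayList();
        for (PageIterator itr = in_doc.getPageIterator(); itr.hasNext();)
          copy_pages.add(itr.next());
        for (Object p : merged.importPages(copy_pages))
          merged.pagePushBack((Page) p);
      } finally {
        in_doc.close();
      }
      // Periodically flush the merged pages to disk so the in-memory
      // object set stops growing without bound.
      if (++processed % SAVE_EVERY == 0)
        merged.save(outputPath, SDFDoc.e_incremental, null);
    }
    merged.save(outputPath, SDFDoc.e_linearized, null); // final full save
  } finally {
    merged.close();
  }
}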

"Is there any reason I should instead be, for example, creating and
disposing an ElementBuilder and ElementWriter on every page?"

Not really. You can reuse the same ElementBuilder/ElementWriter for
the creation of an entire document. This will be more efficient.

In fact, you could create the ElementBuilder/ElementWriter only once
(e.g. as a global variable) and use them for the creation of all PDF
documents. Of course, this probably wouldn't work well for
multi-threaded applications, but it may be useful in some other
use-case scenarios.
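
If you ever do need this in a multi-threaded job, one hedged
workaround is to keep one instance per thread, e.g. with a
ThreadLocal holder (a sketch, not an official PDFNet pattern):

import pdftron.PDF.ElementBuilder;

public class PerThreadBuilder {
  private static final ThreadLocal<ElementBuilder> BUILDER = new ThreadLocal<ElementBuilder>() {
    @Override
    protected ElementBuilder initialValue() {
      try {
        return new ElementBuilder(); // one builder per worker thread
      } catch (Exception e) {        // wrap any checked PDFNetException
        throw new RuntimeException(e);
      }
    }
  };

  public static ElementBuilder get() {
    return BUILDER.get();
  }
}

Each worker thread then uses PerThreadBuilder.get() and calls
destroy() on its instance once that thread is finished with PDF work.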