Memory Leak/Issue Debugging & JVM Tuning with Java SDK

Hello,

I am doing some evaluation work and have been developing a test app
using the Java SDK. The environment is the following:

- OS: Ubuntu 10.04
- Machine: 4GB memory
- Java: Sun 1.6.0_22
- Application: PHP talking via the PHP/Java Bridge to a servlet I wrote
that wraps the Java SDK

In this first test I am processing documents that range from
30MB to 155MB in size. I am opening a doc, then going through each page
and generating a full-size JPEG image, based on the size returned from
Page.getUserUnitSize.

I am currently running into three issues with this setup. The first is
a significant memory leak; the second is severe CPU usage, spiking to
100% for the duration of the processing; the third is random
OutOfMemory errors when processing certain pages. (I assume those
pages are too heavy in content; it is not a file-size issue, since the
process will handle the first X pages and always die on page Y.)

I did initial development on my local Windows machine and appeared to
have solved the memory leak issue with pointers from this post:

http://groups.google.com/group/pdfnet-sdk/browse_thread/thread/8a8c2e6182a5752a?pli=1

The main points were to call pdfdraw.destroy() when drawing pages
repeatedly, and to make sure to call .close() on the doc.
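That cleanup discipline can be sketched as follows. FakeDraw here is a stand-in I made up so the example is self-contained; real code would use PDFNet's PDFDraw/PDFDoc, whose exact method names should be checked against the SDK javadoc. The point is the try/finally shape: one draw object per page, destroyed immediately after use, so native memory is released even when a page fails to render.

```java
public class CleanupPattern {
    // Hypothetical stand-in for a native-backed renderer such as PDFDraw.
    static class FakeDraw {
        static int liveCount = 0;            // handles not yet freed
        FakeDraw() { liveCount++; }
        void export(int page) {              // stand-in for an Export(...) call
            if (page < 0) throw new RuntimeException("render failed");
        }
        void destroy() { liveCount--; }      // frees native memory immediately
    }

    // Render each page with a fresh draw object, destroying it in finally
    // so cleanup happens even if a page throws. Returns the number of
    // handles still live afterwards (should be 0).
    static int demo(int pageCount) {
        for (int i = 0; i < pageCount; i++) {
            FakeDraw draw = new FakeDraw();
            try {
                draw.export(i);
            } finally {
                draw.destroy();   // the equivalent of pdfdraw.destroy() per page
            }
        }
        return FakeDraw.liveCount;
    }

    public static void main(String[] args) {
        System.out.println(demo(5));   // prints 0: every handle was freed
    }
}
```

The same finally-based discipline applies to the doc's .close() around the whole loop.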

However, when I run the same code on the server, the leak persists.
Another interesting difference is that on my local machine, when the
leak was present, all memory was freed up when I killed Tomcat. On the
server, the memory persists even after killing Tomcat.

Any pointers on debugging what could be going on?

The second issue is the CPU spiking to 100% for the whole document
processing run. I am not a Java guy; while reading up on how to solve
an OutOfMemory error that randomly pops up and kills the
server/process, I came across a lot of content about JVM tuning for
heap size, as well as parameters around garbage collection. Since the
post linked above mentions the need to call .destroy() because of GC
issues, I figured maybe there is some tuning that could alleviate all
these leak/performance issues I have been dealing with. I tried the
following for JAVA_OPTS:
I tried the following for JAVA_OPTS:

JAVA_OPTS="-Xms1G -Xmx2G -XX:PermSize=128m -XX:MaxPermSize=128m -Xshare:off -XX:NewSize=512m -Xss1024K"

This gives the JVM a minimum of 1G and a maximum of 2G of memory, and
the NewSize value was touted to solve CPU spiking in Java programs
where GC happens so often that it takes more CPU than the application
processing itself. The other parameters I don't fully understand, but
they were included alongside the other suggested settings in multiple
places.
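One caveat worth noting when tuning for a JNI-backed library like PDFNet: the rasterization buffers live mostly in native memory outside the Java heap, so -Xms/-Xmx govern only the Java side. A possible starting point for measuring GC behavior on a Sun 1.6 JVM is below; the sizes and the log path are illustrative placeholders to tune against your own logs, not a known-good configuration for this SDK:

```shell
# Illustrative JAVA_OPTS for a Sun JDK 1.6 Tomcat -- measure, then adjust.
# All sizes and the -Xloggc path are placeholders, not tested recommendations.
JAVA_OPTS="-Xms1G -Xmx2G \
  -XX:PermSize=128m -XX:MaxPermSize=128m \
  -XX:NewSize=512m \
  -XX:+UseConcMarkSweepGC \
  -verbose:gc -XX:+PrintGCDetails -Xloggc:/var/log/tomcat/gc.log"
```

The GC log will show whether collections are actually what is eating the CPU; if they are not, the spikes are simply the rendering work itself.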

Any recommendations for JVM settings related to using the Java SDK?

The third issue, the OutOfMemory process killer, I assume happens
because the content of particular pages is too complex and saps the
memory; maybe the tuning will help solve this issue as well. I know
the post above mentioned that tiling might sometimes be needed for
certain page/image processing. If that is the case, how can I
programmatically determine it, without relying on my process dying and
manually flagging the doc as I am doing now?
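One way to decide up front (a sketch of my own, not anything from the SDK): estimate the raster size from the page dimensions in points and the target DPI, assuming roughly 4 bytes per pixel, and tile whenever that estimate exceeds a memory budget chosen for the server. The 64 MB budget in main is an arbitrary example value:

```java
public class TileEstimate {
    // Rough raster size for a page at a given DPI, assuming 4 bytes/pixel
    // (e.g. BGRA). Page dimensions are in points (1 point = 1/72 inch).
    static long rasterBytes(double widthPts, double heightPts, double dpi) {
        long w = (long) Math.ceil(widthPts / 72.0 * dpi);
        long h = (long) Math.ceil(heightPts / 72.0 * dpi);
        return w * h * 4L;
    }

    // Smallest n such that an n x n tile grid keeps each tile's raster
    // within the memory budget; n == 1 means single-pass rendering is fine.
    static int tilesPerSide(double widthPts, double heightPts, double dpi,
                            long budgetBytes) {
        long total = rasterBytes(widthPts, heightPts, dpi);
        int n = 1;
        while (total / ((long) n * n) > budgetBytes) n++;
        return n;
    }

    public static void main(String[] args) {
        // US Letter (612x792 pts) at 300 DPI: ~33.7 MB raster -> prints 1
        System.out.println(tilesPerSide(612, 792, 300, 64L << 20));
        // E-size drawing (34x44 in) at 600 DPI: ~2.1 GB raster -> prints 6
        System.out.println(tilesPerSide(34 * 72, 44 * 72, 600, 64L << 20));
    }
}
```

The page width/height in points would come from the page object before rendering, so the check runs without ever allocating the big buffer.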

I am going to embark on the next round of testing this weekend to see
if I can get to the bottom of things, but after several weeks of
dealing with these issues I figured I would try here to see if anyone
has suggestions that could help reduce these performance pains.

Thanks in advance

Since your application is PHP, is there a reason why you are not using
the PDFNet PHP binding? It should be easier than going through the
Java bridge, etc.

Btw, we have identified and resolved a memory leak in the Java PDFDoc
constructor wrapper (when a PDFDoc is opened from a Stream, but not
from a buffer or from a file), and the patch will be part of the next
update. As a temporary workaround, you could open the file from a
memory buffer or from a temp file. Please let me know if this helps.
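For the buffer workaround, the stream just needs to be read fully into a byte[] first. A minimal JDK 1.6-compatible helper using only java.io (the buffer-based PDFDoc constructor it would feed is documented in the PDFNet javadoc; its exact signature is not shown here):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamToBuffer {
    // Read an InputStream fully into a byte[]; the result can then be
    // handed to the buffer-based PDFDoc constructor instead of the
    // leaking Stream-based one.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes("UTF-8");
        byte[] copy = readFully(new ByteArrayInputStream(data));
        System.out.println(copy.length);   // prints 5
    }
}
```

For very large uploads the temp-file route avoids holding the whole document in the Java heap at once.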

Regarding the CPU spiking, the issue is that PDF rendering requires
heavy computation and there is no simple way around it. You could
lower the resolution at which the image is rasterized, or use a
server (or servers) with more CPU power (e.g. add more cores).

I actually just saw the latest release news from a few weeks back over
the weekend and started looking into the PHP binding. I will do some
prototyping with it, since it would be nice to bypass the PHP-to-Java
bridge.

It says it is available for both Windows and Linux, but it is not
present in the Windows download; is Windows support coming?

On the memory side, is there any way to catch an exception from PDFNet
if the page you are attempting to render is too large and the library
is going to fail? In trying to build an automated system to render
images, it seems like a big problem to have to just try it and see
whether it works or fails, versus being able to tell in advance that a
page is too large for single-pass processing.
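Failing an up-front check, one defensive pattern is to catch OutOfMemoryError around the render call and retry at a lower DPI. This is a sketch with a made-up PageRenderer interface standing in for the real PDFDraw call; catching OutOfMemoryError is only reasonably safe when the failed allocation was one big raster buffer, so treat it as a fallback, not a fix:

```java
public class RenderFallback {
    // Hypothetical hook: real code would invoke PDFDraw at this DPI.
    interface PageRenderer {
        void render(double dpi);
    }

    // Try the requested DPI, halving it on OutOfMemoryError until a floor
    // is reached. Returns the DPI that succeeded, or -1 if even the floor
    // failed (at which point tiling is the remaining option).
    static double renderWithFallback(PageRenderer r, double dpi, double floor) {
        while (dpi >= floor) {
            try {
                r.render(dpi);
                return dpi;
            } catch (OutOfMemoryError e) {
                dpi /= 2;   // retry with a raster one quarter the size
            }
        }
        return -1;
    }

    // Simulated page that only fits in memory at 150 DPI or below.
    static double demo() {
        PageRenderer page = new PageRenderer() {
            public void render(double dpi) {
                if (dpi > 150) throw new OutOfMemoryError("raster too large");
            }
        };
        return renderWithFallback(page, 600, 72);
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints 150.0
    }
}
```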

At first glance it appeared that the size of the document was causing
the memory failures, but then I hit a handful of failures with docs
that were just 30-40 MB in size.