Improving speed of multithreaded PDF splitting and merging

Aaron_Gravesdale · October 23, 2009, 10:21pm

Q: I have a problem using the PDFTron component with multithreading. When I use
10-15 concurrent threads to extract pages from different PDFs, the
performance is worse than with a single thread.

The source code used to retrieve the pages is as follows:

ArrayList arPa = new ArrayList();
PDFNet.Initialize(Settings.Default.TronLicKey);
PDFDoc newdoc_2 = new PDFDoc();
PDFDoc doc = new PDFDoc(sourcePdf);
doc.InitSecurityHandler();

// Collect the requested range of pages from the source document.
for (int i = 0; i < countPage; i++)
{
    Page pag = doc.GetPage(startPage + i);
    arPa.Add(pag);
}

// Import the pages into the new document, then append them.
ArrayList arpagimp = newdoc_2.ImportPages(arPa);
for (int i = 0; i < arpagimp.Count; ++i)
{
    // PagePushBack appends, so the pages keep their original order.
    newdoc_2.PagePushBack((Page)arpagimp[i]);
}
newdoc_2.Save(outPdf, SDFDoc.SaveOptions.e_remove_unused);
doc.Close();
newdoc_2.Close();

Do you have any suggestions?
-----------
A: It depends on the type of machine your process is running on. If
you are running on a single-CPU machine, it would not be surprising
for 10-15 concurrent threads to run slower than a single thread
(i.e. processing files in sequence). Also keep in mind that
during PDF split and merge operations the main speed bottleneck is
disk access. If all threads are accessing files on the same disk, they
compete for that disk, so adding more threads may not help.
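
As a rough illustration of the point above, here is a minimal sketch that caps the number of concurrent jobs at the CPU count instead of starting 10-15 threads. It assumes .NET 4 or later (for Parallel.ForEach); the BoundedSplitter class, the RunAll method, the splitJob callback and the file names are hypothetical and not part of PDFNet. The page-extraction code from the question would go inside the callback.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedSplitter
{
    // Runs one job per input file, with at most maxWorkers jobs in flight.
    // 'splitJob' is a hypothetical callback standing in for the real
    // split/merge code.
    public static void RunAll(IEnumerable<string> inputs, Action<string> splitJob, int maxWorkers)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };
        Parallel.ForEach(inputs, options, splitJob);
    }

    public static void Main()
    {
        // Placeholder file names, for illustration only.
        var files = new List<string> { "a.pdf", "b.pdf", "c.pdf", "d.pdf" };
        int running = 0, peak = 0, done = 0;
        object gate = new object();

        RunAll(files, path =>
        {
            lock (gate) { running++; if (running > peak) peak = running; }
            Thread.Sleep(50); // stand-in for the real split/merge work
            lock (gate) { running--; done++; }
        }, Environment.ProcessorCount);

        Console.WriteLine("processed " + done + " files, peak concurrency " + peak);
    }
}
```

On a single-CPU machine this degrades to sequential processing, which matches the case described above. If the disk is the limit, a cap lower than the CPU count may work better still; that has to be found by measurement.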

Based on your description, it seems that the problem is not
directly related to PDFNet. You may need to change your system
configuration to include additional drives or additional physical
servers (e.g. a PDF processing server farm).
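
Before changing hardware, it may be worth confirming where the time goes. The sketch below, again assuming .NET 4 or later, times the same batch at several worker counts; ThroughputProbe, TimeBatch and the placeholder job are hypothetical names, and the Thread.Sleep call only stands in for the real split/merge code. If elapsed time stops improving, or gets worse, as workers are added, that would be consistent with a CPU or disk limit rather than a problem in the library.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class ThroughputProbe
{
    // Times one pass over 'inputs' with at most 'workers' jobs in flight.
    public static TimeSpan TimeBatch(IList<string> inputs, Action<string> job, int workers)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = workers };
        Stopwatch watch = Stopwatch.StartNew();
        Parallel.ForEach(inputs, options, job);
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        // Placeholder inputs; substitute the real list of source PDFs.
        var files = new List<string>();
        for (int i = 0; i < 16; i++) files.Add("file" + i + ".pdf");

        // Stand-in job; replace with the actual page-extraction routine.
        Action<string> job = path => Thread.Sleep(20);

        foreach (int workers in new[] { 1, 2, 4, 8, 16 })
        {
            TimeSpan elapsed = TimeBatch(files, job, workers);
            Console.WriteLine(workers + " worker(s): "
                + elapsed.TotalMilliseconds.ToString("F0") + " ms");
        }
    }
}
```

For meaningful numbers the real job should read and write the actual files, ideally with a cold file cache, since the operating system may serve repeated reads from memory.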