Batch converting HTML to PDF from C# or Java on the server side

Q: We need to convert many HTML files into a single PDF, and we need to control the page numbering for each HTML file converted. For example, convert one HTML file that has two pages, numbered 1 and 2, then convert another HTML file that has three pages into the same PDF, numbered 1, 2 and 3. We would end up with one PDF document with pages numbered 1, 2, 1, 2, 3. We also need to let users view and print the PDF, and we need to batch print PDFs in the background with no user intervention.

A: This is all easy to do with PDFNet.

  1. To start, download whichever PDFNet library works best for you: http://www.pdftron.com/pdfnet/downloads.html

  2. Download the “HTML to PDF Conversion Module”, which is located further down the same download page.

  3. Copy html2pdf.dll from the conversion module into the Lib folder that came with your PDFNet download. More detailed instructions are included with the HTML to PDF module.

  4. PDFNet includes many samples, among them an HTML2PDF sample. Run it to see how the conversion works.

  5. Modify the HTML2PDF sample to do what you want it to do.

Regarding page numbering, if you mean simply controlling the order of pages merged from different PDF sources, please see the following sample on how to do that (this sample, and all other samples, are included in the PDFNet download).

http://www.pdftron.com/pdfnet/samplecode/PDFPageTest.cs

If instead you mean the page numbering as it appears on the actual page, this is more involved. Our new ContentReplacer class is the best place to start. See the ContentReplacer sample included with PDFNet.
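Whichever class ends up writing the numbers onto the pages, you first need to know which number belongs on each page of the merged document. The following is a minimal sketch of that bookkeeping only, in plain Java with no PDFNet calls. The class name PageLabels and the method restartNumbering are hypothetical names chosen for illustration, and the sketch assumes you already know the page count of each converted HTML file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFNet: computes the number to show on
// each page of a merged PDF when numbering restarts for every source file.
final class PageLabels {

    // pagesPerSource[i] is the page count of the i-th converted HTML file,
    // in the order the files were appended to the merged PDF.
    static List<Integer> restartNumbering(int[] pagesPerSource) {
        List<Integer> labels = new ArrayList<>();
        for (int pageCount : pagesPerSource) {
            if (pageCount < 0) {
                throw new IllegalArgumentException("negative page count: " + pageCount);
            }
            // Numbering starts again at 1 for each source file.
            for (int n = 1; n <= pageCount; n++) {
                labels.add(n);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // The case from the question: a 2-page file followed by a 3-page file.
        System.out.println(restartNumbering(new int[] {2, 3})); // prints [1, 2, 1, 2, 3]
    }
}
```

The entry at index i of the returned list is the number to place on page i + 1 of the merged PDF.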

Viewing and printing are also easy to do, as PDFNet includes a class called PDFViewCtrl that provides the full viewing experience. Check out the PDFViewSimple sample.

Finally, for batch printing, see PDFPrint and ConvertSamples: http://www.pdftron.com/pdfnet/samplecode/ConvertTest.cs