Convert PDF to HTML

Q:

I would like to know if PDFTron provides an SDK that converts a PDF (containing text, images, tables, etc.) to HTML without any distortion.

Requirement: a document management system that needs to convert PDF documents into HTML via an existing web application.


A:

There are a couple of ways to convert PDF to HTML with PDFNet SDK.

The highest quality output is produced using ‘pdftron.PDF.Convert.ToXod()’ together with the PDFNet HTML5 WebViewer. WebViewer uses HTML5 to render PDF with a high degree of accuracy.

You can see some online samples here: http://www.pdftron.com/pdfnet/webviewer/demo.html

And you can convert your own files online using the Cloud API: http://www.pdftron.com/pdfnet/cloud/samples.html

The alternative is to convert PDF to HTML without using canvas, as shown in the PDF to HTML sample (http://www.pdftron.com/pdfnet/samplecode/Pdf2Html.cs). The sample is part of PDFNet SDK for .NET, but the same approach would work with Java and any other supported language. The main disadvantage of this approach is that the plain HTML DOM does not support vector graphics, rotated images or text, and similar features. To find out more about the benefits of WebViewer compared to HTML output, please see http://www.pdftron.com/pdfnet/webviewer
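
To give a rough feel for what canvas-free HTML output involves, here is a minimal Python sketch. It is not the Pdf2Html sample and does not use PDFNet at all: the TextRun record and the page_to_html function are hypothetical names made up for illustration, and the sketch assumes the text runs and their positions have already been extracted from the page by some other means. It only shows the basic idea of mapping positioned text onto absolutely positioned DOM elements.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class TextRun:
    """Hypothetical record for one run of text already extracted from a PDF page."""
    text: str
    x: float          # left edge in PDF points, measured from the left of the page
    y: float          # baseline in PDF points, measured from the bottom of the page
    font_size: float  # font size in points


def page_to_html(runs, page_width, page_height):
    """Emit one absolutely positioned <span> per text run inside a page-sized <div>."""
    spans = []
    for run in runs:
        # PDF measures y upward from the bottom of the page; CSS measures top
        # downward from the top. Subtracting the font size is only a rough
        # approximation of where the top of the text box sits above the baseline.
        top = page_height - run.y - run.font_size
        spans.append(
            f'<span style="position:absolute;left:{run.x:.1f}pt;top:{top:.1f}pt;'
            f'font-size:{run.font_size:.1f}pt">{escape(run.text)}</span>'
        )
    body = "\n".join(spans)
    return (
        f'<div style="position:relative;width:{page_width:.1f}pt;'
        f'height:{page_height:.1f}pt">\n{body}\n</div>'
    )


# Example: a single run on a US Letter page (612 x 792 points).
runs = [TextRun("Invoice <draft>", x=72, y=720, font_size=12)]
print(page_to_html(runs, page_width=612, page_height=792))
```

Straight, horizontal text maps onto the DOM fairly easily this way. Paths, shadings, and rotated content have no equally direct equivalent in plain HTML elements, which is the limitation described above.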