Accurate PDF to HTML / SVG Conversion

Aaron_Gravesdale · February 1, 2012, 8:53pm

Q: We have made significant progress in our efforts to implement a visually accurate HTML/CSS3 conversion based on PDFNet SDK, but we still have issues, especially with low-level character information such as character spacing, stretched characters, and the character encoding of symbols, ligatures, etc.

Also, are the extra parameters available in the PDF to SVG converter going to be available in pdftron.PDF.Convert.ToSvg() anytime soon?

A:

The extra parameters of the PDF to SVG converter will be available in ‘pdftron.PDF.Convert.ToSvg()’ as part of PDFNet v.5.8, which will be released in early February.

PDFTron is also about to release PDFNet WebViewer (currently called SilverDox on our website and limited to Silverlight). It currently supports Silverlight viewing; with the new version, HTML5 and Flash clients will also be available. For more information, please see:

http://www.pdftron.com/silverdox/index.html

For a preview of the HTML5 viewer (desktop), please see:

http://www.pdftron.com/JS/ReaderControl.html?d=http://www.pdftron.com/silverdox/samples/ClientBin/Declare.xod

http://www.pdftron.com/JS/ReaderControl.html?d=http://www.pdftron.com/silverdox/samples/ClientBin/PDF32000_2008.xod

The HTML WebViewer SDK will come with an API that is essentially identical to the current SilverDox API (http://www.pdftron.com/silverdox/documentation/Index.html), so you should be able to customize any aspect of the viewing experience (including the development of custom controls).

In case you need to generate static HTML output, however, some of our clients have extended the Pdf2Html sample (http://www.pdftron.com/pdfnet/samplecode/Pdf2Html.cs; currently only available as a sample in C#, but the same APIs apply). The only intent of this sample is to show how to use the core PDFNet API to implement a very basic PDF to HTML converter. It was not designed to be bulletproof, nor to be used in production.

The main limitation is related to font substitution. In PDF, fonts are typically embedded, which guarantees accurate text reproduction. With the Pdf2Html sample, text locations are correct; however, in some cases (where a font match is not found) the substituted font has larger advance widths, so words can grow and start overlapping each other. You can verify this by adjusting the font size in the converter (e.g. scaling it down by 30% or more).

You could extract the embedded fonts (pdftron.PDF.Font.GetGlyphPath) and normalize them to WOFF (a format compatible with most browsers), then use these ‘web fonts’ instead of the default fonts.
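To illustrate the overlap problem with substituted fonts, here is a minimal sketch in Python. It is not part of PDFNet or the Pdf2Html sample, and the names TextRun, fit_run and to_span are hypothetical. It assumes you already know, for each text run, the width it occupies in the PDF and the width of the same text in the fallback font at the same size (measured by whatever means you have, e.g. font metrics), and that text width scales linearly with font size. If the fallback font is wider, the font size is scaled down so the run fits; if it is narrower, the slack is spread out with CSS letter-spacing.

```python
import html
from dataclasses import dataclass


@dataclass
class TextRun:
    text: str
    x: float          # left edge on the page, in CSS px
    y: float          # top edge on the page, in CSS px
    pdf_width: float  # width the run occupies in the PDF, in CSS px
    font_size: float  # font size used in the PDF, in CSS px


def fit_run(run, substitute_width):
    """Return (font_size, letter_spacing) so the run fits its PDF width.

    substitute_width is the measured width of run.text in the fallback
    font at run.font_size with no extra spacing (an input we assume
    the caller can provide).
    """
    if substitute_width <= 0 or not run.text:
        return run.font_size, 0.0
    if substitute_width > run.pdf_width:
        # Fallback font is wider: shrink it so words cannot overlap.
        scale = run.pdf_width / substitute_width
        return run.font_size * scale, 0.0
    # Fallback font is narrower: spread the slack between characters.
    gaps = len(run.text) - 1
    spacing = (run.pdf_width - substitute_width) / gaps if gaps else 0.0
    return run.font_size, spacing


def to_span(run, substitute_width):
    """Emit an absolutely positioned <span> for one text run."""
    font_size, spacing = fit_run(run, substitute_width)
    style = (
        f"position:absolute;left:{run.x:.2f}px;top:{run.y:.2f}px;"
        f"font-size:{font_size:.2f}px;letter-spacing:{spacing:.2f}px;"
        "white-space:pre"
    )
    return f'<span style="{style}">{html.escape(run.text)}</span>'
```

For example, to_span(TextRun("Hello", 10, 20, 50.0, 12.0), 60.0) would shrink the font because the fallback rendering (60px) is wider than the original run (50px). This only hides the symptom; embedding the original fonts as web fonts remains the more accurate fix.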

Ivanho · October 19, 2012, 8:43pm

Just an update: the WebViewer (http://www.pdftron.com/pdfnet/webviewer) has already been available for some time.

For a demo see: http://www.pdftron.com/pdfnet/webviewer/demo.html
For a quick test with your own files, use http://s84786.gridserver.com/website/demo/bookstore/upload.php or http://www.pdftron.com/pdfnet/cloud/index.html