Q: We have made significant progress in our efforts to implement a visually accurate HTML/CSS3 conversion based on PDFNet SDK, but we still have issues, especially with low-level character information such as character spacing, stretched characters, and the encoding of symbols, ligatures, etc.
Also, will the extra parameters available in the PDF to SVG converter be exposed in pdftron.PDF.Convert.ToSvg() anytime soon?
A:
The extra parameters of the PDF to SVG converter will be included in pdftron.PDF.Convert.ToSvg() as part of PDFNet v5.8, which will be released in early February.
PDFTron is also about to release PDFNet WebViewer (currently called SilverDox on our website and limited to Silverlight). The current version supports Silverlight viewing; with the new version, HTML5 and Flash clients will also be available. For more information, please see:
http://www.pdftron.com/silverdox/index.html
For HTML5 viewer (desktop) preview, please see:
http://www.pdftron.com/JS/ReaderControl.html?d=http://www.pdftron.com/silverdox/samples/ClientBin/Declare.xod
http://www.pdftron.com/JS/ReaderControl.html?d=http://www.pdftron.com/silverdox/samples/ClientBin/PDF32000_2008.xod
The HTML WebViewer SDK will come with an API that is essentially identical to the current SilverDox API (http://www.pdftron.com/silverdox/documentation/Index.html), so you should be able to customize any aspect of the viewing experience (including developing custom controls).
In case you need to generate static HTML output, some of our clients have extended the Pdf2Html sample (http://www.pdftron.com/pdfnet/samplecode/Pdf2Html.cs; currently available only as a C# sample, but the same APIs apply in the other languages). The sole purpose of this sample is to show how to use the core PDFNet API to implement a very basic PDF to HTML converter. It was not designed to be bulletproof or to be used in production.
The main limitation is font substitution. In PDF, fonts are typically embedded, which guarantees accurate text reproduction. In the Pdf2Html sample, text locations are correct; however, in some cases (where no matching font is found) the substituted font has larger advance widths, so words can grow and start overlapping each other. You can verify this by adjusting the font size in the converter (e.g. scaling it down by 30% or more). Instead, you could extract the embedded fonts (pdftron.PDF.Font.GetGlyphPath), convert them to WOFF (a format supported by most browsers), and then use these 'web fonts' instead of the default fonts.
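As an alternative to shrinking the font size globally, the overlap can be corrected per text run: compute how wide the run is in the PDF, compute how wide it would be in the substituted font, and stretch or squeeze the HTML text by the ratio. The sketch below is a minimal illustration, not part of PDFNet: `TextState`, `pdf_run_width`, `fit_scale_x` and `css_style` are hypothetical helpers, and how you obtain glyph widths from PDFNet is left out. The run-width calculation follows the glyph displacement formula in the PDF specification (ISO 32000-1, section 9.4.4), which also accounts for character spacing (Tc), word spacing (Tw) and horizontal scaling (Tz), i.e. the stretched-character cases mentioned in the question. It assumes single-byte character codes, ignores TJ position adjustments, and assumes one PDF user-space unit maps to one CSS pixel.

```python
"""Hypothetical helpers (not PDFNet API): fit text set in a substituted
HTML font to the width the same text occupies in the PDF."""

from dataclasses import dataclass
from typing import Sequence


@dataclass
class TextState:
    font_size: float             # Tfs, in PDF user-space units
    char_spacing: float = 0.0    # Tc
    word_spacing: float = 0.0    # Tw, applied only to single-byte code 32 (space)
    horiz_scale: float = 100.0   # Tz, in percent


def pdf_run_width(codes: Sequence[int], widths: Sequence[float], ts: TextState) -> float:
    """Sum of glyph displacements tx = ((w0 / 1000) * Tfs + Tc + Tw) * Th.

    `widths` are the embedded font's glyph widths in 1/1000 em.
    TJ position adjustments are ignored in this sketch.
    """
    th = ts.horiz_scale / 100.0
    total = 0.0
    for code, w in zip(codes, widths):
        tw = ts.word_spacing if code == 32 else 0.0
        total += ((w / 1000.0) * ts.font_size + ts.char_spacing + tw) * th
    return total


def fit_scale_x(pdf_width: float, substitute_widths: Sequence[float], font_size: float) -> float:
    """Horizontal scale that makes the substituted font match the PDF width."""
    html_width = sum(substitute_widths) / 1000.0 * font_size
    if html_width == 0:
        return 1.0  # nothing to scale (empty run or zero-width glyphs)
    return pdf_width / html_width


def css_style(font_size: float, scale: float) -> str:
    """Inline style for a positioned <span>; assumes 1 PDF unit == 1 CSS px."""
    return f"font-size:{font_size:g}px;transform:scaleX({scale:.4f});transform-origin:0 0"
```

For example, a two-glyph run whose embedded widths are 722 and 278 at a 10-unit font size is 10 units wide in the PDF; if the substituted font's widths for the same characters are 800 and 450, the run would be 12.5 units wide in HTML, so `fit_scale_x` returns 0.8 and the span is squeezed back to its original width. A per-run `scaleX` keeps the font size (and therefore line height) intact, which is why it may be preferable to scaling the font size down; embedding the original fonts as WOFF remains the more accurate fix.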