Solving text conversion issues (when converting from PDF to HTML/XOD/Image/SVG/EPUB...)

Ivanho · August 6, 2013, 9:03pm

Q:

I am using PDFNet to render images and to perform various conversions (from PDF to XOD, HTML, TXT, EPUB, etc.).

With some files, the text does not come out correctly. The odd thing is that on one server I get correct output, but on another some text is missing (usually Japanese or Chinese).

A:

The problem is most likely due to missing fonts. Some PDFs do not have all of their fonts embedded, so you need a matching system font in order to reproduce the document accurately. PDFNet will attempt to use the closest matching system font, which means the more fonts you have available, the better (e.g. you could install ‘font folio’ or similar). It is also a good idea to install a system font with broad Unicode coverage, such as Arial Unicode MS (~23 MB; this font comes as part of MS Office). For improved compatibility with Acrobat you may also want to install the Adobe font pack (http://www.adobe.com/support/downloads/detail.jsp?ftpID=5508). PDFNet also allows you to override and customize font substitution (via PDFNet.AddFontSubst()); however, in your case installing the extra fonts is the way to go.
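
Since the output differs between two servers, it may help to check which font files each one actually has. Here is a minimal sketch in plain Python (standard library only, no PDFNet calls). The font directories, the file name arialuni.ttf for Arial Unicode MS, and the second file name are assumptions for illustration; adjust them to your own systems.

```python
import os

# Extensions treated as font files in this sketch.
FONT_EXTENSIONS = (".ttf", ".ttc", ".otf")


def installed_font_files(font_dirs):
    """Return the lower-cased names of font files found under font_dirs."""
    found = set()
    for root_dir in font_dirs:
        # os.walk yields nothing for a directory that does not exist,
        # so the same directory list can be used on every server.
        for _dirpath, _dirnames, filenames in os.walk(root_dir):
            for name in filenames:
                if name.lower().endswith(FONT_EXTENSIONS):
                    found.add(name.lower())
    return found


def missing_fonts(required_files, font_dirs):
    """Return the required font file names not found (case-insensitive)."""
    installed = installed_font_files(font_dirs)
    return sorted(f for f in required_files if f.lower() not in installed)


if __name__ == "__main__":
    # Assumed common font locations; not every path exists on every OS.
    font_dirs = ["C:/Windows/Fonts", "/usr/share/fonts", "/Library/Fonts"]
    # Assumed file names: arialuni.ttf is the usual file for Arial Unicode MS,
    # example-cjk-font.otf is a hypothetical placeholder.
    required = ["arialuni.ttf", "example-cjk-font.otf"]
    print("Missing font files:", missing_fonts(required, font_dirs))
```

Run it on both servers and compare the results. Any file reported as missing only on the server with the bad output is a good candidate to install there.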