Why are some characters produced by iText not rendered/converted correctly?

Ivanho · March 15, 2013, 6:39pm

Q:

We encountered some PDFs generated with the iText PDF generator that use locale-specific characters (for example Romanian diacritics, or Polish characters such as Ę). These characters are missing when the PDF is rendered or converted to another format (such as XPS or SVG). Can you shed some light on this?

A:

The problem is due to a bug in the iText library that generated your PDFs.

Specifically, your files reference base 14 fonts (Helvetica, Times-Roman) that are not embedded in the PDF, and they use a ‘custom encoding’ that references glyph names such as ‘eogonek’ and ‘tcommaaccent’, which are not present in any standard PDF encoding.

These files display correctly in Acrobat, but there is no guarantee that they will work anywhere else (i.e. accurate PDF reproduction is not guaranteed).

To guard against such surprises, reliable PDF producers should always embed all fonts. This includes the ‘Base 14 fonts’: different PDF consumers use different fonts to represent the base 14 fonts, so results may vary between applications and systems. For this reason PDF/A requires that all fonts be embedded. If file size is a concern, fonts can be subsetted.
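As a rough illustration of the check described above, the following C++ sketch flags fonts that are not embedded and tells you whether a font name is one of the base 14 names. The FontInfo struct and the isBase14() and fontsAtRisk() helpers are hypothetical names made up for this example; they are not part of PDFNet or iText, and in practice the name/embedded data would have to come from whatever PDF library you use to inspect the file.

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical record for one font used by a PDF (not a PDFNet type).
struct FontInfo {
    std::string name;  // BaseFont name, e.g. "Helvetica"
    bool embedded;     // true if the font program is embedded in the PDF
};

// Names of the 14 standard PDF fonts.
constexpr std::array<std::string_view, 14> kBase14 = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats"};

// Returns true if the name is exactly one of the base 14 font names.
bool isBase14(std::string_view name) {
    return std::find(kBase14.begin(), kBase14.end(), name) != kBase14.end();
}

// Returns the names of all fonts that are not embedded, in input order.
// Each of these depends on whatever font the PDF consumer substitutes.
std::vector<std::string> fontsAtRisk(const std::vector<FontInfo>& fonts) {
    std::vector<std::string> result;
    for (const auto& font : fonts) {
        if (!font.embedded) {
            result.push_back(font.name);
        }
    }
    return result;
}
```

Any font returned by fontsAtRisk() is a candidate for embedding when the PDF is produced; those for which isBase14() is true are the ones most likely to look fine in one viewer and wrong in another.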

The following are potential solutions:

Make sure that all fonts are embedded.
Override the PDFNet built-in fonts with another font using PDFNet.AddFontSubst(), as shown in the PDFDraw sample project (http://www.pdftron.com/pdfnet/samplecode.html#PDFDraw). For example:

PDFNet::Initialize(…);
…

// Map the unembedded base 14 font names to local TrueType fonts.
PDFNet::AddFontSubst("Times-Roman", "c:/windows/fonts/times.ttf");
PDFNet::AddFontSubst("Times-Italic", "c:/windows/fonts/timesi.ttf");
PDFNet::AddFontSubst("Helvetica", "c:/windows/fonts/arial.ttf");
…

PDFDraw.GetBitmap()
Pdftron.PDF.Convert.ToXps(…)

…will produce the correct output.