Adding font data to PDF documents with missing fonts

Q:

We want to use PDFNet to embed fonts in PDF documents to allow us to
control exactly which fonts get embedded.

Our PDF creation software only embeds all fonts or no fonts and we
need better control over which fonts are embedded.

I am finding that when I embed fonts into the document, some
characters no longer display correctly. Specifically we have noticed
this on the special quotation characters MS Word uses.

Can you please let me know whether this is a PDFNet issue, or whether
we are missing something on our end?

// The test code snippet looks as follows:

PDFNet.Initialize();

// Create input and output docs
PDFDoc doc = new PDFDoc("my.pdf");

// Create embedded font resource
pdftron.PDF.Font embeddedFont =
    pdftron.PDF.Font.CreateTrueTypeFont(doc, @"c:\Windows\Fonts\Arial.ttf", true, false);


// Begin iterating the pages
elementReader.Begin(originalPage);

// Iterate the elements
Element element;
while ((element = elementReader.Next()) != null) {
    // If it's text...
    if (element.GetType() == Element.Type.e_text) {
        GState gState = element.GetGState();
        // ...and it's Arial...
        if (gState.GetFont().GetName().ToLowerInvariant() == "arialmt") {
            // ...switch to using the embedded font
            gState.SetFont(embeddedFont, gState.GetFontSize());
        }
    }
}
elementReader.End();

doc.Save("my2.pdf", pdftron.SDF.Doc.SaveOptions.e_remove_unused);


A:

PDF printer drivers (such as Adobe Distiller) usually have an option
to customize the list of fonts that must be embedded. The best place
to control font embedding is the PDF creation process itself.

In your sample code all page content is rewritten and references to
the old font are replaced with references to the new font. This
approach has several problems. Because you replace the entire font,
the font widths and other font descriptors differ, so the text might
shift or render inaccurately. Also, even when the PDF producer and
PDFNet SDK start from the same input font, the resulting PDF fonts
may differ. For example, a PDF producer may use a custom encoding to
represent text, whereas PDFNet may create a font with a standard
encoding. The character codes in the page content were written for
the custom encoding, so once the font is replaced they can select the
wrong glyphs. Characters outside the basic ASCII range, such as the
typographic quotation marks MS Word inserts, are the most likely to
be affected.
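
To make the encoding problem concrete, here is a small self-contained
sketch (no PDFNet calls). It assumes, purely for illustration, that
the producer wrote the text using MacRomanEncoding and that the
replacement font uses WinAnsiEncoding; the encodings in your actual
file may be different. The class and method names are hypothetical,
and the tables hold only the few entries needed for the example, taken
from the standard Latin encoding tables in the PDF Reference.

```csharp
using System;
using System.Collections.Generic;

public static class EncodingDemo
{
    // Partial code -> glyph name tables (assumed encodings, see lead-in).
    public static readonly Dictionary<byte, string> MacRoman =
        new Dictionary<byte, string>
        {
            { 0xD2, "quotedblleft" },
            { 0xD3, "quotedblright" },
        };

    public static readonly Dictionary<byte, string> WinAnsi =
        new Dictionary<byte, string>
        {
            { 0x93, "quotedblleft" },
            { 0x94, "quotedblright" },
            { 0xD2, "Ograve" },
            { 0xD3, "Oacute" },
        };

    // Returns the glyph a character code selects under the given encoding,
    // or ".notdef" if the code is not in this (partial) table.
    public static string GlyphName(Dictionary<byte, string> encoding, byte code)
    {
        string name;
        return encoding.TryGetValue(code, out name) ? name : ".notdef";
    }

    public static void Main()
    {
        // The bytes stay the same in the content stream; only the font changes.
        byte[] codes = { 0xD2, 0xD3 };
        foreach (byte code in codes)
        {
            Console.WriteLine("0x{0:X2}: original font -> {1}, replacement font -> {2}",
                code, GlyphName(MacRoman, code), GlyphName(WinAnsi, code));
        }
    }
}
```

Under these assumptions the byte that used to draw an opening curly
quote now draws an accented capital O, which is the kind of symptom
described in the question.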

A better approach would be to add the font stream to the existing PDF
font. This would preserve font encoding, widths, and descriptor flags
required for proper text rendering. For example, the following code
snippet enumerates all fonts on the given page and embeds font data
into any PDF font that is not already embedded:

// C# pseudocode
pdftron.SDF.Obj res = page.GetResources();
if (res != null) {
    SDF.Obj fonts = res.FindObj("Font");
    if (fonts != null) {
        SDF.DictIterator itr = fonts.DictBegin();
        SDF.DictIterator end = fonts.DictEnd();
        for (; itr != end; itr.Next()) {
            Obj fnt_dict = itr.Value(); // (in C++ use itr->second)
            pdftron.PDF.Font font = new pdftron.PDF.Font(fnt_dict);
            if (font.IsEmbedded()) continue;

            // Embed the font data and associate it with the existing PDF font.
            // ... make sure that font names match ...
            // ... alternatively you can also create a new font using
            //     Font.Create??() methods and then copy the file stream, i.e.
            //     font.GetDescriptor().Put("FontFile2", tmpfont.GetEmbeddedFont());
            Filters.StdFile embed_file = new StdFile("myfont.ttf",
                StdFile.OpenMode.e_read_mode);
            Filters.FilterReader mystm = new FilterReader(embed_file);
            pdftron.SDF.Obj fnt_stm = pdfdoc.CreateIndirectStream(mystm);
            fnt_stm.Put("Length1", Obj.CreateNumber(embed_file.FileSize()));
            font.GetDescriptor().Put("FontFile2", fnt_stm);
        }
    }
}
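
For the "make sure that font names match" step, one possible approach
is a small lookup table from PDF font name to font file, so that each
unembedded font gets the right file instead of a single hard-coded
one. The sketch below is plain C# with no PDFNet calls. The class name
FontFileMap and the method TryGetFontFile are hypothetical; the one
table entry reuses the font name and path from the question. The
comparison is case-insensitive, like the ToLowerInvariant() check in
the original snippet.

```csharp
using System;
using System.Collections.Generic;

public static class FontFileMap
{
    // Hypothetical table: PDF font name -> font file on disk.
    // Extend it with the fonts you actually want to embed.
    static readonly Dictionary<string, string> Files =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "ArialMT", @"c:\Windows\Fonts\Arial.ttf" },
        };

    // Returns true and the file path if a font file is registered
    // for the given PDF font name; otherwise returns false.
    public static bool TryGetFontFile(string pdfFontName, out string path)
    {
        path = null;
        if (string.IsNullOrEmpty(pdfFontName)) return false;
        return Files.TryGetValue(pdfFontName, out path);
    }

    public static void Main()
    {
        string path;
        foreach (string name in new[] { "ArialMT", "arialmt", "TimesNewRomanPSMT" })
        {
            if (TryGetFontFile(name, out path))
                Console.WriteLine(name + " -> " + path);
            else
                Console.WriteLine(name + " -> no file registered, leave as is");
        }
    }
}
```

Inside the loop you would pass the name of the current PDF font to
TryGetFontFile and skip the font when it returns false, so fonts you
do not want embedded stay untouched.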

Note: the key passed to Put() in the above code must be the literal
string "FontFile2" regardless of which TrueType font is being
embedded. (Specifically, it should not be the name of the font file,
as one might assume.) The key identifies the stream as an embedded
TrueType font program. For more information, see the PDF Reference,
Section 5.7 "Font Descriptors".
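
As a reminder of why the key is fixed, the font descriptor has one
entry per kind of font program rather than per font. The following
sketch (plain C#, hypothetical enum and class names, not part of
PDFNet) maps the kind of font program to the descriptor key defined
by the PDF Reference. It does not detect the kind from the file; that
is assumed to be known by the caller.

```csharp
using System;

// Hypothetical classification of the font program being embedded.
public enum FontProgramKind { Type1, TrueType, CompactOrOpenType }

public static class FontDescriptorKeys
{
    // Returns the font descriptor key under which the embedded
    // font program stream is stored.
    public static string KeyFor(FontProgramKind kind)
    {
        switch (kind)
        {
            case FontProgramKind.Type1:
                return "FontFile";
            case FontProgramKind.TrueType:
                return "FontFile2";
            case FontProgramKind.CompactOrOpenType:
                // A FontFile3 stream also needs a Subtype entry
                // (for example Type1C or OpenType) in its dictionary.
                return "FontFile3";
            default:
                throw new ArgumentOutOfRangeException("kind");
        }
    }

    public static void Main()
    {
        foreach (FontProgramKind kind in Enum.GetValues(typeof(FontProgramKind)))
            Console.WriteLine(kind + " -> " + KeyFor(kind));
    }
}
```

For the Arial.ttf case discussed here the kind is TrueType, so the key
is "FontFile2", as in the snippet above.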