How do I embed a font in a PDF with missing fonts?

Q: We produce a series of PDF documents which contain some machine
readable text. Neither of the fonts used for that text is available on
the workstations of users to whom these PDFs are distributed. The tool
that we’re using to actually generate the PDFs in the first place does
not support embedding fonts. So, my desire is to simply use the
pdftron sdk (http://www.pdftron.com/pdfnet/) to open an existing PDF,
and then embed the fonts. Ideally, we would introspect the file to see
what fonts are in use, and then embed a subset of the fonts on that
list. (Some of the fonts we know will be available on all of the
workstations, and we wouldn’t want to embed those for file size
reasons).


A: You could use PDFNet SDK to embed fonts. As a starting point you
would need to look at the type of font you are dealing with, as well
as at the PDF file generated by the third party tool. There are
several ways this functionality can be implemented using PDFNet. For
example, assuming that the tool generates text that references a
simple TrueType font with WinAnsiEncoding, you could embed the font
along the following lines:

// C# pseudocode; 'doc' is assumed to be an already opened PDFDoc.
using pdftron;
using pdftron.Common;
using pdftron.Filters;
using pdftron.SDF;
using pdftron.PDF;

for (PageIterator itr = doc.GetPageIterator(); itr.HasNext(); itr.Next())
{
    Page pg = itr.Current();
    Obj res = pg.GetResourceDict();
    if (res == null) continue;
    Obj fonts = res.FindObj("Font");
    if (fonts == null) continue;
    for (DictIterator i = fonts.GetDictIterator(); i.HasNext(); i.Next())
    {
        // Wrap the current entry of the page's font dictionary.
        Font font = new Font(i.Value());
        if (font.GetType() != pdftron.PDF.Font.Type.e_TrueType) continue;
        string fname = font.GetName();
        if (fname != "MyFont") continue;
        // ... we encountered 'MyFont', so embed the font ...

        // 1) Obtain or create a font descriptor.
        Obj fd = font.GetDescriptor();
        if (fd == null) fd = font.GetSDFObj().PutDict("FontDescriptor");
        if (fd.FindObj("FontFile2") != null) continue; // already embedded

        // 2) Embed the font.
        MappedFile file = new MappedFile("my.ttf");
        FilterReader reader = new FilterReader(file);
        Obj stm = doc.CreateIndirectStream(reader);
        // To embed a Flate compressed stream instead, use:
        //   Obj stm = doc.CreateIndirectStream(reader, new FlateEncode(null));
        stm.PutNumber("Length1", file.FileSize());
        fd.Put("FontFile2", stm);
        file.Close();
    }
}
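
To handle a list of fonts rather than the single hard-coded 'MyFont', the name test in the loop could be replaced by a lookup against the fonts known to be installed on every workstation. The following is a minimal sketch in plain C#, independent of PDFNet. The class FontSelection and its methods are hypothetical helpers, not part of the SDK. It also strips the six-letter subset tag (for example "ABCDEF+") that the PDF specification puts in front of the names of subsetted fonts, in case a name carries one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of PDFNet): decides which fonts to embed.
static class FontSelection
{
    // Removes a leading subset tag of the form "ABCDEF+" (exactly six
    // uppercase letters followed by '+'); other names are returned as-is.
    public static string StripSubsetTag(string name)
    {
        if (name != null && name.Length > 7 && name[6] == '+')
        {
            bool isTag = true;
            for (int k = 0; k < 6; ++k)
            {
                if (name[k] < 'A' || name[k] > 'Z') { isTag = false; break; }
            }
            if (isTag) return name.Substring(7);
        }
        return name;
    }

    // Returns true when the font is NOT in the collection of fonts that
    // are assumed to exist on every workstation, i.e. it should be embedded.
    public static bool ShouldEmbed(string fontName,
                                   ICollection<string> installedEverywhere)
    {
        return !installedEverywhere.Contains(StripSubsetTag(fontName));
    }
}
```

In the loop above, the test on "MyFont" would then become something like if (!FontSelection.ShouldEmbed(fname, installed)) continue; where 'installed' is a collection you maintain yourself, and the font file to embed is chosen per font name.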

There are many other possible variants of, and optimizations to, the
above technique. For example, if there are many font instances which
do not share the same font descriptor, you can make sure that the font
is embedded only once for the entire document. Another approach is to
change the font property in the graphics state while enumerating PDF
content (similar to the ElementEdit sample - http://www.pdftron.com/pdfnet/samplecode.html#ElementEdit).
This technique is more invasive to the document and shouldn’t be used
unless you really need to edit content on a PDF page.
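
The "embed only once" optimization amounts to caching the created font stream per font file and reusing it for every descriptor. Below is a minimal sketch of such a cache in plain C#. The class OncePerKey is a hypothetical helper, not part of PDFNet, and is generic so that it does not depend on the SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of PDFNet): builds one value per key and
// returns the same value on every later request for that key.
class OncePerKey<T>
{
    private readonly Dictionary<string, T> cache = new Dictionary<string, T>();
    private readonly Func<string, T> create;

    public OncePerKey(Func<string, T> create)
    {
        this.create = create;
    }

    public T Get(string key)
    {
        T value;
        if (!cache.TryGetValue(key, out value))
        {
            value = create(key);   // runs only on the first request
            cache[key] = value;
        }
        return value;
    }
}
```

With PDFNet, T would presumably be Obj and the delegate would wrap the MappedFile / CreateIndirectStream / PutNumber steps from the example above, so that each descriptor is handled with fd.Put("FontFile2", streams.Get("my.ttf")) and all descriptors reference the same indirect stream.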

Q: It would be fine for us to force the fonts to be TrueType. It looks
like this code example would embed the entire font definition. It
looks like the API supports subsetting fonts… can you show me an
example of how you would do this such that you ended up with only the
subset of the embedded version of the font that is in use in the
document…


A: For font sub-setting to work, PDFNet must track the font and the
associated glyph instances that are referenced in the file. This can
be achieved by using one of the Font.Create... methods with the
‘embed’ and ‘subset’ flags set to true and using
ElementBuilder/ElementWriter to write the associated text. In your
case all text exists in the document even before the font is created.
To inform PDFNet which glyph instances should be retained, you could
create a temporary page with doc.PageCreate(), without adding it to
the page sequence (i.e. without calling doc.PageInsert()), and output
to it all text runs that use this font. The page will be completely
discarded during the file Save() operation. Instead of directly
inserting a font stream into a font descriptor, you would replace the
entire font definition:

for (PageIterator itr = doc.GetPageIterator(); itr.HasNext(); itr.Next())
{
    Page pg = itr.Current();
    Obj res = pg.GetResourceDict();
    if (res == null) continue;
    Obj fonts = res.FindObj("Font");
    if (fonts == null) continue;
    for (DictIterator i = fonts.GetDictIterator(); i.HasNext(); i.Next())
    {
        Font font = new Font(i.Value());
        if (font.GetType() != pdftron.PDF.Font.Type.e_TrueType) continue;
        string fname = font.GetName();
        if (fname != "MyFont") continue;
        // ... we encountered 'MyFont', so embed the font ...

        // ... possibly check for a custom tag or similar

        // Create an embedded, subsetted font (embed = true, subset = true)
        // and swap it with the existing font object.
        Font new_font = Font.CreateTrueTypeFont(doc, "my.ttf", true, true);
        doc.GetSDFDoc().Swap(new_font.GetSDFObj().GetObjNum(),
                             font.GetSDFObj().GetObjNum());
    }
}

So, you can achieve font sub-setting on existing PDF files with
missing fonts, but it is a bit more work.

Q: In general, the fonts are installed in the operating system. Do you
know how to get the font file in .NET? Your example loads the font
from a local file, which I guess is doable, but it would require us to
both install the font and distribute it with our application.


A: There is probably a better way to enumerate system font files
in .NET, but some PDFNet users have used the following workaround:

// 'eStyle' is a System.Drawing.FontStyle value, e.g. FontStyle.Regular.
System.Drawing.Font gdifont = new System.Drawing.Font("Times New Roman", 1, eStyle);
pdftron.PDF.Font f = pdftron.PDF.Font.CreateTrueTypeFont(doc, gdifont, true, false);
Obj stm = f.GetDescriptor().FindObj("FontFile2");

// ... copy stm into the descriptor ('fd') of the missing font.
// 'f' itself will be discarded when the file is saved.
fd.Put("FontFile2", stm);
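
As an alternative to going through GDI, the font file itself can often be located on Windows, because installed fonts are listed in the registry under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Fonts as pairs such as "Times New Roman (TrueType)" = "times.ttf". The following is a minimal sketch of the matching step only. FontFileLookup is a hypothetical helper, not part of PDFNet or .NET, and it assumes the caller has already read the registry values into a dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: maps a font face name to a font file path.
static class FontFileLookup
{
    // 'entries' maps registry value names such as
    // "Times New Roman (TrueType)" to file names such as "times.ttf".
    // 'fontsDir' is the system fonts folder. Returns null when no entry
    // for the given face name is found.
    public static string FindFontFile(IDictionary<string, string> entries,
                                      string faceName, string fontsDir)
    {
        string prefix = faceName + " (";
        foreach (KeyValuePair<string, string> e in entries)
        {
            if (e.Value == null) continue;
            if (!e.Key.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                continue;
            // Values are usually relative to the fonts folder, but fonts
            // installed elsewhere may be stored with a full path.
            return Path.IsPathRooted(e.Value)
                ? e.Value
                : Path.Combine(fontsDir, e.Value);
        }
        return null;
    }
}
```

The dictionary could be filled with Microsoft.Win32.Registry.LocalMachine.OpenSubKey(...) and GetValueNames()/GetValue(), and the fonts folder obtained from Environment.GetFolderPath(Environment.SpecialFolder.Fonts) on .NET 4 or later. The simple prefix match is an assumption that will miss combined entries (for example .ttc collections registered under several face names), so treat it as a starting point only.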

Q: I attempted to do what you described (in C# for .NET 2005 -
attached). I’m trying to embed the specified font, OCRAExtended
font, but subset the part of the font that we’re actually using. This
program should run and manipulate the ExampleScanline.pdf into
Output.pdf (in the bin/Debug subfolder of the solution).


A: The problem is that pdfdoc.GetSDFDoc().Swap() conflicts with font
caching. To work around it you could close and then re-open the
document, but a more elegant solution is to simply change the font
reference using fonts.Put(fontid, new_font.GetSDFObj()). This is shown
in the attached code sample.

using System;
using System.Text;
using pdftron;
using pdftron.PDF;
using pdftron.SDF;
using pdftron.PDF.PDFA;

namespace TestEmbedFont
{
    class Program
    {
        const string INPUT_FILE = @"ExampleScanline.pdf";
        const string FONT_FILE_NAME = @"OCRAEXT.TTF";
        // Where the file should be written. Ends up in the bin\Debug
        // subfolder when run in VS2005.
        const string OUTPUT_FILE_NAME = @"Output.pdf";

        static void Main(string[] args)
        {
            // Initialize PDFNet
            PDFNet.Initialize();

            // Load the source PDF document
            PDFDoc pdfDoc = new PDFDoc(INPUT_FILE);
            pdfDoc.InitSecurityHandler();

            // Create the replacement font (embed = true, subset = true)
            pdftron.PDF.Font new_font = pdftron.PDF.Font.CreateTrueTypeFont(
                pdfDoc, FONT_FILE_NAME, true, true);

            // Loop through the pages
            for (PageIterator itr = pdfDoc.GetPageIterator(); itr.HasNext(); itr.Next())
            {
                // Find the font dictionary and switch the font
                Page pg = itr.Current();
                System.Diagnostics.Debug.WriteLine("Begin page processing.");
                Obj res = pg.GetResourceDict();

                if (res == null) continue;
                Obj fonts = res.FindObj("Font");
                if (fonts == null) continue;

                for (DictIterator i = fonts.GetDictIterator(); i.HasNext(); i.Next())
                {
                    pdftron.PDF.Font font = new pdftron.PDF.Font(i.Value());

                    if (font.GetType() != pdftron.PDF.Font.Type.e_TrueType) continue;
                    string fname = font.GetName();

                    System.Diagnostics.Debug.WriteLine("Begin font processing (" + fname + ").");
                    if (fname != "OCRAExtended") continue;

                    // ... we encountered the target font, so point this
                    // resource entry at the new embedded font
                    // ... possibly check for a custom tag or similar

                    fonts.Put(i.Key().GetName(), new_font.GetSDFObj());
                }

                // Now create a temporary page and write every element from
                // the source page onto it, so that PDFNet can track which
                // glyphs are in use. The page is never inserted into the
                // page sequence.
                Page tempPage = pdfDoc.PageCreate();

                ElementReader reader = new ElementReader();
                ElementWriter writer = new ElementWriter();
                writer.Begin(tempPage);
                reader.Begin(pg);
                Element element = null;
                while ((element = reader.Next()) != null)
                    writer.WriteElement(element);
                reader.End();
                writer.End();
            }

            // Save the edited PDF
            pdfDoc.Save(OUTPUT_FILE_NAME, SDFDoc.SaveOptions.e_remove_unused);

            // Finish up
            pdftron.PDFNet.Terminate();

            // Sample code to play with the PDF compliance piece
            // PDFNet.Initialize();
            // PDFACompliance compliance = new PDFACompliance(true, INPUT_FILE, null,
            //     PDFACompliance.Conformance.e_Level1B, null, 1000, false);
            // compliance.SaveAs(OUTPUT_FILE_NAME, true);
            // PDFNet.Terminate();
        }
    }
}

Note that simply looping over the pages may not find all fonts embedded in the document. To completely unembed fonts, you must traverse the cross-reference (xref) table. For example:

SDFDoc cos_doc = doc.GetSDFDoc();
int num_objs = cos_doc.XRefSize();
for (int i = 1; i < num_objs; ++i)
{
    Obj obj = cos_doc.GetObj(i);
    if (obj != null && !obj.IsFree() && obj.IsDict())
    {
        // Skip every dictionary whose Type entry is not the name Font.
        DictIterator itr = obj.Find("Type");
        if (!itr.HasNext() ||
            !itr.Value().IsName() ||
            itr.Value().GetName() != "Font") continue;

        pdftron.PDF.Font font = new pdftron.PDF.Font(obj);
        if (font.GetSDFObj() == null ||
            !font.IsEmbedded() ||
            font.GetType() != pdftron.PDF.Font.Type.e_TrueType) continue;

        // Remove the embedded font program from the descriptor.
        Obj fd = font.GetDescriptor();
        if (fd.FindObj("FontFile2") != null)
        {
            fd.Erase("FontFile2");
        }
        else if (fd.FindObj("FontFile3") != null)
        {
            fd.Erase("FontFile3");
        }
    }
}