Traversing all fonts on a PDF page

Ivanho · August 13, 2012, 11:56pm

Q:
I am trying to traverse all the fonts in this file, as in the example code below:

    // For each page
    int page_num = 0;
    ElementReader reader = new ElementReader();
    PageIterator itr;
    for (itr = pdfdoc.GetPageIterator(); itr.HasNext(); itr.Next()) {
        // Find the font dictionary and switch the font
        pdftron.PDF.Page pg = itr.Current();
        Obj res = pg.GetResourceDict();
        if (res == null) continue;
        Obj fonts = res.FindObj("Font");
        if (fonts == null) continue;
        for (DictIterator i = fonts.GetDictIterator(); i.HasNext(); i.Next()) {
            pdftron.PDF.Font f = new pdftron.PDF.Font(i.Value());
            // ... (rest of the loop body omitted in the original post)
        }
    }

However, res.FindObj("Font") always returns null.

Am I doing something wrong or is there a bug in the library?

A:

The logic of the code snippet you sent me seems correct. I inspected the document you sent me using our CosEdit application (http://www.pdftron.com/pdfcosedit/index.html) and found that it is structured in a slightly unusual way, which explains why your code is unable to find any font dictionaries. In the PDF specification, a page can reference form XObjects, which are self-contained descriptions of any sequence of graphic objects. In other words, a form XObject shares many of the attributes of a page, including its own resource dictionary, and form XObjects can be nested within a page in a tree-like structure.

In your document, the fonts are not found in the page resource dictionaries but in the resource dictionaries of the form XObjects. To access the font dictionaries in this particular document, you would need to write a recursive implementation that looks into the resource dictionary of every form XObject (form XObjects are listed under the XObject key of a resource dictionary) as well as the resource dictionary of the page itself.
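As a rough illustration of the recursive approach described above, here is a minimal sketch. It is untested against your document, and the FontWalker class and CollectFonts method are names I made up for the example. It assumes that Obj.IsStream(), Obj.IsName(), Obj.GetName() and Obj.GetObjNum() behave in your PDFNet version as I expect, so please check them against the API reference before relying on it.

```csharp
using System.Collections.Generic;
using pdftron.SDF;

// Hypothetical helper, not part of PDFNet.
static class FontWalker
{
    // Adds every font found in 'res' to 'found', then descends into the
    // resource dictionaries of any form XObjects that 'res' references.
    public static void CollectFonts(Obj res, HashSet<long> visited,
                                    List<pdftron.PDF.Font> found)
    {
        if (res == null) return;

        // Same lookup as in the original snippet.
        Obj fonts = res.FindObj("Font");
        if (fonts != null)
        {
            for (DictIterator i = fonts.GetDictIterator(); i.HasNext(); i.Next())
                found.Add(new pdftron.PDF.Font(i.Value()));
        }

        Obj xobjects = res.FindObj("XObject");
        if (xobjects == null) return;

        for (DictIterator i = xobjects.GetDictIterator(); i.HasNext(); i.Next())
        {
            Obj xobj = i.Value();
            if (!xobj.IsStream()) continue;

            // Only form XObjects carry resources; skip images and others.
            Obj subtype = xobj.FindObj("Subtype");
            if (subtype == null || !subtype.IsName() || subtype.GetName() != "Form")
                continue;

            // Assumption: the object number identifies the stream, so it can
            // be used to skip XObjects already visited (shared or cyclic).
            if (!visited.Add((long)xobj.GetObjNum())) continue;

            CollectFonts(xobj.FindObj("Resources"), visited, found);
        }
    }
}
```

From your page loop you would call FontWalker.CollectFonts(pg.GetResourceDict(), visited, found) instead of calling res.FindObj("Font") directly, creating the visited set and the found list once before the loop. The sketch does not remove duplicates, so a font referenced from several resource dictionaries will appear in the list more than once.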

In any case, I suggest you take a look at the PDF specification, since there may be additional edge cases where the approach I outlined would fail. Our CosEdit utility, which I linked above, is also very useful for debugging your code and verifying the assumptions you have made about the structure of the document.