How do I enumerate all fonts in a PDF?

Q: Is there a way to enumerate all of the embedded fonts in a
document, or do I need to just collect a list of font names from the
text blocks?

Once I know the name of an embedded font, and we are calling
GetGlyphPath(), how do we know what size buffers to allocate for
operators and data?
----------------------
A: There are several options you can use to enumerate fonts in PDF.

You can follow the same pattern used to extract all embedded images
(as shown in the ImageExtract sample - http://www.pdftron.com/pdfnet/samplecode.html#ImageExtract).
If you are traversing the low-level object list, you can recognize fonts
because they are dictionaries with a Type -> Font entry. If you are
traversing the display list of a page, you can access fonts via the
element's GState (element.GetGState().GetFont()).
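
As a rough illustration of these two options, here is a minimal VB sketch. It assumes an already opened PDFDoc, and that the PDFNet calls used (SDFDoc.XRefSize(), SDFDoc.GetObj(), Obj.IsFree(), ElementReader.FormBegin() and so on) behave as they do in the ImageExtract and ElementReader samples. ListFontDicts and ListFontsInDisplayList are hypothetical helper names made up for this example.

```vb
Imports System
Imports pdftron.PDF
Imports pdftron.SDF

Module FontScanSketch
    ' Option 1: scan the low-level object list for dictionaries with Type = Font.
    ' Note: descendant CID fonts also carry Type = Font, so they are listed too.
    Sub ListFontDicts(doc As PDFDoc)
        Dim sdf As SDFDoc = doc.GetSDFDoc()
        For i As Integer = 1 To sdf.XRefSize() - 1
            Dim o As Obj = sdf.GetObj(i)
            If o Is Nothing OrElse o.IsFree() OrElse Not o.IsDict() Then Continue For
            Dim t As Obj = o.FindObj("Type")
            If t Is Nothing OrElse Not t.IsName() OrElse t.GetName() <> "Font" Then Continue For
            ' BaseFont can be missing (for example in Type3 fonts), so check it first.
            Dim baseFont As Obj = o.FindObj("BaseFont")
            If Not baseFont Is Nothing AndAlso baseFont.IsName() Then
                Console.WriteLine("obj {0}: {1}", i, baseFont.GetName())
            Else
                Console.WriteLine("obj {0}: (no BaseFont entry)", i)
            End If
        Next
    End Sub

    ' Option 2: walk the display list and read the font from each text element.
    ' The caller is assumed to wrap this in reader.Begin(page) ... reader.End().
    Sub ListFontsInDisplayList(reader As ElementReader)
        Dim elem As Element = reader.Next()
        While Not elem Is Nothing
            If elem.GetType() = Element.Type.e_text Then
                Console.WriteLine(elem.GetGState().GetFont().GetName())
            ElseIf elem.GetType() = Element.Type.e_form Then
                ' Descend into the form XObject's own display list.
                reader.FormBegin()
                ListFontsInDisplayList(reader)
                reader.End()
            End If
            elem = reader.Next()
        End While
    End Sub
End Module
```

The display-list version prints a name for every text run, so the same font will normally show up many times. It only reports fonts that are actually used to draw text, whereas the object scan also reports fonts that are defined but never referenced.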

A third approach is to traverse all font resources listed under the
page's resource dictionary. For example:

Obj res = page.GetResourceDict();
if (res != null) {
   Obj fonts = res.FindObj("Font");
   if (fonts != null) {
      // Enumerate the fonts in the page's Font resource dictionary.
      for (DictIterator itr = fonts.GetDictIterator(); itr.HasNext(); itr.Next()) {
           // itr.Key() is the resource name, itr.Value() is the font dictionary.
           Font font = new Font(itr.Value());
           ....
      }
   }
}

Q: Once I know the name of an embedded font, and we are calling
GetGlyphPath(), how do we know what size buffers to allocate for
operators and data?

You can use element.GetPointCount() and element.GetPathTypesCount() to
obtain the number of entries in each array.

Here is the full code to print them out, in VB:

Dim itr As PageIterator = doc.GetPageIterator()
While itr.HasNext() ' Read every page
    Console.WriteLine("Page {0:d} ----------------------------------------", itr.GetPageNumber())
    Dim res As Obj = itr.Current().GetResourceDict()
    If Not res Is Nothing Then
        Dim fonts As Obj = res.FindObj("Font")
        If Not fonts Is Nothing Then
            Dim ditr As DictIterator = fonts.GetDictIterator()
            While ditr.HasNext()
                Dim font As Font = New Font(ditr.Value())
                Console.WriteLine(font.GetFamilyName())
                ditr.Next()
            End While
        End If
    End If
    itr.Next()
End While

Here is my code… Here is sample output from the above… I am running against PDFNet 6.6.0.38591. It is not finding any fonts. Can you suggest what may need to change, please? Thanks for your help - Best regards, Lee Gillie CCP

There might not be any fonts. If you open the PDF in a PDF reader, can you select and copy/paste the text? If not, then there is no actual text.

If there is text, then the issue must be that all the text is in form XObjects. The following code is a more exhaustive search that also looks for fonts in XObjects (which in turn can contain nested XObjects):

Dim itr As PageIterator = doc.GetPageIterator()
While itr.HasNext() ' Read every page
    Console.WriteLine("Page {0:d} ----------------------------------------", itr.GetPageNumber())
    Dim res As Obj = itr.Current().GetResourceDict()
    If Not res Is Nothing Then
        IterateFonts(res.FindObj("Font"))
        IterateFormXObject(res.FindObj("XObject"))
    End If
    itr.Next()
End While

' Print the family name of every font in a Font resource dictionary.
Sub IterateFonts(fonts As Obj)
    If Not fonts Is Nothing Then
        Dim ditr As DictIterator = fonts.GetDictIterator()
        While ditr.HasNext()
            Dim font As Font = New Font(ditr.Value())
            Console.WriteLine(font.GetFamilyName())
            ditr.Next()
        End While
    End If
End Sub

' Recurse into each XObject that has its own Resources dictionary.
Sub IterateFormXObject(xobjects As Obj)
    If Not xobjects Is Nothing Then
        Dim ditr As DictIterator = xobjects.GetDictIterator()
        While ditr.HasNext()
            Dim xobject As Obj = ditr.Value()
            Dim resources As Obj = xobject.FindObj("Resources")
            If Not resources Is Nothing Then
                IterateFonts(resources.FindObj("Font"))
                IterateFormXObject(resources.FindObj("XObject"))
            End If
            ditr.Next()
        End While
    End If
End Sub

XObject suggestion picked up the missing fonts - thanks.

But I still have a question in the same vein. I find the fonts on each page (see below; the character counts come from iterating the fonts for each page and using GetCharCodeIterator).

How do these two from page 1 differ?
Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName: 39 characters
Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial,Bold 39 characters

Or these from page 4?
Name:ABCEEE+Arial,Italic, FamilyName:Arial, EmbeddedFontName: 33 characters
Name:ABCEEE+Arial,Italic, FamilyName:Arial, EmbeddedFontName:ABCEEE+Arial,Italic 33 characters

I think to build a composite list of fonts for the entire document I need to key a dictionary on Name+FamilyName+EmbeddedFontName, and only keep unique occurrences. Each keyed occurrence seen on separate pages seems to refer to the same glyph count. I suspect they return a reference to the same font object? Also, except for embedded font name property differing, some of these almost look like they might be duplicates.

Our ultimate goal is to provide a pick-list for export of True Type Fonts for the entire document. Storing these temporarily aids us in ancillary processes we will do to enhance the output in the printing process.

Page 1 ----------------------------
Name:ABCDEE+Arial, FamilyName:Arial, EmbeddedFontName: 91 characters
Name:ABCDEE+Times New Roman, FamilyName:Times New Roman, EmbeddedFontName:ABCDEE+Times New Roman 8 characters
Name:ABCDEE+Calibri,Bold, FamilyName:Calibri, EmbeddedFontName: 49 characters
Name:ABCDEE+Calibri,Bold, FamilyName:Calibri, EmbeddedFontName:ABCDEE+Calibri,Bold 49 characters
Name:ABCEEE+Calibri, FamilyName:Calibri, EmbeddedFontName: 45 characters
Name:ABCEEE+Calibri, FamilyName:Calibri, EmbeddedFontName:ABCEEE+Calibri 45 characters
Name:ABCDEE+Arial, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial 91 characters
Name:ABCDEE+BakerSignet BT, FamilyName:BakerSignet BT, EmbeddedFontName: 43 characters
Name:ABCDEE+BakerSignet BT, FamilyName:BakerSignet BT, EmbeddedFontName:ABCDEE+BakerSignet BT 43 characters
Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName: 39 characters
Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial,Bold 39 characters
Name:ABCDEE+Arial,BoldItalic, FamilyName:Arial, EmbeddedFontName: 39 characters
Name:ABCDEE+Arial,BoldItalic, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial,BoldItalic 39 characters
Name:ABCDEE+Arial Black, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial Black 16 characters
Page 2 ----------------------------
Name:ABCDEE+Arial, FamilyName:Arial, EmbeddedFontName: 91 characters
Name:ABCDEE+Times New Roman, FamilyName:Times New Roman, EmbeddedFontName:ABCDEE+Times New Roman 8 characters
Name:ABCDEE+Arial, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial 91 characters
Name:ABCDEE+BakerSignet BT, FamilyName:BakerSignet BT, EmbeddedFontName:ABCDEE+BakerSignet BT 43 characters
Page 3 ----------------------------
Name:ABCDEE+Arial, FamilyName:Arial, EmbeddedFontName: 91 characters
Name:ABCDEE+Times New Roman, FamilyName:Times New Roman, EmbeddedFontName:ABCDEE+Times New Roman 8 characters
Name:ABCDEE+Calibri,Bold, FamilyName:Calibri, EmbeddedFontName:ABCDEE+Calibri,Bold 49 characters
Name:ABCEEE+Poor Richard, FamilyName:Poor Richard, EmbeddedFontName:ABCEEE+Poor Richard 21 characters
Name:ABCEEE+Yorktown, FamilyName:Yorktown, EmbeddedFontName:ABCEEE+Yorktown 14 characters
Name:ABCEEE+Arial Narrow, FamilyName:Arial, EmbeddedFontName:ABCEEE+Arial Narrow 40 characters
Name:ABCEEE+Arial Narrow,Bold, FamilyName:Arial, EmbeddedFontName:ABCEEE+Arial Narrow,Bold 35 characters
Name:ABCEEE+Anderson Thunderbirds Are GO!, FamilyName:Anderson Thunderbirds Are GO!, EmbeddedFontName:ABCEEE+Anderson Thunderbirds Are GO! 23 characters
Name:ABCDEE+Arial, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial 91 characters
Name:ABCEEE+Times New Roman, FamilyName:Times New Roman, EmbeddedFontName: 8 characters
Name:ABCEEE+DFKai-SB, FamilyName:DFKai-SB, EmbeddedFontName:ABCEEE+DFKai-SB 23 characters
Name:ABCEEE+Blue Highway,Bold, FamilyName:Blue Highway, EmbeddedFontName:ABCEEE+Blue Highway,Bold 36 characters
Name:ABCDEE+BakerSignet BT, FamilyName:BakerSignet BT, EmbeddedFontName:ABCDEE+BakerSignet BT 43 characters
Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName: 39 characters
Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial,Bold 39 characters
Name:ABCDEE+Arial,BoldItalic, FamilyName:Arial, EmbeddedFontName: 39 characters
Page 4 ----------------------------
Name:ABCDEE+Arial, FamilyName:Arial, EmbeddedFontName: 91 characters
Name:ABCDEE+Calibri,Bold, FamilyName:Calibri, EmbeddedFontName: 49 characters
Name:ABCDEE+Calibri,Bold, FamilyName:Calibri, EmbeddedFontName:ABCDEE+Calibri,Bold 49 characters
Name:ABCEEE+Calibri, FamilyName:Calibri, EmbeddedFontName: 45 characters
Name:ABCEEE+Calibri, FamilyName:Calibri, EmbeddedFontName:ABCEEE+Calibri 45 characters
Name:ABCEEE+Arial Narrow,Bold, FamilyName:Arial, EmbeddedFontName:ABCEEE+Arial Narrow,Bold 35 characters
Name:ABCDEE+Arial, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial 91 characters
Name:ABCEEE+Calibri,Italic, FamilyName:Calibri, EmbeddedFontName: 56 characters
Name:ABCEEE+Calibri,Italic, FamilyName:Calibri, EmbeddedFontName:ABCEEE+Calibri,Italic 56 characters
Name:ABCEEE+Engravers MT, FamilyName:Engravers MT, EmbeddedFontName:ABCEEE+Engravers MT 13 characters
Name:ABCEEE+Arial,Italic, FamilyName:Arial, EmbeddedFontName: 33 characters
Name:ABCEEE+Arial,Italic, FamilyName:Arial, EmbeddedFontName:ABCEEE+Arial,Italic 33 characters
Name:ABCEEE+Castellar, FamilyName:Castellar, EmbeddedFontName:ABCEEE+Castellar 30 characters
Name:ABCFEE+Gisha,Bold, FamilyName:Gisha, EmbeddedFontName:ABCFEE+Gisha,Bold 49 characters
Name:ABCFEE+Watson, FamilyName:Watson, EmbeddedFontName:ABCFEE+Watson 19 characters
Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName: 39 characters
Name:ABCDEE+Arial,BoldItalic, FamilyName:Arial, EmbeddedFontName: 39 characters

As always, thanks for helping us to understand.

Our desktop SDK download includes a tool called COSEdit, which lets you navigate the PDF's object structure graphically. The code we provided earlier contains the structure and keys that you would look for, though to get to a page you go through /Root/Pages.

There might be duplicates, or they might be different fonts, but with the same FontFile object, so they actually share the same binary glyph data.
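
One way to check which case applies is to key the list on PDF object numbers rather than on names. The following is only a sketch under some assumptions: that Obj.GetObjNum() identifies an indirect object, and that Font.IsEmbedded() and Font.GetEmbeddedFont() return the embedded flag and the font file stream as described in the PDFNet API reference. CollectFonts and the two sets are hypothetical names made up for this example.

```vb
Imports System
Imports System.Collections.Generic
Imports pdftron.PDF
Imports pdftron.SDF

Module UniqueFontsSketch
    ' Object numbers of font dictionaries and XObjects already visited.
    ' Assumption: these objects are indirect. Direct objects all report
    ' object number 0 and would be collapsed into a single entry here.
    Private ReadOnly seenFonts As New HashSet(Of Long)
    Private ReadOnly seenXObjects As New HashSet(Of Long)

    ' Call once per page with itr.Current().GetResourceDict().
    Sub CollectFonts(resources As Obj)
        If resources Is Nothing Then Return

        Dim fonts As Obj = resources.FindObj("Font")
        If Not fonts Is Nothing Then
            Dim fitr As DictIterator = fonts.GetDictIterator()
            While fitr.HasNext()
                Dim fontObj As Obj = fitr.Value()
                ' HashSet.Add returns False when the font was already reported.
                If seenFonts.Add(CLng(fontObj.GetObjNum())) Then
                    Dim f As Font = New Font(fontObj)
                    Dim fileNum As Long = 0
                    If f.IsEmbedded() Then
                        Dim fileObj As Obj = f.GetEmbeddedFont()
                        If Not fileObj Is Nothing Then fileNum = CLng(fileObj.GetObjNum())
                    End If
                    Console.WriteLine("{0} (font obj {1}, font file obj {2})",
                                      f.GetName(), fontObj.GetObjNum(), fileNum)
                End If
                fitr.Next()
            End While
        End If

        Dim xobjects As Obj = resources.FindObj("XObject")
        If Not xobjects Is Nothing Then
            Dim xitr As DictIterator = xobjects.GetDictIterator()
            While xitr.HasNext()
                Dim x As Obj = xitr.Value()
                ' Visit each XObject once, which also guards against cycles.
                If seenXObjects.Add(CLng(x.GetObjNum())) Then
                    CollectFonts(x.FindObj("Resources"))
                End If
                xitr.Next()
            End While
        End If
    End Sub
End Module
```

With this output, two lines that show the same font file object number would be sharing one embedded font program, while lines with different font object numbers but identical names would be separate font dictionaries.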

You could also send the PDF to support at pdftron.com for review.

Finally, it would help if you explained what your overall objective is.