How do I normalize the font settings for the fields of a document?

Q:

I would like the ability to do the following:

Scan a PDF document.

If a text field is found,

Then Determine the FontSize and FontType

If FontType != “Times New Roman” Or FontSize != “9”

Set FontType = “Times New Roman”

Set FontSize = “9”, etc.

I would like the ability to do the above, but I would settle for scanning a document and reading out the font properties of each text field. My problem is that we have a large library of PDF files, and I need to write a simple program that will scan the library and tell me if any files use a font other than “Times New Roman”, 9 point.

A:

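A field’s text settings are stored in its default appearance string, the “DA” entry in the field’s dictionary. This string encodes the font name, font size, and other text state (such as color) used to display the field’s value. The entry is inheritable, so it may be set on a parent field or on the document’s AcroForm dictionary rather than on the field itself.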

You can find the default appearance string with code similar to the following:

FieldIterator itr = pdfdoc.GetFieldIterator();
for (; itr.HasNext(); itr.Next()) {
    Field field = itr.Current();
    Console.WriteLine("Field name: {0}", field.GetName());
    // FindObj returns null when the dictionary has no "DA" entry.
    Obj da_obj = field.GetSDFObj().FindObj("DA");
    if (da_obj != null) {
        string da = da_obj.GetAsPDFText();
        // parse da to find font information
    }
}

Once you have the default appearance string, you need to parse it to find the font information. The format is documented in the PDF Reference Manual (http://www.pdftron.com/downloads/PDFReference16.pdf, Table 8.71 in Section 8.6.2).

The following are examples of DA strings:

// Set font size to 10 pts. Set text color to black (0, 0, 0).
fld.PutString("DA", "/Arial 10 Tf 0 0 0 rg ");

// Auto-size font (font size = 0). Set text color to red (1, 0, 0).
fld.PutString("DA", "/Arial 0 Tf 1 0 0 rg ");

// Set font size to 14 pts. Set text color to green (0, 1, 0).
fld.PutString("DA", "/Arial 14 Tf 0 1 0 rg ");

// Auto-size font (font size = 0). Set character spacing (Tc) to 0.25.
// Set text color to gray (0.5, 0.5, 0.5).
fld.PutString("DA", "/Courier 0 Tf 0.25 Tc 0.5 0.5 0.5 rg ");

// Set the explicit text matrix (scale text 0.5 times horizontally, 1.5
// times vertically, offset by [10,20]).
fld.PutString("DA", "/Courier 12 Tf 0.5 0 0 1.5 10 20 Tm ");

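The font name typically, but not always, appears as the first element of the string. Because the default appearance string must always contain a Tf operator, you could search the string with a regular expression similar to “/(\S+)\s+(\d*\.?\d+)\s+Tf” to find the font name and font size. Matching on non-whitespace for the name keeps the pattern from running past the first operand, and allowing a decimal point covers sizes such as 9.5. Note that the name in the DA string is a font resource name, which refers to an entry in the form’s default resources and may differ from the font’s actual name (for example, “Times New Roman”).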
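As a rough sketch of the parsing step, the regular expression above can be wrapped in a small helper. The class and method names here (DaParser, TryParse) are made up for illustration and are not part of PDFNet; the code uses only the standard .NET regex classes and works on the plain string returned by GetAsPDFText().

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper (not a PDFNet class): extracts the Tf operands
// from a default appearance (DA) string.
public static class DaParser
{
    // Matches "/<name> <size> Tf"; the size may be an integer or a decimal.
    private static readonly Regex TfPattern =
        new Regex(@"/(\S+)\s+(\d*\.?\d+)\s+Tf", RegexOptions.Compiled);

    // Returns true and fills fontName/fontSize when a Tf operator is found.
    public static bool TryParse(string da, out string fontName, out double fontSize)
    {
        fontName = null;
        fontSize = 0;
        if (string.IsNullOrEmpty(da)) return false;

        Match m = TfPattern.Match(da);
        if (!m.Success) return false;

        fontName = m.Groups[1].Value;
        // Invariant culture so that "9.5" parses the same on every machine.
        fontSize = double.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);
        return true;
    }
}
```

Inside the field loop, a call such as DaParser.TryParse(da, out name, out size) would give you the two values to compare against the resource name and size you expect. Which resource name stands for Times New Roman depends on how your documents were authored, so it is worth printing a few DA strings from the library before settling on the comparison.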

To change the font settings as well, write a new default appearance string back to the field with field.GetSDFObj().PutString(“DA”, ...), building the string as described above.
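If you want to keep the other operators in an existing DA string (color, character spacing, and so on) and change only the font, one option is to replace just the Tf operands. The following is a minimal sketch of that idea using plain string handling; DaRewriter and WithFont are hypothetical names, and the resource name you pass in is assumed to already exist in the form’s default font resources.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper (not a PDFNet class): builds a DA string with a
// new font resource name and size, keeping any other operators as-is.
public static class DaRewriter
{
    private static readonly Regex TfPattern =
        new Regex(@"/\S+\s+\d*\.?\d+\s+Tf");

    public static string WithFont(string da, string resourceName, double size)
    {
        // Invariant culture so the size is always written with a '.' separator.
        string tf = string.Format(CultureInfo.InvariantCulture,
                                  "/{0} {1} Tf", resourceName, size);

        // No existing string: the Tf operator alone is a valid DA.
        if (string.IsNullOrEmpty(da)) return tf + " ";

        // No Tf operator present: prepend one and keep the rest.
        if (!TfPattern.IsMatch(da)) return tf + " " + da;

        // Replace only the first Tf operator; the evaluator avoids
        // treating '$' in the replacement text as a substitution.
        return TfPattern.Replace(da, m => tf, 1);
    }
}
```

The returned string could then be passed to field.GetSDFObj().PutString("DA", ...) as described above. For example, WithFont("/Arial 10 Tf 0 0 0 rg ", "TiRo", 9) yields "/TiRo 9 Tf 0 0 0 rg ", where "TiRo" is only a placeholder resource name.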