Extracting the actual font name from a PDF font dictionary

Q: I’m using PDFTron C++, and I’ve noticed that a large number of PDFs use non-standard font names when describing standard fonts such as Times New Roman. For example, when I create a PDF with Microsoft Word 2007 using the font Times New Roman and make it bold, the Font.GetName function returns “Times,Bold”. The functions GetFamilyName and GetEmbeddedFontName tend to either return the same result or nothing usable at all. Since the name of the font doesn’t match any font listed as installed on the system, our application is forced to fall back to a default font. PDFs from other applications often use different notations for the same font, and I’m not aware of any pattern or naming convention they follow.

Does PDFTron have any utility for mapping these non-standard font names to a readily usable font name?

A:

Unfortunately, PDF creators frequently name fonts in bizarre ways, and finding a matching font can be tricky. If the creator decides to name the ‘Lucida Console’ font ‘LC’, then font.GetFamilyName() will just give back ‘LC’. That said, many creators follow the convention of using a comma or a dash to separate the font style (e.g. Bold, Italic, Oblique, …) from the family name.

In addition, you should strip away the font subset prefix (six uppercase letters followed by ‘+’, e.g. ‘ABCDEF+Times’) that may precede the family name.

The code could look something like this:

#include <string>

// Strips a leading subset prefix from fn and copies the style suffix into style.
// Note: fn still contains the style suffix; only the prefix is removed from it.
void GetTypeFaceName(std::string& fn, std::string& style)
{
    // Remove the font subset prefix, e.g. "ABCDEF+Times" -> "Times"
    std::string::size_type idx = fn.find('+');
    if (idx == 6 && fn.size() > 7)
        fn = fn.substr(idx + 1);

    // Extract the style for TrueType fonts with extra styles
    idx = fn.find(',');
    if (idx != std::string::npos && idx > 0) { // e.g. Times,Bold
        style = fn.substr(idx + 1);
    } else {
        idx = fn.find('-');
        if (idx != std::string::npos && idx > 0) // e.g. Times-Bold
            style = fn.substr(idx + 1);
    }
}
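If you also want the family name separated from the style, the same idea can be extended to return both parts. The following is only a rough sketch: FontNameParts and SplitFontName are hypothetical names made up for this example and are not part of the PDFTron API. It assumes the subset prefix is exactly six uppercase letters followed by '+', and that the style follows the first comma, or the first dash if there is no comma.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper type for this example; not part of the PDFTron API.
struct FontNameParts {
    std::string family;
    std::string style; // empty when no style suffix is present
};

// Splits a raw PDF font name such as "ABCDEF+Times,Bold" into family and style.
FontNameParts SplitFontName(std::string name)
{
    // Drop a subset prefix: exactly six uppercase letters followed by '+'.
    if (name.size() > 7 && name[6] == '+') {
        bool isTag = true;
        for (int i = 0; i < 6; ++i) {
            if (!std::isupper(static_cast<unsigned char>(name[i])))
                isTag = false;
        }
        if (isTag)
            name.erase(0, 7);
    }

    // Prefer a comma separator ("Times,Bold"), then fall back to a dash ("Times-Bold").
    std::string::size_type sep = name.find(',');
    if (sep == std::string::npos || sep == 0)
        sep = name.find('-');

    FontNameParts parts;
    if (sep != std::string::npos && sep > 0) {
        parts.family = name.substr(0, sep);
        parts.style = name.substr(sep + 1);
    } else {
        parts.family = name; // no recognizable style suffix
    }
    return parts;
}
```

For a name such as "Times,Bold" this gives the family "Times" and the style "Bold". The family still has to be matched against the fonts installed on the system by your application (e.g. "Times" to Times New Roman), and a creator that uses an arbitrary name like 'LC' cannot be resolved this way.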