[PDFNet] How do I detect and fix a corrupt font in a PDF?

Aaron_Gravesdale · May 14, 2010, 5:37pm

Q: I have some PDFs with a corrupt font. How can I find the name of the
font that is causing the problem?
I would like to replace it with a generic font so that I can at least
extract an image, even if it is not the right font.
I would appreciate any help.
----------------------
A: You could try to identify the font by iterating through all fonts
stored in the page resource dictionary. The pseudocode may look as
follows:

// Java pseudocode. Log, mPDFDocument, SUCCESS and INTERNAL_ERROR are
// application-defined. Assumed imports: pdftron.PDF.*, pdftron.SDF.*,
// pdftron.Common.PDFNetException.

    private int testFonts() {
        try {
            // For each page (PDF page numbers are 1-based)
            int page_num = 1;
            for (PageIterator itr = mPDFDocument.getPageIterator(); itr.hasNext();) {
                Log.log("\tTesting fonts on page " + page_num);
                Page page = (Page) (itr.next());
                // Look up the Font dictionary in the page resources
                Obj res = page.getResourceDict();
                Obj fonts = null;
                if (res != null)
                    fonts = res.findObj("Font");
                if (fonts != null) {
                    for (DictIterator i = fonts.getDictIterator(); i.hasNext(); i.next()) {
                        if (i.value() == null) {
                            continue;
                        }
                        Font f = new Font(i.value());
                        String fontname = f.getName();
                        Log.log("Testing font " + fontname);
                        try {
                            f.getUnitsPerEm(); // this call actually loads the font
                        } catch (PDFNetException ex) {
                            Log.warning("Error retrieving font " + fontname
                                    + " from page " + page_num);
                        }
                    }
                }
                ++page_num;
            }
        } catch (PDFNetException ex) {
            Log.logException(ex);
            return INTERNAL_ERROR;
        }
        return SUCCESS;
    }
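
If a list of the failing fonts is more useful than log lines, the same scan can be reduced to "probe each font, remember the ones that throw". The following is only a sketch of that pattern with stand-in types: FontProbe, findBrokenFonts and the page map are hypothetical and not part of PDFNet. In real code the probe would presumably wrap the new Font(obj).getUnitsPerEm() call shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch: none of these types belong to PDFNet.
class FontScanSketch {

    /** Hypothetical probe; with PDFNet it would wrap new Font(obj).getUnitsPerEm(). */
    interface FontProbe {
        void load(String resourceName) throws Exception;
    }

    /**
     * Probes every font resource name on every page and returns
     * "page N: name" entries for the ones whose probe throws.
     * pages maps a 1-based page number to that page's font resource names.
     */
    static List<String> findBrokenFonts(Map<Integer, List<String>> pages, FontProbe probe) {
        List<String> broken = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> page : pages.entrySet()) {
            for (String name : page.getValue()) {
                try {
                    probe.load(name);
                } catch (Exception ex) {
                    // A failing probe marks the font as suspect; keep scanning.
                    broken.add("page " + page.getKey() + ": " + name);
                }
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> pages = new LinkedHashMap<>();
        pages.put(1, Arrays.asList("F1", "F2"));
        pages.put(2, Arrays.asList("F3"));
        // Pretend that only F2 is corrupt.
        FontProbe probe = name -> {
            if (name.equals("F2")) throw new Exception("cannot load " + name);
        };
        System.out.println(findBrokenFonts(pages, probe)); // prints [page 1: F2]
    }
}
```

Catching the exception per font, rather than per document, is what lets the scan continue past the first bad font and report all of them in one run.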

--
You received this message because you are subscribed to the "PDFTron PDFNet SDK" group. To post to this group, send email to support@pdftron.com
To unsubscribe from this group, send email to pdfnet-sdk-unsubscribe@googlegroups.com. For more information, please visit us at http://www.pdftron.com

Aaron_Gravesdale · May 14, 2010, 5:39pm

Q: I have isolated a page that gives this problem and can detect the
font which has a problem, thanks. Is there a way to replace the font
with a generic font (SDK fallback), just to make sure I can proceed
with the image extraction even if the characters do not display
properly?
-----------------
A: You could try to replace the font with another font:

fonts.put(i.key(), pdftron.PDF.Font.create(doc,
pdftron.PDF.Font.StandardType1Font.e_helvetica).getSDFObj());

Not sure if this would always work, but it is worth a try.
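
One detail worth illustrating: the replacement is stored under the same resource name, so page content that refers to that name picks up the new font without being edited. The thread does not say whether the Font dictionary may be modified while a DictIterator is walking it, so the sketch below avoids relying on that by collecting the keys first and writing afterwards. It uses a plain Map as a stand-in for the dictionary; replaceBroken and the string values are hypothetical, not PDFNet API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Stand-in sketch: a Map plays the role of the page's Font dictionary.
class FontReplaceSketch {

    /**
     * Replaces every entry whose value fails the check with the fallback.
     * Keys are collected first and written back afterwards, so the
     * dictionary is never modified while it is being iterated.
     * Returns the keys that were replaced.
     */
    static <V> List<String> replaceBroken(Map<String, V> fonts, Predicate<V> isBroken, V fallback) {
        List<String> toReplace = new ArrayList<>();
        for (Map.Entry<String, V> e : fonts.entrySet()) {
            if (isBroken.test(e.getValue())) {
                toReplace.add(e.getKey());
            }
        }
        for (String key : toReplace) {
            fonts.put(key, fallback); // same resource name, new font object
        }
        return toReplace;
    }

    public static void main(String[] args) {
        Map<String, String> fonts = new LinkedHashMap<>();
        fonts.put("F1", "Times-Roman");
        fonts.put("F2", "<corrupt>");
        replaceBroken(fonts, v -> v.equals("<corrupt>"), "Helvetica");
        System.out.println(fonts); // prints {F1=Times-Roman, F2=Helvetica}
    }
}
```

With PDFNet, the check would be the getUnitsPerEm() probe from the first answer and the fallback would be the Helvetica font object created as in the line above.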
