[PDFNet] How to extract the correct text from a PDF file that is missing its Unicode mapping

Frank_Liu · April 2, 2012, 5:55pm

Q: The text extracted from the attached PDF file doesn’t display properly. Is there any way to extract it correctly using PDFTron?

A:

Sometimes a font is missing its Unicode mapping. In that case each character is mapped into the Unicode Private Use Area (0xE000-0xF8FF), and a conformant viewer is supposed to understand it. You can try subtracting 0xE000 from the character’s Unicode value; the result will normally map to the “right” letter.
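
As a rough illustration of the subtraction, here is a minimal Python sketch. It assumes you already have the extracted text as a string (for example from PDFNet's text extraction) and only post-processes it. The helper name remap_private_use is hypothetical and is not part of the PDFNet API.

```python
# Bounds of the Unicode Private Use Area in the Basic Multilingual Plane.
PUA_START = 0xE000
PUA_END = 0xF8FF


def remap_private_use(text: str) -> str:
    """Shift Private Use Area characters down by 0xE000.

    Characters outside the range are returned unchanged.
    Hypothetical helper for illustration only.
    """
    result = []
    for ch in text:
        code_point = ord(ch)
        if PUA_START <= code_point <= PUA_END:
            # Undo the offset that was applied to the unmapped character code.
            result.append(chr(code_point - PUA_START))
        else:
            result.append(ch)
    return "".join(result)
```

For example, remap_private_use("\uE048\uE069") returns "Hi", because 0xE048 - 0xE000 = 0x48 ("H") and 0xE069 - 0xE000 = 0x69 ("i"). The subtraction only recovers the font's own character codes, so if the font uses a custom encoding the output may still be wrong or may contain control characters. It is worth checking the result against the rendered page.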