Hi,
We developed a tool for text extraction, and it has been working very well.
Today, however, we tried to extract text from a PDF document and the
result is garbage (the same thing happens if we copy/paste from
Acrobat Reader). It seems the PDF is using a custom encoding. Is there
any other way to retrieve the Unicode text?
This is the extraction code we are using:

if ( element.getType() == pdftron.PDF.Element.e_text ) {
    GState gstate = element.getGState();
    Font font = gstate.getFont();
    String tempResult = "";
    long char_code = 0;
    // Walk every character in the text run.
    CharIterator itr = element.getCharIterator();
    while ( itr.hasNext() ) {
        CharData data = ( CharData )( itr.next() );
        char_code = data.getCharCode();
        // Map the font-specific character code to Unicode.
        char[] temp = font.mapToUnicode( char_code );
        tempResult = tempResult + String.valueOf( temp );
    }
}
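In case it helps to reproduce the problem, below is a rough sketch of how we could flag suspicious output automatically. It is plain Java and not part of PDFNet; the class and method names (ExtractionCheck, isSuspicious, suspiciousRatio) are hypothetical. It only counts code points that rarely occur in correctly decoded text (private-use characters, U+FFFD, non-whitespace control characters), so it is an assumption-laden heuristic: a custom encoding that maps to valid but wrong letters would not be caught.

```java
public final class ExtractionCheck {
    private ExtractionCheck() {}

    /** Returns true for code points that rarely appear in correctly decoded text. */
    public static boolean isSuspicious(int cp) {
        if (cp == 0xFFFD) {
            return true; // Unicode replacement character
        }
        if (Character.getType(cp) == Character.PRIVATE_USE) {
            return true; // private-use area, often used by custom font encodings
        }
        // Control characters other than ordinary whitespace (tab, newline, ...).
        return Character.isISOControl(cp) && !Character.isWhitespace(cp);
    }

    /** Fraction of code points in the text that look suspicious (0.0 for empty input). */
    public static double suspiciousRatio(String text) {
        if (text == null || text.isEmpty()) {
            return 0.0;
        }
        int total = 0;
        int bad = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs correctly
            total++;
            if (isSuspicious(cp)) {
                bad++;
            }
        }
        return (double) bad / total;
    }
}
```

Calling ExtractionCheck.suspiciousRatio( tempResult ) after the loop above would give a number between 0 and 1; a high value suggests the font has no usable Unicode mapping for that text run.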
Thanks