How to extract bookmark titles represented using PDF text (i.e. Unicode)?

Aaron_Gravesdale · December 1, 2006, 12:11am

Q: We need to call ::GetTitle() on bookmarks whose titles are in Unicode. How
can we extract bookmark titles represented using PDF text (i.e. Unicode)?
Do you have any built-in functions to help convert an SDF::String to a
wchar string?
----

A:

I assume that you encountered PDFs whose bookmarks are represented
using 'PDF Text' strings (section 3.8.1 'Text Strings' in the PDF
Reference).

To work around this, you can use Obj.GetBuffer() to check whether the
string starts with the Unicode byte order marker (0xFE, 0xFF). If it
does, the remaining bytes are UTF-16 in big-endian order and can be
converted directly to Unicode. For example, a buffer (obtained using
Obj.GetBuffer()) containing the following data

0xFE, 0xFF, 0, 'H', 0, 'e', 0, 'l', 0, 'l', 0, 'o', 0, '!'

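can be converted to the Unicode string "Hello!" by discarding the
two-byte header (0xFE, 0xFF) and decoding the rest of the buffer as
big-endian UTF-16 (e.g. in .NET using
System.Text.Encoding.BigEndianUnicode.GetString(obj.GetBuffer(), 2,
obj.Size()-2)). Note that the default System.Text.UnicodeEncoding is
little-endian, so using it here would swap the two bytes of every
character.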
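The same steps can be sketched in Python on the example buffer above. This is a minimal illustration only: a plain bytes object stands in for what Obj.GetBuffer() would return (an assumption; no PDFNet calls are made). It also shows why the byte order matters.

```python
# Example buffer from the text: BOM (0xFE, 0xFF) followed by "Hello!" in UTF-16BE.
# A plain bytes object stands in for the result of Obj.GetBuffer() (assumption).
buf = bytes([0xFE, 0xFF,
             0, ord("H"), 0, ord("e"), 0, ord("l"),
             0, ord("l"), 0, ord("o"), 0, ord("!")])

payload = buf[2:]                    # discard the two-byte header
text = payload.decode("utf-16-be")   # big-endian, as the BOM indicates
print(text)                          # Hello!

# Decoding the same bytes as little-endian pairs them the wrong way round:
# each (0x00, 'H') pair becomes U+4800 instead of U+0048.
wrong = payload.decode("utf-16-le")
print(wrong == "Hello!")             # False
```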

If the buffer does not start with 0xFE, 0xFF, the string is encoded
using PDFDocEncoding, and you can use obj.GetStr() to obtain it as an
ASCII string.
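Putting both cases together, a title decoder might look like the sketch below. The name decode_pdf_text is hypothetical (it is not a PDFNet function), and its input is assumed to be the raw bytes of the string object. For the non-BOM branch it uses Latin-1 purely as an approximation: PDFDocEncoding agrees with ASCII for ordinary printable characters, but it differs from Latin-1 in several positions (for example the 0x80-0x9F block), so a faithful decoder would need a full PDFDocEncoding table.

```python
UTF16BE_BOM = b"\xfe\xff"  # byte order marker that flags a UTF-16BE PDF text string


def decode_pdf_text(buf: bytes) -> str:
    """Hypothetical helper: decode the raw bytes of a PDF text string.

    Raises UnicodeDecodeError if a BOM-prefixed payload is malformed
    (e.g. has an odd number of bytes).
    """
    if buf.startswith(UTF16BE_BOM):
        # Drop the 2-byte marker and decode the rest as big-endian UTF-16.
        return buf[2:].decode("utf-16-be")
    # No marker: the bytes are PDFDocEncoding. Latin-1 is only an
    # approximation here (exact for printable ASCII, not for every byte).
    return buf.decode("latin-1")


print(decode_pdf_text(b"Chapter 1"))                    # Chapter 1
print(decode_pdf_text(bytes([0xFE, 0xFF, 0, 0x48, 0, 0x69])))  # Hi
```

Checking for the marker first and falling back otherwise mirrors the two cases described above; plain ASCII titles come out unchanged from either branch.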

I agree that this utility function would be very useful. I talked with
our developers and this feature (Str.GetAsUnicode() or similar) will be
available in the next PDFNet update.