Filenames with special characters

My environment: OS X, C++, g++, libPDFNetC.dylib.6.5.3

While evaluating the PDFNet SDK I tried to open a PDF document with a special character (German umlaut "ü") in its filename:

    pdftron::PDF::PDFDoc doc("süppchen.pdf");

The string containing the filename is UTF-8 encoded ("ü" correctly represented as "\xc3\xbc").
Although the file exists, the constructor fails with an exception:

    Exception:
       Message: File does not exist.
       Conditional expression: fs::exists(some_path) && !fs::is_directory(some_path)
       Filename : MappedFile.cpp
       Function : Init
       Linenumber : 63

Using lldb with a breakpoint on the "stat" syscall, I found that the string passed to "stat" had been UTF-8 encoded a second time: the UTF-8 sequence "\xc3\xbc" that I passed to the PDFDoc constructor became "\xc3\x83\xc2\xbc". It is no surprise that stat fails with this invalid filename.
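To make the double encoding concrete, here is a minimal sketch of how those four bytes can arise. It assumes (this is inferred from the bytes seen in lldb, not from PDFNet's source) that the char* constructor treats every input byte as a single-byte Latin-1/CP1252 character and then converts that to UTF-8. The function names `latin1_to_utf8` and `check_double_encoding` are made up for illustration and are not part of PDFNet.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not a PDFNet function: re-encode a byte string as
// UTF-8 while treating each input byte as one Latin-1 code point
// (U+0000..U+00FF).
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            // ASCII bytes are identical in Latin-1 and UTF-8.
            out += static_cast<char>(c);
        } else {
            // U+0080..U+00FF become a two-byte UTF-8 sequence.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Feeding the already UTF-8 encoded "ü" (c3 bc) through the conversion
// yields the same four bytes that showed up in the stat call.
void check_double_encoding() {
    assert(latin1_to_utf8("\xc3\xbc") == "\xc3\x83\xc2\xbc");
}
```

If that assumption holds, it would also explain why a CP1252-encoded filename works: the single byte "\xfc" would be converted to the correct UTF-8 sequence "\xc3\xbc" exactly once.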

I experimented a bit and found that if I encode the filename as CP1252 (using iconv()), the document opens successfully.

As the locale is set to UTF-8, it should be possible to use UTF-8 encoded filenames, shouldn't it?

Thanks for your help.

If your string is already encoded as UTF-8, please use the following code to open the document:

    PDFDoc doc(UString("süppchen.pdf", -1, UString::e_utf8));

@ryan: many thanks for the hint with UString!

Just out of curiosity: as my locale is set with

    setlocale(LC_ALL, "en_US.UTF-8");

which encoding other than UTF-8 does the PDFNet API assume for char* arguments?

In your original post the path string was hard-coded in the argument, in which case the compiler might encode it differently than expected at compile time. Though you probably posted it that way just for demonstration.
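One way to rule out the compiler as the culprit is to dump the bytes that actually ended up in the literal. The following is only a sketch; `hex_bytes` is a made-up helper name and has nothing to do with PDFNet.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render each byte of s as two lowercase hex digits,
// separated by single spaces.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        if (!out.empty()) out += ' ';
        std::snprintf(buf, sizeof buf, "%02x", c);
        out += buf;
    }
    return out;
}
```

If `hex_bytes("süppchen.pdf")` starts with `73 c3 bc`, the literal was stored as UTF-8. With g++ the result depends on the encoding of the source file and on the `-finput-charset` and `-fexec-charset` options, so spelling the bytes out explicitly (for example `"s\xc3\xbc" "ppchen.pdf"`) avoids the question entirely.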

Regardless, we will most likely phase out the functions that accept strings as char*, since they are error-prone, and rely on UString instead.