Bad UTF16 - leading low surrogate

We get this error while using TextExtractor.GetAsText on some documents.

Is there a safe way to bypass or ignore this problem? We would be fine if the page in question could not be extracted and simply returned an empty string, for instance. Ideally, we would extract whatever can be extracted.
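A minimal sketch of the "empty string for a failing page" fallback. It assumes the exception actually propagates back to the calling code on the same thread. `ExtractAllPages` and the `extract_page` callback are hypothetical names, not PDFNet APIs; the callback would wrap your own per-page TextExtractor calls. It also uses `catch (...)` because it makes no assumption about whether `pdftron::Common::Exception` derives from `std::exception`.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: calls `extract_page` once per page (1-based) and
// keeps going when a page throws. `extract_page` stands in for whatever
// code runs the text extraction on a single page.
std::vector<std::string> ExtractAllPages(
    int page_count, const std::function<std::string(int)>& extract_page) {
  std::vector<std::string> texts;
  for (int page = 1; page <= page_count; ++page) {
    try {
      texts.push_back(extract_page(page));
    } catch (const std::exception& e) {
      // Log the failure and fall back to an empty string for this page only.
      std::cerr << "page " << page << ": " << e.what() << '\n';
      texts.emplace_back();
    } catch (...) {
      // Catch-all in case the thrown type does not derive from std::exception.
      std::cerr << "page " << page << ": unknown exception\n";
      texts.emplace_back();
    }
  }
  return texts;
}
```

This keeps the failure scoped to one page, so the remaining pages are still extracted.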

Here, the program terminates abruptly and we cannot catch the exception at all:

terminate called after throwing an instance of 'pdftron::Common::Exception'
what(): Exception:
Message: Bad UTF16 - leading low surrogate
Conditional expression: hiUnit <= 0xDBFF
Version : 7.1.0.74119
Platform : Linux
Architecture : AMD64
Filename : UnicodeUtils.cpp
Function : CodePoint_from_UTF16Nat_Surrogate
Linenumber : 1305

Aborted (core dumped)
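For context, "terminate called after throwing an instance of ..." is what the GNU C++ runtime prints when an exception is never caught. One common way this happens even though the caller has a try/catch is when the exception is thrown on a different thread, or escapes a `noexcept` function. Nothing in this thread confirms that is the cause here. The sketch below only illustrates the general pattern: running the work through `std::async`, which stores the exception in the `std::future` so that `get()` rethrows it on the calling thread. `RunOrEmpty` is a hypothetical name.

```cpp
#include <functional>
#include <future>
#include <string>

// Hypothetical wrapper: runs `work` on another thread via std::async.
// If `work` throws, the exception is stored in the future and get()
// rethrows it here, on the calling thread, where it can be caught.
// With a raw std::thread, an exception escaping the thread's entry
// function would instead call std::terminate.
std::string RunOrEmpty(const std::function<std::string()>& work) {
  std::future<std::string> result = std::async(std::launch::async, work);
  try {
    return result.get();
  } catch (...) {
    return std::string();  // fall back to an empty string
  }
}
```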

This issue should already be resolved.

Please try running our latest stable production build.
http://nightly.pdftron.com.s3.amazonaws.com/stable/2020-04-27/7.1/PDFNetC64_2020-04-27_stable_rev74739.tar.gz

But that’s not yet released on the main website, correct?

https://www.pdftron.com/documentation/linux/download/linux/

What are the implications of installing a ‘nightly’ build? Shouldn’t we wait for you to release it on the main website before putting it in production?

No, it is fine. The build on the website is simply one of the nightly builds. We only update the website link when it's deemed important, either for a new feature release or for an important fix.

We tried the nightly build, but we are still getting this error:

terminate called after throwing an instance of 'pdftron::Common::Exception'
  what():  Exception: 
	 Message: Bad UTF16 - leading low surrogate
	 Conditional expression: hiUnit <= 0xDBFF
	 Version      : 7.1.1.74739
	 Platform     : Linux
	 Architecture : AMD64
	 Filename     : UnicodeUtils.cpp
	 Function     : CodePoint_from_UTF16Nat_Surrogate
	 Linenumber   : 1305
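What the message means: in UTF-16, a high (leading) surrogate is in 0xD800–0xDBFF and a low (trailing) surrogate is in 0xDC00–0xDFFF. The failed check `hiUnit <= 0xDBFF` indicates the decoder expected a high surrogate but found a low one first, i.e. the text data in the document contains malformed UTF-16. The sketch below is not PDFNet's code. It illustrates the general lenient-decoding technique of substituting U+FFFD (REPLACEMENT CHARACTER) for unpaired surrogates instead of throwing. `DecodeUtf16Lenient` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decodes UTF-16 code units into code points. Unpaired surrogates are
// replaced with U+FFFD instead of raising an error.
std::vector<char32_t> DecodeUtf16Lenient(const std::u16string& in) {
  constexpr char32_t kReplacement = 0xFFFD;
  std::vector<char32_t> out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    char16_t hi = in[i];
    if (hi < 0xD800 || hi > 0xDFFF) {
      out.push_back(hi);  // ordinary BMP code unit, not a surrogate
    } else if (hi >= 0xDC00) {
      // Low surrogate with no preceding high surrogate: the
      // "leading low surrogate" case from the error message.
      out.push_back(kReplacement);
    } else if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
      // Valid high/low pair: combine into a supplementary code point.
      char16_t lo = in[++i];
      out.push_back(0x10000 + ((char32_t(hi) - 0xD800) << 10) +
                    (char32_t(lo) - 0xDC00));
    } else {
      // High surrogate not followed by a low surrogate.
      out.push_back(kReplacement);
    }
  }
  return out;
}
```

For example, the pair 0xD83D 0xDE00 decodes to U+1F600, while a lone 0xDC00 becomes U+FFFD.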

In that case, we would need access to the file(s).

Can you post them here, or submit them confidentially at: https://www.pdftron.com/form/request/