We have run into an issue with a specific file when it is compressed with PDFNet using JBIG2 compression. In this file, the compression changes a character from н to и. Below is a screenshot showing the difference, as it is not obvious when just looking at the document. This was tested with the latest build you sent us (184.108.40.206) on May 30 regarding a different issue.
I have attached the source PDF (after_ocr_no_compression.pdf), as well as the output PDF with the character change. I am using code similar to your JBIG2 sample.
The problem is due to lossy compression. To improve compression, JBIG2 builds a symbol dictionary: if two glyph bitmaps are similar (e.g. they differ by only a couple of pixels), they are assigned the same symbol. Unfortunately, in this case this leads to a semantic error, because the bitmaps for н and и are close enough to be mapped to one symbol.
A possible way to deal with the issue is to adjust the ‘Threshold’ JBIG2 encoder hint parameter:
[/JBIG2 /Threshold 0.6 /SharePages 50] - Compress a monochrome image using lossy JBIG2Decode
compression with the given image threshold and by sharing segments from a specified number
of pages. The threshold is a floating point number in the range from 0.4 to 0.9. Increasing the threshold
value will decrease the loss of image quality, but may increase the file size. The default value
for threshold is 0.85. The “SharePages” parameter can be used to specify the maximum number of
pages sharing a common ‘JBIG2Globals’ segment stream. Increasing the value of this parameter
improves compression ratio at the expense of memory usage.
Array hint; // A hint to image encoder to use JBIG2 compression
produces output that is indistinguishable from the input PDF.
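To make the effect of the threshold more concrete, here is a small self-contained C++ sketch of threshold-based symbol matching. It is only an illustration and not PDFNet's actual encoder: the 5x5 bitmaps, the pixel-agreement similarity measure, and the names `Glyph`, `similarity`, `sharesSymbol`, `kEn` and `kI` are all assumptions made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical 5x5 monochrome glyph: 'X' is an ink pixel, '.' is background.
using Glyph = std::array<std::string, 5>;

// Fraction of pixel positions on which two same-sized glyphs agree (0.0 to 1.0).
// This measure is an assumption for illustration, not the real JBIG2 matcher.
double similarity(const Glyph& a, const Glyph& b) {
    std::size_t same = 0;
    std::size_t total = 0;
    for (std::size_t row = 0; row < a.size(); ++row) {
        assert(a[row].size() == b[row].size());  // glyphs must have equal width
        for (std::size_t col = 0; col < a[row].size(); ++col) {
            if (a[row][col] == b[row][col]) {
                ++same;
            }
            ++total;
        }
    }
    return static_cast<double>(same) / static_cast<double>(total);
}

// Illustrative rule: two glyphs are stored as one dictionary symbol
// when their similarity reaches the threshold.
bool sharesSymbol(const Glyph& a, const Glyph& b, double threshold) {
    return similarity(a, b) >= threshold;
}

// Toy bitmaps for Cyrillic 'н' and 'и'; they differ in 4 of 25 pixels.
const Glyph kEn = {{"X...X", "X...X", "XXXXX", "X...X", "X...X"}};
const Glyph kI  = {{"X...X", "X..XX", "X.X.X", "XX..X", "X...X"}};
```

In this toy model, `sharesSymbol(kEn, kI, 0.6)` is true, so the two letters would collapse into one symbol, while `sharesSymbol(kEn, kI, 0.9)` is false and both glyphs are kept. A real encoder works on scanned bitmaps at much higher resolution and may use a different similarity measure, so the threshold that separates these two letters in your file has to be found by testing.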