JBIG2 Compression Issue

Ivanho · June 7, 2012, 9:22pm

Q:

We have run into an issue with a specific file when it is compressed with PDFNet using JBIG2 compression. In this file, the compression changes a character from н to и. Below is a screenshot to show the difference, as it is not obvious when just looking at the document. This was tested with the latest build (5.8.1.0), which you sent us on May 30 regarding a different issue.

I have attached the source PDF (after_ocr_no_compression.pdf), as well as the output PDF with the character change. I am using code similar to your JBIG2 sample.

A:

The problem is due to lossy compression. To improve compression, JBIG2 builds a symbol dictionary: if two glyph bitmaps are similar (e.g. they differ by only a couple of pixels), they are assigned the same symbol. Unfortunately, in this case that leads to a semantic error, because two different letters end up sharing one symbol.

A possible way to deal with the issue is to adjust the ‘Threshold’ JBIG2 encoder hint parameter:

[/JBIG2 /Threshold 0.6 /SharePages 50] - Compress a monochrome image using lossy JBIG2Decode compression with the given image threshold and by sharing segments from a specified number of pages. The threshold is a floating point number in the range from 0.4 to 0.9. Increasing the threshold value will decrease the loss of image quality, but may increase the file size. The default value for threshold is 0.85. The “SharePages” parameter can be used to specify the maximum number of pages sharing a common ‘JBIG2Globals’ segment stream. Increasing the value of this parameter improves compression ratio at the expense of memory usage.

For example:

Array hint; // A hint to image encoder to use JBIG2 compression
hint.PushBackName("JBIG2");
hint.PushBackName("Threshold");
hint.PushBackNumber(0.9);
hint.PushBackName("SharePages");
hint.PushBackNumber(10);

This hint produces output that is indistinguishable from the input PDF.
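To see why a stricter threshold can keep н and и apart, here is a toy sketch of threshold-based symbol matching. It is not PDFNet's implementation: the similarity metric (fraction of agreeing pixels), the 5x5 glyph renderings, and the names `similarity`, `wouldShareSymbol`, `kEn` and `kI` are all assumptions made for illustration, and the actual encoder may compare glyphs quite differently.

```cpp
#include <array>
#include <cstddef>
#include <string>

// A tiny 5x5 monochrome glyph; '#' is ink, '.' is background.
using Glyph = std::array<std::string, 5>;

// Toy similarity score: the fraction of pixels on which two glyphs agree.
// Assumption for illustration only; not the metric used by PDFNet.
double similarity(const Glyph& a, const Glyph& b) {
    int same = 0;
    int total = 0;
    for (std::size_t r = 0; r < a.size(); ++r) {
        for (std::size_t c = 0; c < a[r].size(); ++c) {
            if (a[r][c] == b[r][c]) ++same;
            ++total;
        }
    }
    return static_cast<double>(same) / total;
}

// Hypothetical merge rule: two glyphs share one dictionary symbol
// when their similarity reaches the threshold.
bool wouldShareSymbol(const Glyph& a, const Glyph& b, double threshold) {
    return similarity(a, b) >= threshold;
}

// Rough 5x5 renderings of the two Cyrillic letters from the question.
const Glyph kEn = {{"#...#", "#...#", "#####", "#...#", "#...#"}};  // н
const Glyph kI  = {{"#...#", "#..##", "#.#.#", "##..#", "#...#"}};  // и
```

In this toy model the two letters differ in 4 of 25 pixels, so `wouldShareSymbol(kEn, kI, 0.6)` merges them into one symbol while `wouldShareSymbol(kEn, kI, 0.9)` keeps them separate. That mirrors the trade-off described above: a higher threshold loses less detail but leaves more distinct symbols to store.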