When using TextExtractor, can we find out the effective clipping?

Lee · July 7, 2016, 7:54pm

When processing a page element by element, we can discover clipping paths applied to a group, and we know these should clip the text elements within that group.
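One way to get the clip for each text element during element processing is to track it yourself, the way a PDF consumer tracks the graphics state. At a group begin (the PDF `q` operator), save the current clip. Intersect it with each clipping path. At group end (`Q`), restore it. The Python sketch below is deliberately library-agnostic. The event tuples (`"begin"`, `"clip"`, `"text"`, `"end"`) are hypothetical stand-ins for the element types your element reader reports. Each clipping path is approximated by its bounding rectangle, which is only exact for rectangular clips.

```python
from typing import List, Optional, Tuple

# Hypothetical rectangle type: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Rect = Tuple[float, float, float, float]


def intersect(current: Optional[Rect], clip: Rect) -> Rect:
    """Intersect the clip in effect (None = unclipped) with a new clip rect."""
    if current is None:
        return clip
    x1, y1 = max(current[0], clip[0]), max(current[1], clip[1])
    x2, y2 = min(current[2], clip[2]), min(current[3], clip[3])
    if x1 >= x2 or y1 >= y2:
        # No overlap: return a zero-area rect, meaning nothing is visible.
        return (x1, y1, x1, y1)
    return (x1, y1, x2, y2)


def effective_clips(events: list) -> List[Tuple[str, Rect, Optional[Rect]]]:
    """Walk hypothetical element events and pair each text item with its clip.

    Events: ("begin",), ("end",), ("clip", rect), ("text", string, bbox).
    Returns (string, bbox, effective_clip) tuples; clip is None if unclipped.
    """
    saved: List[Optional[Rect]] = []  # clip saved at each group begin
    current: Optional[Rect] = None
    result = []
    for ev in events:
        kind = ev[0]
        if kind == "begin":
            saved.append(current)  # like 'q': save graphics state
        elif kind == "end":
            current = saved.pop()  # like 'Q': restore the saved clip
        elif kind == "clip":
            current = intersect(current, ev[1])
        elif kind == "text":
            result.append((ev[1], ev[2], current))
    return result


# Two clipped columns and one unclipped footer.
events = [
    ("begin",), ("clip", (0, 0, 100, 800)),
    ("text", "col1", (10, 700, 150, 712)), ("end",),
    ("begin",), ("clip", (110, 0, 200, 800)),
    ("text", "col2", (120, 700, 260, 712)), ("end",),
    ("text", "footer", (10, 10, 50, 20)),
]
for text, bbox, clip in effective_clips(events):
    print(text, bbox, clip)
```

Because the saved clips form a stack, nested groups behave correctly: an inner clip only narrows the outer one, and leaving the inner group restores the outer clip.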

My question is: when using the text extractor, can we count on it reflecting any clipping that is in effect, perhaps in TextExtractor.Line.GetBBox? If not, is there a way I can find out the clipping that should be applied to a given TextExtractor.Line or TextExtractor.Word? If GetBBox reflects the effect of clipping, I can use the text extractor to get what I need. Otherwise, I am forced to process elements.

In our client’s PDF, the text elements within a clipped group are wider than the space allotted to them on the page. Apparently the PDF's (third-party) author uses group clipping to keep adjacent columns of text from overwriting each other. All the text is in text elements, but we have to truncate it on the right when we render with our proprietary rendering engine. We know how to clip in our rendering engine, but we are not sure how to get the clipping from the PDF page, especially when using the text extractor.
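If the text extractor's boxes turn out not to reflect clipping, one possible workaround is to collect the clip rectangles from an element pass and match each extracted word to one of them. The Python sketch below makes several assumptions. The column clips are axis-aligned and non-overlapping. They share a coordinate space with the word boxes. A word belongs to the clip that contains its lower-left corner. The helpers `clip_for_word`, `truncate_right` and `visible_chars` are hypothetical, and `visible_chars` expects per-glyph x-ranges that you would have to obtain yourself. This is a heuristic, not behavior the library guarantees.

```python
from typing import List, Optional, Tuple

# Hypothetical rectangle type: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Rect = Tuple[float, float, float, float]


def clip_for_word(word: Rect, clips: List[Rect]) -> Optional[Rect]:
    """Return the first clip containing the word's lower-left corner, or None."""
    x, y = word[0], word[1]
    for c in clips:
        if c[0] <= x <= c[2] and c[1] <= y <= c[3]:
            return c
    return None


def truncate_right(word: Rect, clip: Optional[Rect]) -> Optional[Rect]:
    """Cut the word box at the clip's right edge; None if nothing is left."""
    if clip is None:
        return word  # no clip found: leave the word untouched
    x2 = min(word[2], clip[2])
    if x2 <= word[0]:
        return None  # the word starts at or past the right edge
    return (word[0], word[1], x2, word[3])


def visible_chars(glyphs: List[Tuple[str, float, float]], right: float) -> str:
    """Keep glyphs (char, x_left, x_right) that end at or before `right`.

    Dropping partially visible glyphs is one policy; a renderer that
    supports clipping could instead draw them and clip at `right`.
    """
    return "".join(ch for ch, _, x_right in glyphs if x_right <= right)


clips = [(0, 0, 100, 800), (110, 0, 200, 800)]
word = (10, 700, 150, 712)  # overflows the first column's clip
clip = clip_for_word(word, clips)
print(clip, truncate_right(word, clip))
```

Because a word is matched by where it starts, text that overflows into the neighboring column stays assigned to its own column and is cut at that column's right edge.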