Q: Is there an easy way to get the current clip rectangle? I found
references on how to set it, but not how to get it.
A: Are you looking for the page clip rectangle (page.GetCropBox() etc.) or
for the current (element) clip path?
In case you need to determine the clip path for the current element,
you would need to track the current clipping path. You can find more
information on this in the PDFNet Knowledge Base
(http://groups.google.com/group/pdfnet-sdk/topics ; search for "How are
clipping paths handled by PDFNet"). Since you are interested in the current clip rectangle,
the task is simpler since you only need to store the bounding box for
each clip path (path.GetBBox(clip_box)); to intersect two clipping
rectangles on the clip stack use pdftron.PDF.Rect.IntersectRect(r1,
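
As a rough illustration of the bookkeeping described above, here is a minimal, library-independent sketch in Python. The Rect and ClipTracker classes are hypothetical stand-ins written for this example and are not part of PDFNet; in a real application the bounding boxes would come from the clip path elements and the intersection from the SDK's own Rect type. The sketch assumes normalized rectangles (x1 <= x2, y1 <= y2) and that save/restore calls mirror the PDF 'q' and 'Q' graphics state operators.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle (hypothetical stand-in, not the PDFNet Rect)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def intersect(self, other: "Rect") -> Optional["Rect"]:
        """Return the overlapping region, or None if the rectangles do not overlap."""
        x1, y1 = max(self.x1, other.x1), max(self.y1, other.y1)
        x2, y2 = min(self.x2, other.x2), min(self.y2, other.y2)
        if x1 >= x2 or y1 >= y2:
            return None
        return Rect(x1, y1, x2, y2)


class ClipTracker:
    """Tracks the current clip rectangle across graphics state save/restore."""

    def __init__(self, page_box: Rect) -> None:
        # The page box (e.g. the crop box) is the initial clip rectangle.
        self._stack: List[Optional[Rect]] = [page_box]

    def save(self) -> None:
        # Graphics state saved ('q'): the new level inherits the current clip.
        self._stack.append(self._stack[-1])

    def restore(self) -> None:
        # Graphics state restored ('Q'): drop back to the previous clip.
        if len(self._stack) > 1:
            self._stack.pop()

    def clip(self, clip_bbox: Rect) -> None:
        # A new clip path can only shrink the visible region, so intersect.
        current = self._stack[-1]
        self._stack[-1] = None if current is None else current.intersect(clip_bbox)

    @property
    def current(self) -> Optional[Rect]:
        # None means everything is clipped away at this point.
        return self._stack[-1]


tracker = ClipTracker(Rect(0, 0, 612, 792))
tracker.save()
tracker.clip(Rect(100, 100, 300, 300))
tracker.save()
tracker.clip(Rect(200, 50, 400, 250))
print(tracker.current)
tracker.restore()
print(tracker.current)
```

Because only bounding boxes are stored, the result is a conservative rectangle: a non-rectangular clip path may hide more than its bounding box suggests.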
Q: Thanks, I think this will help. I have done it pretty close to what
the article describes, just missing a few bits. Unfortunately it's not an
easy task, of course, trying to figure out if a piece of text is actually
visible to the human reading the PDF. I have considered using OCR and looking
at the image from PdfDraw and comparing that to the text that should be there.
Overkill maybe, but with opacity, shading etc...
A: Regarding the handling of hidden text, you may want to consider
using pdftron.PDF.TextExtractor along with the 'e_no_invisible_text' and
'e_remove_hidden_text' flags. The 'e_remove_hidden_text' flag enables
removal of text that is obscured by images or rectangles, while
'e_no_invisible_text' will strip away text that uses rendering mode 3.
Although these flags do not guarantee that all invisible text will be
removed, they will produce correct results for a majority of PDF
documents out there. In general, there is no 'perfect solution' to
the text visibility problem.
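
To make the two kinds of filtering concrete, here is a small self-contained Python sketch of the idea. It is not how PDFNet implements these flags; TextRun, OpaqueShape and visible_runs are hypothetical names invented for this example. It assumes each text run and each opaque image or rectangle carries a bounding box and a paint-order index, drops runs drawn with rendering mode 3, and drops runs whose box is fully covered by something painted later.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized

INVISIBLE_RENDER_MODE = 3  # PDF text rendering mode 3: neither filled nor stroked


@dataclass
class TextRun:
    text: str
    bbox: Box
    render_mode: int
    z_order: int  # position in paint order; higher is painted later


@dataclass
class OpaqueShape:
    bbox: Box
    z_order: int


def covers(outer: Box, inner: Box) -> bool:
    """True if 'outer' completely contains 'inner'."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def visible_runs(runs: List[TextRun], shapes: List[OpaqueShape]) -> List[TextRun]:
    """Keep runs that are neither invisible by mode nor fully painted over."""
    result = []
    for run in runs:
        if run.render_mode == INVISIBLE_RENDER_MODE:
            continue  # invisible text, e.g. an OCR layer under a scanned image
        hidden = any(s.z_order > run.z_order and covers(s.bbox, run.bbox)
                     for s in shapes)
        if not hidden:
            result.append(run)
    return result
```

A heuristic like this ignores partial overlap, opacity, shading, clipping, and text painted in the background color, which is one way to see why no approach catches every case.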