Effect of e_text_begin/e_text_end when removing elements?

We are editing pages, essentially trying to remove everything visual within a specified Rect. We started from your classic sample that uses ElementReader and ElementWriter. That is, we write all of the elements we read, except for those we are trying to delete, and replace the page's element list with the new one. We use a trick you showed us: an XSet that keeps the recursive ProcessElements from processing any XObject more than once.

For the most part, this works fine for us, and has for some number of years.

Recently, though, we have been seeing some strange results. I suspect it may be because in one edit we eliminate an e_text element that sits inside a single e_text_begin/e_text_end pair, such as:

e_text_begin
e_text: first line of interest
e_text_new_line
e_text: second line of interest
e_text_end

When we remove the first line of interest, the second line of interest also disappears, as can be seen by viewing the edited PDF page. My theory is that this happens because both lines exist within the same text begin/end pair. Are we doing this correctly? Can the first line of interest affect how the second line is rendered, so that removing the first line causes problems for the second?

Another thing to note is that we may need to make another edit pass over the page, which may result in the second line of interest being removed as well.

Please note that we have an add-on class called Redactor. You simply give it a rectangular region and it removes all content in that area. This is full redaction: the content is actually removed, not just hidden. If you are interested in this feature, please contact our sales department.

Regardless, to remove text you could set the TextData to empty and still write the element. As you encountered, removing elements completely is difficult to get right, because it can have knock-on effects, especially for PDF files that don’t conform exactly to the PDF specification.

http://www.pdftron.com/pdfnet/docs/PDFNet/?topic=html/M_pdftron_PDF_Element_SetTextData.htm

element.SetTextData(new byte[0], 0);

Similarly, for images, you could replace the image data with an empty/transparent image and still write the element.
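As a rough illustration of the text suggestion above, here is a small self-contained sketch of the "blank it, don't drop it" idea. It does not use PDFNet at all: the El class, the insideRect flag and blankTextInRect are hypothetical stand-ins for a page element, a bounding-box test against your Rect, and your own ProcessElements loop. In a real ElementReader/ElementWriter pass, the empty payload would be set with SetTextData as shown earlier, and the element would then be written as usual.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Self-contained model of the "blank, don't drop" idea. None of these types
// are PDFNet classes; El is a hypothetical stand-in for a page element.
final class BlankTextSketch {

    static final class El {
        final String type;        // "e_text_begin", "e_text", "e_text_new_line", "e_text_end"
        final byte[] textData;    // payload of an e_text element, empty otherwise
        final boolean insideRect; // hypothetical: outcome of a bounding-box test

        El(String type, byte[] textData, boolean insideRect) {
            this.type = type;
            this.textData = textData;
            this.insideRect = insideRect;
        }

        static El of(String type, String text, boolean insideRect) {
            return new El(type, text.getBytes(StandardCharsets.US_ASCII), insideRect);
        }
    }

    // Copies every element to the output. Text elements inside the target
    // region are kept but given empty text data, so the sequence of elements
    // between e_text_begin and e_text_end is the same as in the input.
    static List<El> blankTextInRect(List<El> elements) {
        List<El> out = new ArrayList<>();
        for (El e : elements) {
            if (e.type.equals("e_text") && e.insideRect) {
                out.add(new El(e.type, new byte[0], e.insideRect));
            } else {
                out.add(e);
            }
        }
        return out;
    }
}
```

Fed the five-element sequence from the question with only the first e_text flagged as inside the Rect, blankTextInRect returns five elements again: the first e_text has zero-length text data, and e_text_new_line and the second e_text are passed through untouched. The point of the design is that nothing between e_text_begin and e_text_end goes missing, which is what the suggestion above relies on.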