How do I apply Batch Redaction with Redactor?

Q:

Will PDFNet SDK with Redactor (https://www.pdftron.com/pdfnet/docs/PDFNetC/d1/d02/classpdftron_1_1_p_d_f_1_1_redactor.html) allow users to do a Batch Redaction? For example, if someone wanted to redact a certain area on every page (of, let's say, a 100-page PDF file), can this be done without redacting every page one by one?

A:

If every page to be redacted has the same dimensions, then redacting the same area on each page is trivial.

Here, for example, is our sample code for performing redactions on various pages of a PDF document:

https://www.pdftron.com/pdfnet/samplecode.html#PDFRedact

Performing the same redaction on each page would be as simple as:

// Open the document only to find out how many pages it has.
PDFDoc doc((input_path + "newsletter.pdf").c_str());
doc.InitSecurityHandler();
vector<Redactor::Redaction> vec;
int page_num = doc.GetPageCount();
// Queue the same rectangle for every page (page numbers are 1-based).
for (int i=1; i<=page_num; ++i)
    vec.push_back(Redactor::Redaction(i,
        Rect(100, 100, 550, 600), false, "Top Secret"));
Redactor::Appearance app;
app.RedactionOverlay = true;
app.Border = false;
app.ShowRedactedContentRegions = true;
// Redact() is the helper function defined in the PDFRedact sample.
Redact(input_path + "newsletter.pdf",
    output_path + "redacted.pdf", vec, app);
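If the pages do not all share the same dimensions, one possible approach is to describe the region as fractions of the page and compute a rectangle per page. The following is only a sketch of that arithmetic: PageSize, Box, PageRedaction and BuildBatch are hypothetical names, not PDFNet types, and it assumes each page's coordinate origin is at its lower-left corner.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; these are not PDFNet types.
struct PageSize { double width, height; };        // page dimensions in points
struct Box { double x1, y1, x2, y2; };            // lower-left and upper-right corners
struct PageRedaction { int page_num; Box box; };  // 1-based page number plus region

// Express the region as fractions of the page (0.0 to 1.0) so that it lands
// in the same relative position on pages of different sizes.
std::vector<PageRedaction> BuildBatch(const std::vector<PageSize>& pages,
                                      double fx1, double fy1,
                                      double fx2, double fy2) {
    assert(0.0 <= fx1 && fx1 <= fx2 && fx2 <= 1.0);
    assert(0.0 <= fy1 && fy1 <= fy2 && fy2 <= 1.0);
    std::vector<PageRedaction> out;
    out.reserve(pages.size());
    for (std::size_t i = 0; i < pages.size(); ++i) {
        const PageSize& p = pages[i];
        Box b{fx1 * p.width, fy1 * p.height, fx2 * p.width, fy2 * p.height};
        out.push_back(PageRedaction{static_cast<int>(i) + 1, b});
    }
    return out;
}
```

Each resulting entry could then be turned into a Redactor::Redaction for its page, in the same way as the fixed rectangle in the loop above.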

The Redactor also supports the concept of 'negative' redactions, which may be relevant to your batch processing. A document-based negative redaction expands beyond a single page to automatically remove content from other pages in the document.

Q:

In the sample https://www.pdftron.com/pdfnet/samplecode.html#PDFRedact, it looks like we are redacting based on a region. Is there a way to redact based on text found in the PDF?

A:

You can perform a text search to find the bounding boxes of text matching a regular expression, as shown in the TextSearch sample code:

http://www.pdftron.com/pdfnet/samplecode.html#TextSearch

Once you have the bounding boxes, you can pass those in to Redactor as the coordinates of the redacted region.
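If the search hits are reported as four-corner quads rather than rectangles (for rotated text, say), they need to be reduced to axis-aligned boxes first. Below is a minimal sketch of that step. Quad, Box, BoundingBox and BoundingBoxes are hypothetical names used for illustration, not PDFNet types, and the padding value is an assumption you would tune yourself.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration; these are not PDFNet types.
struct Quad { double x[4]; double y[4]; };  // four corner points of one text hit
struct Box { double x1, y1, x2, y2; };      // lower-left and upper-right corners

// Axis-aligned bounding box of one quad, grown by `pad` on every side so that
// glyph edges touching the border are still covered.
Box BoundingBox(const Quad& q, double pad) {
    assert(pad >= 0.0);
    auto xs = std::minmax_element(q.x, q.x + 4);
    auto ys = std::minmax_element(q.y, q.y + 4);
    return Box{*xs.first - pad, *ys.first - pad,
               *xs.second + pad, *ys.second + pad};
}

// Convert every hit on a page into a box.
std::vector<Box> BoundingBoxes(const std::vector<Quad>& quads, double pad) {
    std::vector<Box> out;
    out.reserve(quads.size());
    for (const Quad& q : quads) out.push_back(BoundingBox(q, pad));
    return out;
}
```

Each box, together with the page number of the hit, would then supply the coordinates for one redaction.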

Alternatively, you could use TextExtractor and do your own search/text processing.
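As a rough illustration of what "your own search" might look like, the sketch below looks for a phrase in a list of words that have already been extracted along with their bounding boxes. Word, Box, Union and FindPhrase are hypothetical names, not PDFNet types, and the sketch assumes the words of one page are supplied in reading order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not PDFNet types.
struct Box { double x1, y1, x2, y2; };       // lower-left and upper-right corners
struct Word { std::string text; Box box; };  // one extracted word and its box

// Smallest box covering both a and b.
Box Union(const Box& a, const Box& b) {
    return Box{std::min(a.x1, b.x1), std::min(a.y1, b.y1),
               std::max(a.x2, b.x2), std::max(a.y2, b.y2)};
}

// Return one box per occurrence of `phrase` as a run of consecutive words.
std::vector<Box> FindPhrase(const std::vector<Word>& words,
                            const std::vector<std::string>& phrase) {
    assert(!phrase.empty());
    std::vector<Box> hits;
    for (std::size_t i = 0; i + phrase.size() <= words.size(); ++i) {
        bool match = true;
        for (std::size_t j = 0; j < phrase.size() && match; ++j)
            match = (words[i + j].text == phrase[j]);
        if (!match) continue;
        Box b = words[i].box;  // start with the first word, then grow
        for (std::size_t j = 1; j < phrase.size(); ++j)
            b = Union(b, words[i + j].box);
        hits.push_back(b);
    }
    return hits;
}
```

Note that a phrase wrapping across a line break would produce one large box covering both lines, so a real implementation would probably want to emit one box per line instead.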

Q:

Does Redactor just place a box over the text OR does it permanently replace the text with the box?

It is important that the user cannot use any sort of PDF annotation program to remove the redacted box and see the text EVER after the redaction is complete.

A:

The Redactor completely removes the redacted region from the PDF content, including text, vector content, images, and annotations. It does NOT simply add an annotation or image mask over the content. Once PDFNet redacts the content, the content is erased from the document.

Q:

Which sample code would allow the user to highlight a bounding box on a PDF page so that I can pass it to the Redactor?

A:

For an interactive sample that uses PDFViewCtrl, see:
https://groups.google.com/d/msg/pdfnet-sdk/QVbYaIIxWl4/NLQZTanr0bgJ