How do I redact text based on regular expression searches?

Q:

I have a question on the new improvements:

ContentReplacer can search and replace strings on a PDF page with user defined patterns.

With this change, will it be possible to search and replace on a pattern rather than a literal string [e.g., a social security number (nnn-nn-nnnn) or a phone number (nnn-nnn-nnnn or (nnn) nnn-nnnn)]? We often have to redact this type of personally identifiable information (PII) from documents.

A:

ContentReplacer is the wrong tool for this task. It can’t match regular expressions, and is not intended for redaction.

A better approach is to run a text search that finds the bounding boxes of all text matching a regular expression, as shown in the TextSearch sample code:

http://www.pdftron.com/pdfnet/samplecode.html#TextSearch
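
As a rough illustration of the kind of pattern you might feed to the search, here is a minimal sketch that uses only Python's standard `re` module (not PDFNet) to match the formats mentioned in the question. The names `SSN`, `PHONE`, `PII_PATTERN` and `find_pii` are made up for this example, and the nnn-nn-nnnn layout for social security numbers is an assumption based on the usual US format.

```python
import re

# Assumed formats (illustrative only):
#   SSN:   nnn-nn-nnnn
#   phone: nnn-nnn-nnnn or (nnn) nnn-nnnn
SSN = r"\b\d{3}-\d{2}-\d{4}\b"
PHONE = r"(?:\(\d{3}\) |\b\d{3}-)\d{3}-\d{4}\b"

# One combined pattern, so a single pass finds every candidate.
PII_PATTERN = re.compile(f"{PHONE}|{SSN}")


def find_pii(text):
    """Return (start, end, matched_text) for each span that looks like PII."""
    return [(m.start(), m.end(), m.group()) for m in PII_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "Call (604) 555-0147 or 604-555-0199; SSN 123-45-6789."
    for start, end, value in find_pii(sample):
        print(start, end, value)
```

The regular expression syntax accepted by the PDFNet text search may differ from Python's, so treat these patterns as a starting point and test them against your own documents before relying on them.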

Then, to redact the text correctly, pass those bounding boxes to the PDF Redactor add-on. The following sample code shows how:

http://www.pdftron.com/pdfnet/samplecode.html#PDFRedact