Search and highlight text in PDF in a server application.

Aaron_Gravesdale · January 17, 2009, 12:01am

Q: I would like to search and highlight text in a PDF based on user
input. For example:

1) do a word/phrase search through an entire pdf. For instance, I
want to know on which pages the phrase "game theory" appears in a
given PDF. I want to show those pages on which the phrase appears on
a web page.

2) once I find the pages that have the phrase "game theory", the user
will choose a page (let's say page 5) on which the phrase appears. We
then want to highlight all instances of "game theory" on that page in
yellow, and then:

3) turn that highlighted pdf page into an image and show the image on
the web page.

Please let me know if I can use PDFNet (http://www.pdftron.com/net)
for this task.
----
A: PDFNet SDK can be used to implement search and highlight on PDF
documents.

For text search you could use the 'pdftron.PDF.TextExtractor' class, as
shown in the TextExtract sample project
(http://www.pdftron.com/net/samplecode.html#TextExtract). Besides
finding specific words, TextExtractor returns positioning information
(bounding box), style, and other properties for each word/character.
This information can be used to highlight text.
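
To illustrate how per-word bounding boxes can be turned into page hits
and highlight rectangles, here is a minimal Java sketch. It does not
call PDFNet at all: it assumes the words have already been extracted
in reading order, and the names PhraseHighlighter, Word, Hit,
findPhrase and pagesWithHits are hypothetical helpers invented for
this example. Matching ignores case and punctuation, and a phrase is
not matched across a page boundary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical helper, not part of PDFNet: it works on words that were
// already extracted from the document in reading order.
public class PhraseHighlighter {

    /** One extracted word: text, 1-based page number, and bounding box. */
    public static final class Word {
        public final String text;
        public final int page;
        public final double x1, y1, x2, y2;

        public Word(String text, int page, double x1, double y1, double x2, double y2) {
            this.text = text; this.page = page;
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }
    }

    /** One phrase occurrence: the page and the rectangle to highlight. */
    public static final class Hit {
        public final int page;
        public final double x1, y1, x2, y2;

        public Hit(int page, double x1, double y1, double x2, double y2) {
            this.page = page;
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }
    }

    // Lower-case and strip punctuation so that "Theory," matches "theory".
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
    }

    /** Returns one Hit per occurrence of the phrase, in document order. */
    public static List<Hit> findPhrase(List<Word> words, String phrase) {
        List<String> terms = new ArrayList<>();
        for (String t : phrase.trim().split("\\s+")) {
            String n = normalize(t);
            if (!n.isEmpty()) terms.add(n);
        }
        List<Hit> hits = new ArrayList<>();
        if (terms.isEmpty()) return hits;
        for (int i = 0; i + terms.size() <= words.size(); i++) {
            if (matchesAt(words, i, terms)) hits.add(union(words, i, terms.size()));
        }
        return hits;
    }

    // True if the words starting at 'start' spell the phrase on a single page.
    private static boolean matchesAt(List<Word> words, int start, List<String> terms) {
        int page = words.get(start).page;
        for (int k = 0; k < terms.size(); k++) {
            Word w = words.get(start + k);
            if (w.page != page || !normalize(w.text).equals(terms.get(k))) return false;
        }
        return true;
    }

    // Smallest rectangle covering all words of the match. Simplification:
    // a phrase that wraps onto the next line gets one large box.
    private static Hit union(List<Word> words, int start, int count) {
        Word f = words.get(start);
        double x1 = f.x1, y1 = f.y1, x2 = f.x2, y2 = f.y2;
        for (int k = 1; k < count; k++) {
            Word w = words.get(start + k);
            x1 = Math.min(x1, w.x1);
            y1 = Math.min(y1, w.y1);
            x2 = Math.max(x2, w.x2);
            y2 = Math.max(y2, w.y2);
        }
        return new Hit(f.page, x1, y1, x2, y2);
    }

    /** Sorted, de-duplicated page numbers on which the phrase appears. */
    public static List<Integer> pagesWithHits(List<Hit> hits) {
        TreeSet<Integer> pages = new TreeSet<>();
        for (Hit h : hits) pages.add(h.page);
        return new ArrayList<>(pages);
    }
}
```

In this sketch, pagesWithHits(findPhrase(words, "game theory")) would
give the list of pages to show the user, and the Hit rectangles for
the chosen page are the areas to paint yellow before rendering. How
the Word list is filled from the extractor is left out on purpose.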

To convert (highlighted) PDF pages to JPEG (or other image formats)
you could use the PDFDraw class, as shown in the PDFDraw sample
(http://www.pdftron.com/net/samplecode.html#PDFDraw).

We have many clients who have implemented this type of solution using
PDFNet. As a starting point you may want to use the sample code
provided in the following articles:
C#: http://groups.google.com/group/pdfnet-sdk/browse_thread/thread/4625f9567d1b34be/09b32d05716996fd
JAVA: http://groups.google.com/group/pdfnet-sdk/browse_thread/thread/d078562409fdced1/40fb134228fb0c48
http://groups.google.com/group/pdfnet-sdk/browse_thread/thread/384aa4ccf6f91103/ed3799a06e3edb16

If you are looking for a stand-alone PDF viewer component for
interactive search and highlighting, you may want to take a look at
the pdftron.PDF.PDFView class (see http://www.pdftron.com/net/samplecode.html#PDFView).
The PDFView class has built-in text search and highlighting.

Aaron_Gravesdale · November 15, 2011, 8:10pm

Note: The above article applies to PDFNet prior to v5.
For newer versions, please see
https://groups.google.com/d/topic/pdfnet-sdk/RiX5Vn0bNL4/discussion

The updated source code is:

http://www.pdftron.com/pdfnet/samplecode/data/HighlightPDFText.cs