How to search and highlight text using PDFNet?

Aaron_Gravesdale · June 6, 2008, 7:03pm

Q: I have a requirement similar to "How to search and highlight text
using PDFNet"
(http://groups.google.com/group/pdfnet-sdk/browse_thread/thread/18fa3d43e99f647d/).

Unfortunately, I am getting errors when I try to compile the code on
that page. I was able to compile, execute, and adapt quite a few of your
samples, but my particular requirement is to highlight certain
strings in the PDF and add bookmarks to the pages where those strings
occur. Highlighting a search string is the major requirement, and I
have not been able to get past it.

The error occurs at the following line:

Obj quads = Obj.CreateArray();
-------
A: This code snippet uses some APIs that were deprecated starting
with PDFNet v4. For instructions on how to move old code to the latest
API, please see http://www.pdftron.com/net/pdfnet4_upgrade.txt.

The following is the updated 'PDF highlight' sample:

//---------------------------------------------------
// The following sample illustrates how to programmatically highlight text.
// The sample uses TextExtractor to extract words and the PDFDraw class to
// rasterize pages with highlight annotations. The sample also saves the
// modified PDF document, which includes the highlighted text.
//
// If you are looking for interactive text selection and highlighting, the
// PDFView class already includes built-in tool modes for text search and
// highlighting. For a concrete example of how to use these functions, please
// take a look at the latest version of the PDFView sample project.
//---------------------------------------------------

using System;
using pdftron;
using pdftron.Common;
using pdftron.Filters;
using pdftron.SDF;
using pdftron.PDF;

namespace TextHighlightTestCS
{
class PDFTextHighlight
{
  // Use PDFNet to generate the appearance stream for a highlight annotation.
  static Obj CreateHighlightAppearance(PDFDoc doc, Rect bbox, ColorPt highlight_color)
  {
   // Create the highlight appearance stream ----------------------------
   ElementBuilder build = new ElementBuilder();
   ElementWriter writer = new ElementWriter();
   writer.Begin(doc);

   // Draw background
   Element element = build.CreateRect(bbox.x1 - 2, bbox.y1, bbox.x2 + 2, bbox.y2);
   element.SetPathFill(true);
   element.SetPathStroke(false);
   GState gs = element.GetGState();
   gs.SetFillColorSpace(ColorSpace.CreateDeviceRGB());
   gs.SetFillColor(highlight_color);
   gs.SetBlendMode(GState.BlendMode.e_bl_multiply);
   writer.WriteElement(element);
   Obj stm = writer.End();

   build.Dispose();
   writer.Dispose();

   // Set the bounding box
   stm.PutRect("BBox", bbox.x1, bbox.y1, bbox.x2, bbox.y2);
   stm.PutName("Subtype", "Form");
   return stm;
  }

  // Create a Highlight Annotation.
  static Annot CreateHighlightAnnot(PDFDoc doc, Rect bbox, ColorPt highlight_color)
  {
   Annot a = Annot.Create(doc, Annot.Type.e_Highlight, bbox);
   a.SetColor(highlight_color);
   a.SetAppearance(CreateHighlightAppearance(doc, bbox, highlight_color));

   // QuadPoints corners are pushed as (x1,y2), (x2,y2), (x1,y1), (x2,y1).
   Obj quads = a.GetSDFObj().PutArray("QuadPoints");
   quads.PushBackNumber(bbox.x1);
   quads.PushBackNumber(bbox.y2);
   quads.PushBackNumber(bbox.x2);
   quads.PushBackNumber(bbox.y2);
   quads.PushBackNumber(bbox.x1);
   quads.PushBackNumber(bbox.y1);
   quads.PushBackNumber(bbox.x2);
   quads.PushBackNumber(bbox.y1);
   return a;
  }

  static void Main(string[] args)
  {
   PDFNet.Initialize();
   PDFNet.SetResourcesPath("../../../../../resources");

   // Relative path to the folder containing test files.
   const string input_path = "../../../../TestFiles/";
   const string output_path = "../../../../TestFiles/Output/";

   try
   {
    PDFDoc doc = new PDFDoc(input_path + "newsletter.pdf");
    doc.InitSecurityHandler();

    // Highlight all "Robin" instances in the input document.
    ColorPt highlight_color = new ColorPt(1, 1, 0); // Yellow

    TextExtractor txt = new TextExtractor(); // Used to extract words
    Rect word_bbox = new Rect();

    PDFDraw pdfdraw = new PDFDraw(96); // Used to export PDF pages to bitmap.

    PageIterator itr = doc.GetPageIterator();
    for (; itr.HasNext(); itr.Next())
    {
     Page page = itr.Current();
     txt.Begin(page); // Read the page.

     // Extract words one by one.
     TextExtractor.Word word;
     String word_str;
     for (TextExtractor.Line line = txt.GetFirstLine(); line.IsValid(); line = line.GetNextLine())
     {
      for (word = line.GetFirstWord(); word.IsValid(); word = word.GetNextWord())
      {
       word_str = word.GetString().ToUpper(); // For case-insensitive search.
       if (word_str.StartsWith("ROBIN") || word_str.EndsWith("ROBIN"))
       {
        word_bbox = word.GetBBox();
        // Console.WriteLine("{0} \t bbox: {1}, {2}, {3}, {4}\n", word, word_bbox.x1, word_bbox.y1, word_bbox.x2, word_bbox.y2);
        page.AnnotPushBack(CreateHighlightAnnot(doc, word_bbox, highlight_color));
       }
      }
     }

     string outname = string.Format("{0}out{1:d}.jpg", output_path, itr.GetPageNumber());
     Console.WriteLine(outname);
     pdfdraw.Export(page, outname, "jpg");
    }

    pdfdraw.Dispose();
    txt.Dispose();

    doc.Save(output_path + "output.pdf", SDFDoc.SaveOptions.e_linearized);
    doc.Close();
    Console.WriteLine("Done.");
   }
   catch (PDFNetException e)
   {
    Console.WriteLine(e.Message);
   }
  }
}
}

http://pdfnet-sdk.googlegroups.com/web/HighlightPDFText.cs
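
The sample above does not cover the second half of the question, bookmarking the pages where the string occurs. As a rough sketch of the bookkeeping only, the matching rule and the list of pages to bookmark could be kept in a small helper such as the one below. The class name HighlightSearch and both of its methods are hypothetical and are not part of PDFNet; the helper uses only the .NET base class library, and it assumes the words of each page have already been collected (for example from word.GetString() in the loop above). Creating the bookmarks themselves would go through PDFNet's own bookmark API, which is not shown or verified here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of PDFNet: decides which words count as a hit
// and collects the numbers of the pages they were found on.
public static class HighlightSearch
{
    // Same rule as the sample (the word starts or ends with the term), but
    // compared case-insensitively without building an upper-cased copy.
    public static bool Matches(string word, string term)
    {
        if (string.IsNullOrEmpty(word) || string.IsNullOrEmpty(term)) return false;
        return word.StartsWith(term, StringComparison.OrdinalIgnoreCase)
            || word.EndsWith(term, StringComparison.OrdinalIgnoreCase);
    }

    // Returns, in ascending order, the page numbers that contain at least one
    // matching word. wordsByPage maps a page number to the words on that page.
    public static List<int> PagesWithMatches(
        IDictionary<int, IList<string>> wordsByPage, string term)
    {
        var pages = new SortedSet<int>();
        foreach (KeyValuePair<int, IList<string>> entry in wordsByPage)
        {
            foreach (string word in entry.Value)
            {
                if (Matches(word, term))
                {
                    pages.Add(entry.Key);
                    break; // one hit is enough to mark the page
                }
            }
        }
        return new List<int>(pages);
    }
}
```

In the sample's inner loop, HighlightSearch.Matches(word.GetString(), "Robin") could stand in for the ToUpper() comparison. Note that, like the original test, it does not match a word such as "(Robin)", where the term is surrounded by punctuation on both sides.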

Dillon_Gregory · June 26, 2008, 7:39pm

We are noticing that images compressed using JBIG2 are only readable by
Adobe Reader/Acrobat 8 (via the Web plugin). Is there a way to make the
compression compatible with Acrobat 7 and above?

Greg Dillon

Aaron_Gravesdale · June 26, 2008, 9:50pm

JBIG2 compression is available starting with PDF 1.4 (i.e. Acrobat 5)
and it does not require a special plug-in. PDFNet SDK can also
compress & decompress embedded JBIG2 images.

Aaron_Gravesdale · November 15, 2011, 8:08pm

The above link is no longer active. The updated link is:
http://www.pdftron.com/pdfnet/samplecode/data/HighlightPDFText.cs