Removing elements from PDF layers or marked content blocks.

Aaron_Gravesdale · June 5, 2008, 10:08pm

Q: I'm having difficulty implementing PDF layer removal.

Basically, I loop over all elements and check the marked-content
property dictionary to see whether each element belongs to a certain
layer. If it does, I don't call writer.WriteElement() for it. So far so good.

However, when I open the resulting PDF in Acrobat, the pages are blank.

The source code is based on the ElementEditTest sample.
Do you have any idea why this happens?
Best regards
Johan

for (int i = 1; i <= num_pages; ++i) {
  Page page = doc.GetPage(1);
  reader.Begin(page, ctx);

  Page new_page = doc.PageCreate();
  doc.PagePushBack(new_page);
  writer.Begin(new_page);
  while ((element = reader.Next()) != null) // Read page contents
  {
    if (doc.HasOC())
    {
      Obj oTag = element.GetMCTag();
      if (oTag != null && oTag.GetName() == "OC")
      {
        Obj mc_prop = element.GetMCPropertyDict();
        if (mc_prop != null)
        {
          // or use mc_prop.Find("Key") to search for specific key/value pairs
          DictIterator itr = mc_prop.Find("Name");
          if (itr != null)
          {
            string strLayerName = itr.Value().GetAsPDFText();
            if (layersToRemove.Contains(strLayerName)) {
              continue;
            }
          }
        }
      }
    }
    writer.WriteElement(element);
  }
  writer.End();
  reader.End();
}
doc.Save(output_path, 0);
doc.Close();
-----
A: If you are dealing with PDF files that use OCG layers, you could
use the high-level OCG/Layers API (www.pdftron.com/net/samplecode.html#PDFLayers)
to hide specific layers. This is much simpler than rewriting the page content.

I didn't have a chance to fully test your code, but I can see a couple
of issues:

a) element.GetMCTag() and element.GetMCPropertyDict() should only be
called if the element is of type 'e_marked_content_begin'. You are
calling these methods on every element and are therefore getting
incorrect values.

string remove = "";
while ((element = reader.Next()) != null) // Read page contents
{
  if (element.GetType() == Element.Type.e_marked_content_begin) {
    // Entering a new marked content block...
    Obj oTag = element.GetMCTag();
    if (oTag != null && oTag.GetName() == "OC") {
      Obj mc_prop = element.GetMCPropertyDict();
      if (mc_prop != null) {
        // or use mc_prop.Find("Key") to search for specific key/value pairs
        Obj layer_name = mc_prop.FindObj("Name");
        if (layer_name != null && layer_name.IsString()) {
          remove = layer_name.GetAsPDFText();
          if (layersToRemove.Contains(remove) == false) {
            // Not a layer to remove. Clear 'remove' so following elements are kept.
            remove = "";
          }
        }
      }
    }
  }
  else if (element.GetType() == Element.Type.e_marked_content_end) {
    remove = ""; // Exiting marked content block.
  }
  else if (remove != "") {
    // --> This element is in the marked content block that should be removed.
    // Output the transform for the image (or form) being skipped.
    // InvariantCulture keeps '.' as the decimal separator, as PDF syntax requires.
    Matrix2D m = element.GetGState().GetTransform();
    writer.WriteString(String.Format(System.Globalization.CultureInfo.InvariantCulture,
      "{0} {1} {2} {3} {4} {5} cm ", m.m_a, m.m_b, m.m_c, m.m_d, m.m_h, m.m_v));
    continue; // --> Now skip this element.
  }

  writer.WriteElement(element);
}

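Actually, the above pseudo-code is not entirely correct either, since
marked content blocks may nest: the begin or end of an inner block
resets 'remove' even though an enclosing block is still being removed.
However, it will work in most cases and should give you an idea of how
to build a bullet-proof solution.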
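One common way to handle nesting is to replace the single 'remove' flag with a stack that has one entry per open marked content block, each entry recording whether that block or any block enclosing it is being removed. The sketch below shows only that bookkeeping. It uses hypothetical stand-in types (Kind, SimpleElement, LayerFilter), not PDFNet's Element/ElementReader, so it runs without the library; it also omits writing the 'cm' transform for skipped elements. In the real loop you would push on e_marked_content_begin, pop on e_marked_content_end, and check the top of the stack before calling writer.WriteElement().

```csharp
using System.Collections.Generic;

// Hypothetical, simplified stand-ins for PDF content elements.
// These are NOT the PDFNet API; they only model what nesting needs.
enum Kind { MarkedContentBegin, MarkedContentEnd, Content }

sealed class SimpleElement
{
    public Kind Kind { get; }
    public string Layer { get; }   // OC layer name for a begin marker, else null
    public string Label { get; }   // identifies content elements in this demo

    public SimpleElement(Kind kind, string layer = null, string label = null)
    {
        Kind = kind;
        Layer = layer;
        Label = label;
    }
}

static class LayerFilter
{
    // Returns the elements that would be written. Begin/end markers are always
    // kept (as in the forum code); content nested at any depth inside a block
    // whose layer is in layersToRemove is dropped.
    public static List<SimpleElement> Filter(IEnumerable<SimpleElement> elements,
                                             ISet<string> layersToRemove)
    {
        var kept = new List<SimpleElement>();
        // One entry per open block: true if this block or an enclosing one is removed.
        var removing = new Stack<bool>();

        foreach (var e in elements)
        {
            bool insideRemoved = removing.Count > 0 && removing.Peek();
            switch (e.Kind)
            {
                case Kind.MarkedContentBegin:
                    bool thisRemoved = e.Layer != null && layersToRemove.Contains(e.Layer);
                    // An inner block can never "un-remove" content of an outer block.
                    removing.Push(insideRemoved || thisRemoved);
                    kept.Add(e);
                    break;
                case Kind.MarkedContentEnd:
                    // Guard against an unbalanced end marker.
                    if (removing.Count > 0) removing.Pop();
                    kept.Add(e);
                    break;
                default:
                    if (!insideRemoved) kept.Add(e);
                    break;
            }
        }
        return kept;
    }
}
```

With layersToRemove = {"Hidden"} and a "Hidden" block that contains a nested "Visible" block, only content outside the "Hidden" block survives. The single-flag version would also keep the nested block's content and the rest of the "Hidden" block after the inner end marker.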

b) You may also want to explicitly write out the transformation matrix
of each element that you skip, so that the transformation matrix of
subsequent elements is not affected. This is also illustrated in the
above code. For more examples, you may want to search this forum for
"separate PDF page".