What is the most efficient way to remove PDF pages?

Renchen_Sun · August 5, 2016, 1:03am

Q: I need to remove a lot of pages from a PDF document. What is the most efficient way to do it?

I found this example in the sample code:

// Sample 3 - Delete every second page
try
{
    Console.WriteLine("_______________________________________________");
    Console.WriteLine("Sample 3 - Delete every second page...");
    Console.WriteLine("Opening the input pdf...");

    using (PDFDoc in_doc = new PDFDoc(input_path +  "newsletter.pdf"))
    {
        in_doc.InitSecurityHandler();

        int page_num = in_doc.GetPageCount();
        PageIterator itr;
        while (page_num>=1)
        {
            itr = in_doc.GetPageIterator(page_num);
            in_doc.PageRemove(itr);
            page_num -= 2;
        }       

        in_doc.Save(output_path +  "newsletter_page_remove.pdf", 0);
    }
    Console.WriteLine("Done. Result saved in newsletter_page_remove.pdf");
}
catch(Exception e)
{
    Console.WriteLine("Exception caught:\n{0}", e);
}

In this example, a PageIterator is obtained each time a page is about to be deleted. I assume GetPageIterator has linear complexity, which would make this approach extremely slow for PDF documents with many pages. What is the most efficient way to remove PDF pages?

A:

GetPageIterator does have O(n) complexity, so calling it once per removed page can add up to quadratic time on large documents. The most efficient approach is to create a new PDFDoc and import the pages you want to keep using ImportPages. Note that the source and destination PDFDoc cannot be the same. Please refer to this code sample for its usage. That way the time complexity is always linear. If you want to manipulate the original document, you can erase its Pages dictionary using the SDFDoc API and copy the Pages dictionary over from the new doc. The following code snippet shows you how to do that:

// Erase the Pages dictionary of the original document.
doc.GetRoot().Erase("Pages");
// Import the Pages dictionary from the new doc, which holds only the pages you want to keep (deep copy).
Obj pages = doc.GetSDFDoc().ImportObj(newdoc.GetRoot().FindObj("Pages"), true);
// Point the original document's root at the imported Pages dictionary.
doc.GetRoot().Put("Pages", pages);
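
For the ImportPages step itself, here is a minimal sketch of what it might look like. The helper name SaveWithoutPages and its parameters are hypothetical, and it assumes PDFNet has already been initialized elsewhere. It also assumes the ArrayList overload of ImportPages; the collection type that ImportPages accepts and returns may differ between SDK versions (some use Page[]), so check the API reference for your release.

```csharp
using System.Collections;
using System.Collections.Generic;
using pdftron.PDF;

static class PageRemovalSketch
{
    // Hypothetical helper (not part of the SDK): copies every page of inputFile
    // except those listed in pagesToRemove (1-based page numbers) into a new
    // document and saves it to outputFile.
    public static void SaveWithoutPages(string inputFile, string outputFile,
                                        ISet<int> pagesToRemove)
    {
        using (PDFDoc inDoc = new PDFDoc(inputFile))
        using (PDFDoc newDoc = new PDFDoc())
        {
            inDoc.InitSecurityHandler();

            // One forward pass over the pages, so there is no
            // GetPageIterator(n) lookup per page.
            ArrayList keep = new ArrayList();
            int pageNum = 1;
            for (PageIterator itr = inDoc.GetPageIterator(); itr.HasNext(); itr.Next(), ++pageNum)
            {
                if (!pagesToRemove.Contains(pageNum))
                    keep.Add(itr.Current());
            }

            // Assumption: ArrayList overload; newer SDK versions may expect Page[].
            ArrayList imported = newDoc.ImportPages(keep);
            foreach (Page page in imported)
                newDoc.PagePushBack(page); // Append in the original order.

            newDoc.Save(outputFile, 0);
        }
    }
}
```

For example, SaveWithoutPages(input_path + "newsletter.pdf", output_path + "newsletter_page_remove.pdf", new HashSet<int> { 2, 4, 6 }) would write a copy without pages 2, 4 and 6. The resulting newDoc could also serve as the newdoc in the snippet above if the original document has to be modified in place.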