Stamping pdf as background image

Spencer_Rathbun · May 8, 2012, 3:05pm

Q:

We have a one-page PDF that we wish to use as a background image on each page of a PDF that we are creating. We could use the PDF::Stamper class with SetAsBackground set to true. However, as part of the PDF creation process we are already using ElementWriter to place our content, similar to the ElementBuilder example.

How would we duplicate the stamper step during the element writer processing so that we do not have to process the file twice?

Spencer Rathbun

Ryan · May 8, 2012, 5:50pm

Hi Spencer,

When you call ElementWriter.Begin(page), there is an optional second parameter that you can use to write underneath the existing content. So instead, call writer.Begin(page, ElementWriter.e_underlay). I’ve attached a modified AddImage sample that shows this in action.

Regards

AddImageTest.py (4.99 KB)

Spencer_Rathbun · May 10, 2012, 5:53pm

Thanks for the assistance Ryan,

You pointed me in the right direction. I’ve bumped into an error though.

I’m setting up a reader on my background PDF, and then a writer on my new page. But when I call writer.WriteElement() I get an error which says that "Objects cannot belong to different documents". Here are the relevant code snippets:

def addBackground(reader, writer):
    '''Add all the elements on the background page underneath the current page.'''
    element = reader.Next()  # Read page contents
    while element != None:
        writer.WriteElement(element)
        element = reader.Next()

# Open the background pdf and get it in memory
bp_doc = PDFDoc('background.pdf')
itr = bp_doc.GetPageIterator()
backgroundPage = itr.Current()

# start looping over imported pages
imported_pages = new_doc.ImportPages(copy_pages)
i = iter(imported_pages)
for x in i:
    # while looping over current pages
    writer = ElementWriter()
    reader = ElementReader()
    reader.Begin(backgroundPage)
    writer.Begin(x, ElementWriter.e_underlay, False)
    addBackground(reader, writer)
    writer.End()
    reader.End()
    new_doc.PagePushBack(x)

It seems to me I could solve this by importing the background page into the new_doc, but I don’t want to add it by itself anywhere.

Thanks for the help,

Spencer Rathbun

Ryan · May 10, 2012, 7:25pm

You need to import the background page into the new target document. PDFDoc.ImportPages doesn’t actually add the page to the document; it simply gathers the resources the page(s) require (fonts, images, etc.) and copies them over to the new document. When you save the new document with the e_remove_unused option, any imported resource that is not used will be removed. See sample 6 in the PDFPageTest sample for an example of this. As long as you don’t call PDFDoc.PagePush[Back|Front] on it, the imported background page will not appear.
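A minimal sketch of that corrected flow, assuming the background page has already been imported into the target document with its own ImportPages call. The helper names copy_elements and underlay_background are hypothetical; reader, writer and underlay_mode stand for an ElementReader, an ElementWriter and ElementWriter.e_underlay, and are passed in so the sketch does not depend on a particular PDFNet build.

```python
def copy_elements(reader, writer):
    """Copy every element the reader yields into the writer; return the count."""
    copied = 0
    element = reader.Next()
    while element is not None:
        writer.WriteElement(element)
        copied += 1
        element = reader.Next()
    return copied


def underlay_background(target_page, imported_background, reader, writer,
                        underlay_mode):
    """Write the imported background underneath target_page's content.

    imported_background must belong to the same document as target_page:
    it is the page returned by ImportPages, not the page taken directly
    from background.pdf.
    """
    reader.Begin(imported_background)
    # Same argument order as the snippet in this thread:
    # writer.Begin(x, ElementWriter.e_underlay, False)
    writer.Begin(target_page, underlay_mode, False)
    try:
        return copy_elements(reader, writer)
    finally:
        # Always close both sides, even if copying fails part-way.
        writer.End()
        reader.End()
```

Inside the page loop this would be called once per page that needs the background, always with the imported copy of the background page rather than the original.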

Ryan · May 10, 2012, 11:42pm

Q)

All right,

I tried that:

# Import necessary page resources
resource_pages = VectorPage()
resource_pages.push_back(backgroundPage)
resource_pages.push_back(backOfStatementPage)
imported_pages = new_doc.ImportPages(copy_pages)
res_pages = new_doc.ImportPages(resource_pages)

And I get a crash:

Problem Event Name: APPCRASH

Application Name: python.exe

Application Version: 0.0.0.0

Application Timestamp: 4cf14060

Fault Module Name: PDFNetC.dll

Fault Module Version: 5.7.0.0

Fault Module Timestamp: 4df6ac02

Exception Code: c0000005

Exception Offset: 003af2da

OS Version: 6.1.7601.2.1.0.256.48

Locale ID: 1033

Additional Information 1: 0a9e

Additional Information 2: 0a9e372d3b4ad19135b953a78882e789

Additional Information 3: 0a9e

Additional Information 4: 0a9e372d3b4ad19135b953a78882e789

It seems to not be fond of two imports of separate VectorPage resources. So I added the two background pages to the copy_pages object, and skipped past them before I iterate over it.

copy_pages.push_back(backgroundPage)
copy_pages.push_back(backOfStatementPage)
…
imported_pages = new_doc.ImportPages(copy_pages)
i = iter(imported_pages)
i.next()
for x in i:
    …

This causes a different crash:

Problem Event Name: APPCRASH

Application Name: python.exe

Application Version: 0.0.0.0

Application Timestamp: 4cf14060

Fault Module Name: gdiplus.dll

Fault Module Version: 6.1.7601.17825

Fault Module Timestamp: 4f9235ab

Exception Code: c0000005

Exception Offset: 000a0001

OS Version: 6.1.7601.2.1.0.256.48

Locale ID: 1033

Additional Information 1: 0a9e

Additional Information 2: 0a9e372d3b4ad19135b953a78882e789

Additional Information 3: 0a9e

Additional Information 4: 0a9e372d3b4ad19135b953a78882e789

Here is the complete function, to make it clearer what is going on:

def printHp(infile, accounts, workOrder, outfile):
    """Print the different statement groups to postscript."""
    logger.info("_______________________________________________")
    logger.info("Printing pages for HP…")
    checksOnPage = False
    in_doc = PDFDoc(infile)
    in_doc.InitSecurityHandler()
    (groupings, names) = getGroupings(accounts)
    buildFirstLast("firstLastReport.txt", names)
    iterAccounts = iter(sorted(accounts.iteritems(), key=lambda x: x[1][0]))
    output_path = "."

    # Open the background pdf and get it in memory
    bp_doc = PDFDoc('background.pdf')
    itr = bp_doc.GetPageIterator()
    backgroundPage = itr.Current()

    # Open the back of the pdf and get it in memory
    bp_doc = PDFDoc('backOfStatement.pdf')
    itr = bp_doc.GetPageIterator()
    backOfStatementPage = itr.Current()

    # Import necessary page resources
    for group in groupings:
        logger.info("pages {0}-{1}".format(group[1], group[2]))
        new_doc = PDFDoc()
        copy_pages = VectorPage()
        copy_pages.push_back(backgroundPage)
        copy_pages.push_back(backOfStatementPage)
        itr = in_doc.GetPageIterator(group[1])
        while itr.HasNext():
            if itr.Current().GetIndex() > group[2]:
                break
            page = itr.Current()
            copy_pages.push_back(page)
            itr.Next()

        imported_pages = new_doc.ImportPages(copy_pages)
        i = iter(imported_pages)
        i.next()
        for x in i:
            try:
                currPage = currPages.next()
                logger.debug("Current page: {0}".format(currPage))
                if currPage in item[1][3]:
                    checksOnPage = True
            except (StopIteration, UnboundLocalError):
                item = iterAccounts.next()
                currPages = iter(item[1][1])
                currPage = currPages.next()
                logger.debug("Current page: {0}".format(currPage))
                if currPage in item[1][3]:
                    checksOnPage = True

            # letterhead is used for standard pages, but check pages are printed on plain
            if checksOnPage:
                checksOnPage = False
                # push back the current page
                new_doc.PagePushBack(x)
                # create a blank page for the back and push it
                blankPage = new_doc.PageCreate()
                new_doc.PagePushBack(blankPage)
            else:
                # add the background to the current page and push it
                writer = ElementWriter()
                reader = ElementReader()
                reader.Begin(backgroundPage)
                writer.Begin(x, ElementWriter.e_underlay, False)
                addBackground(reader, writer)
                writer.End()
                reader.End()
                new_doc.PagePushBack(x)
                # push back the back of statement
                new_doc.PagePushBack(backOfStatementPage)

        if checkRemainingDriveSpace(os.path.abspath(output_path)) < 4000:
            output_path = ''
            for drive in getDrives():
                if checkRemainingDriveSpace(drive) > 4000:
                    if not os.path.exists(os.path.join(drive, 'temp')):
                        os.mkdir(os.path.join(drive, 'temp'))
                    output_path = os.path.join(drive, 'temp')
                    logger.info("Current Drive has less than 4000 megabytes of free space remaining. Switching to {0}".format(output_path))
                    break
            if not output_path:
                logger.info("All mapped drives have less than 4000 megabytes of space!")
                logger.info("Quitting on creation of file: {0}".format(output_filename))
                sys.exit(1)

        basename = "{0}{1}{2}_{3}".format(workOrder, group[0], group[1], group[2])
        new_doc.Save("{0}.pdf".format(os.path.join(output_path, basename)), 0)
        # Close the open document to free up document memory sooner than waiting for the
        # garbage collector
        new_doc.Close()

    in_doc.Close()

Progress, but still not functional.

A) First, yes, all the pages used in a single call to PDFDoc.ImportPages must be from the same document (this is in the documentation, and an exception is also thrown in this case).

Secondly, and I think this is the first error you reported in the last message, the ElementReader is still using the original background page from the original document. ElementReader should be using the first page from the “imported_pages” variable.

However, instead of all this you can just use the Stamper class. I’ve attached code that does what you want, importing a background page. The benefit of this is that the Stamper class not only handles all the element reading and writing for you, but it will also auto import the page resources if it’s from another document. Finally, Stamper provides a lot of handy features for customization.

If you’re using a PDFNet version without Stamper, then making the two changes mentioned should fix your issues.
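As a rough illustration of the Stamper route (not the attached code), the call sequence might look like the sketch below. The function name stamp_background is hypothetical, and the stamper and page set are passed in rather than constructed, since their constructor arguments are not given in this thread; SetAsBackground and StampPage are assumed to take the arguments shown.

```python
def stamp_background(stamper, dest_doc, background_page, dest_pages):
    """Stamp background_page behind the existing content of dest_pages.

    stamper is expected to behave like a PDFNet Stamper and dest_pages like
    a PageSet; both are supplied by the caller, so this sketch makes no claim
    about how they are constructed in a particular PDFNet version.
    """
    # SetAsBackground(True) is the option mentioned at the top of this thread.
    stamper.SetAsBackground(True)
    # Per the reply above, Stamper imports the page resources itself when
    # background_page comes from another document.
    stamper.StampPage(dest_doc, background_page, dest_pages)
    return stamper
```

The trade-off is that the stamp is applied in a separate pass over the finished pages rather than while each page is being written.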

PDFPageTest.py (1.69 KB)

Spencer_Rathbun · May 11, 2012, 1:21pm

Ok, I missed that in the docs. Thanks for pointing that out.

Before I get into my next problem though, I think I need to clear up what I’m trying to accomplish.

I previously used the stamper class for placing barcodes on each page. Unfortunately, this was very, very slow because I only stamped one page per call to the stamper. That was because each page had a different barcode image, so I couldn’t run the stamper over a page range. I also noticed while I was testing that it does not handle non-contiguous page ranges too well, though far better than singleton pages. Using raw element reader/writer combos gives me more control and allows me to do things to the current page on the fly, which is what I’m aiming for.
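If Stamper were used for a scattered set of pages, one possible way to reduce the number of separate ranges is to collapse the page numbers into contiguous runs first and add each run to a page set (for example via PageSet.AddRange). The helper name contiguous_ranges is hypothetical, and whether this changes Stamper's speed is not established in this thread.

```python
def contiguous_ranges(page_numbers):
    """Collapse page numbers into sorted, inclusive (first, last) runs."""
    ranges = []
    for number in sorted(set(page_numbers)):
        if ranges and number == ranges[-1][1] + 1:
            # Extends the current run by one page.
            ranges[-1] = (ranges[-1][0], number)
        else:
            # Starts a new run.
            ranges.append((number, number))
    return ranges


# Example: pages 1-3, 7 and 9-10 become three runs.
runs = contiguous_ranges([3, 1, 2, 7, 9, 10])
```

Each (first, last) pair can then be added to the page set in one call instead of adding the pages one at a time.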

What the code snippet I gave you does is take all the pages from the original document (over 50,000) and import sets of them into new docs. So I create a VectorPage object that holds pages 1-1000 and create a new file, then 1001-2500 and so on. While I’m doing this to the new docs, I want to put a background onto specific pages that I’m importing. Not all of them, and there is no guarantee which pages it will be. I’d like to do this during the pagepushback loop instead of adding them all, and then calling the stamper with the page list.
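To make the grouping concrete, here is a small sketch that turns the last page number of each group into inclusive (first, last) ranges such as 1-1000 and 1001-2500. The function name group_ranges is hypothetical; the real code gets its groups from getGroupings, which is not shown.

```python
def group_ranges(group_ends, first_page=1):
    """Turn the last page of each group into inclusive (first, last) ranges."""
    ranges = []
    first = first_page
    for last in group_ends:
        if last < first:
            raise ValueError("group ends must be strictly increasing")
        ranges.append((first, last))
        first = last + 1
    return ranges


# Example from the description above: pages 1-1000, then 1001-2500.
example = group_ranges([1000, 2500])
```

Each range then corresponds to one new document: the pages in that range are collected into a VectorPage and imported together.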

But wait, there’s more! While I’m adding these pages, I also need to insert additional pages. The pages that don’t get a background get a new blank page inserted after them, while the pages with a background get a background page, that needs to be imported from a separate pdf. I could do that using the pageInsert, but since I need it multiple times, I’d like to import once, save a pointer to it, then pushback each time I need it.
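The interleaving described here can be sketched independently of PDFNet as a plan of what to push for each content page. The function name plan_output and the string labels are hypothetical; plain_pages stands for the set of pages that are printed without the background.

```python
def plan_output(page_numbers, plain_pages):
    """Return the ordered list of (kind, page_number) items to push.

    Pages in plain_pages get no background and are followed by a blank page.
    All other pages get the background underneath and are followed by the
    back-of-statement page.
    """
    plan = []
    for number in page_numbers:
        if number in plain_pages:
            plan.append(("content", number))
            plan.append(("blank", None))
        else:
            plan.append(("content_with_background", number))
            plan.append(("back_of_statement", None))
    return plan


# Example: page 2 is a plain page, page 1 gets the background.
example_plan = plan_output([1, 2], {2})
```

In the real loop each item would map to one PagePushBack call, with the background written underneath the page before it is pushed.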

I may be able to live with pageInsert for the extra page insert, and the stamper for the backgrounds, but I would prefer not to if possible. So, since I can’t import vector pages from different docs I’ve done the following:

# Open the background pdf and get it in memory
bp_doc = PDFDoc('background.pdf')
itr = bp_doc.GetPageIterator()
backgroundPage = itr.Current()

# Open the back of the pdf and get it in memory
bp_doc = PDFDoc('backOfStatement.pdf')
itr = bp_doc.GetPageIterator()
backOfStatementPage = itr.Current()

# Import necessary page resources
bp_vector = VectorPage()
bp_vector.push_back(backgroundPage)
bs_vector = VectorPage()
bs_vector.push_back(backOfStatementPage)

for group in groupings:
    logger.info("pages {0}-{1}".format(group[1], group[2]))
    new_doc = PDFDoc()
    copy_pages = VectorPage()
    itr = in_doc.GetPageIterator(group[1])
    while itr.HasNext():
        if itr.Current().GetIndex() > group[2]:
            break
        page = itr.Current()
        copy_pages.push_back(page)
        itr.Next()

    print "test 1"
    imported_pages = new_doc.ImportPages(copy_pages)
    print "test 2"
    bp_pages = new_doc.ImportPages(bp_vector)
    print "test 3"
    bs_pages = new_doc.ImportPages(bs_vector)
    print "test 4"

It successfully gets past the first ImportPages call, as I see "test 2" on the command line. However, the second ImportPages call crashes with the following report:

Problem Event Name: APPCRASH

Application Name: python.exe

Application Version: 0.0.0.0

Application Timestamp: 4cf14060

Fault Module Name: StackHash_0a9e

Fault Module Version: 0.0.0.0

Fault Module Timestamp: 00000000

Exception Code: c0000005

Exception Offset: 00000000

OS Version: 6.1.7601.2.1.0.256.48

Locale ID: 1033

Additional Information 1: 0a9e

Additional Information 2: 0a9e372d3b4ad19135b953a78882e789

Additional Information 3: 0a9e

Additional Information 4: 0a9e372d3b4ad19135b953a78882e789

This is a different module, StackHash, than my previous errors. Is it impossible to use ImportPages twice on the same document?

Thanks for all the answers and the code snippets,

Spencer Rathbun