Rendering performance problem

Product: PDFNetPython3

Product Version: demo

PDFDraw()/PDFRasterizer() is very slow!?

I need to render PDF pages to images in my project, and I am currently evaluating several libraries (Xpdf, PyMuPDF, PDFTron).
My use case is:

  1. Convert the pages of the original and the modified PDF to images.
  2. Compare the two images and display only the differing regions in the GUI.
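
To make step 2 concrete, here is a minimal sketch of the comparison, assuming both pages were rendered to raw pixel buffers of the same size. The helper name `diff_bbox` is hypothetical and not part of any of the libraries tested here. In a real project, Pillow's `ImageChops.difference` followed by `getbbox()` would likely do the same job much faster.

```python
def diff_bbox(buf_a, buf_b, width, height, bytes_per_pixel=3):
    """Return (left, top, right, bottom) of the region where two same-sized
    raw pixel buffers differ, or None if they are identical."""
    stride = width * bytes_per_pixel
    if len(buf_a) != stride * height or len(buf_b) != stride * height:
        raise ValueError("buffer size does not match the given dimensions")
    left, top, right, bottom = width, height, 0, 0
    for y in range(height):
        row_a = buf_a[y * stride:(y + 1) * stride]
        row_b = buf_b[y * stride:(y + 1) * stride]
        if row_a == row_b:
            continue  # fast path: the whole row is identical
        top = min(top, y)
        bottom = max(bottom, y + 1)
        for x in range(width):
            px = x * bytes_per_pixel
            if row_a[px:px + bytes_per_pixel] != row_b[px:px + bytes_per_pixel]:
                left = min(left, x)
                right = max(right, x + 1)
    if right == 0:
        return None  # no differing pixel was found
    return (left, top, right, bottom)


# Tiny 4x3 RGB example: one pixel at column 2, row 1 is changed.
white = bytes([255]) * (4 * 3 * 3)
changed = bytearray(white)
changed[(1 * 4 + 2) * 3] = 0
print(diff_bbox(white, bytes(changed), 4, 3))
```

The returned box uses the same (left, top, right, bottom) convention as Pillow, so it could be passed to `Image.crop` to show only the changed region.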

test contents:

Convert a PDF file containing 20 pages (A4 size) to images at 216 dpi.
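
For a sense of the workload, the sketch below estimates the bitmap size per page. It assumes a nominal A4 page of 210 x 297 mm (the real page box in the test file may differ slightly) and truncates to whole pixels the same way the rasterizer code further down does.

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)  # nominal A4 size; an assumption about the test file


def page_pixels(size_mm, dpi):
    """Convert a page size in millimetres to whole pixels at the given dpi."""
    return tuple(int(mm / MM_PER_INCH * dpi) for mm in size_mm)


width, height = page_pixels(A4_MM, 216)
rgb_bytes = width * height * 3   # 3 bytes per pixel (RGB)
bgra_bytes = width * height * 4  # 4 bytes per pixel (BGRA)
print(width, height, rgb_bytes, bgra_bytes)
```

Each page is therefore on the order of a few million pixels, so the test moves a few hundred megabytes of pixel data over 20 pages.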

test environment:

mac mini(2018)
OS: macOS Big Sur
CPU: Intel Core i3 3.6GHz
memory: 16GB
Harddisk: 128G SSD

test results (time taken, in seconds):

PyMuPDF: 0.25 s
Xpdf: 0.7 s
PDFTron PDFDraw/PDFRasterizer: 10.5 s (about 40 times slower than PyMuPDF)

Below is my code:

from datetime import datetime
from PDFNetPython3 import *
from PIL import Image

PDFNet.Initialize( "demo:******")
doc = PDFDoc("20P.pdf")
doc.InitSecurityHandler()

# ---------- pdftron draw
draw = PDFDraw()
draw.SetDPI(216)
begin = datetime.now()
for i in range(1, doc.GetPageCount() + 1):
    pg = doc.GetPage(i)
    bm = draw.GetBitmap(pg, PDFDraw.e_rgb)
    #a = Image.frombuffer('RGB', (bm.width, bm.height), bytes(bm.GetBuffer()))
print(datetime.now() - begin)
# a.show()

#
# ---------- pdftron Rasterize
box = Page.e_crop
begin = datetime.now()
for i in range(1, doc.GetPageCount() + 1):
    pg = doc.GetPage(i)
    mtx = pg.GetDefaultMatrix(True, box)
    pg_w = pg.GetPageWidth(box)
    pg_h = pg.GetPageHeight(box)
    # Scale matrix from PDF space to buffer space
    dpi = 216
    scale = dpi / 72.0  # PDF space is 72 dpi
    buf_w = int(scale * pg_w)
    buf_h = int(scale * pg_h)
    bytes_per_pixel = 4  # BGRA buffer
    mtx = Matrix2D(scale, 0, 0, scale, 0, 0).Multiply(mtx)
    rast = PDFRasterizer()
    buf = rast.Rasterize(pg, buf_w, buf_h, buf_w * bytes_per_pixel,
                         bytes_per_pixel, False, mtx)
    # a = Image.frombuffer('RGBA', (buf_w, buf_h), bytes(buf))
print(datetime.now() - begin)

PyMuPDF code:

from datetime import datetime
import fitz
from PIL import Image

doc = fitz.open("20P.pdf")
ppi = fitz.Matrix(3, 3)  # 72*3=216ppi

begin = datetime.now()
for p in range(doc.page_count):
    pm = doc[p].get_pixmap(matrix=ppi)
    pil = Image.frombuffer('RGB', (pm.width, pm.height), pm.samples)
print(datetime.now() - begin)

Xpdf code:

from pyxpdf import Document
from pyxpdf.xpdf import RawImageOutput
from datetime import datetime

doc = Document("20P.pdf")
imgs = RawImageOutput(doc, mode="RGBA", resolution_x=216, resolution_y=216)
begin = datetime.now()
for i in range(doc.num_pages):
    pil = imgs.get(i)
print(datetime.now() - begin)

I am sorry for my poor English. Did I test this incorrectly?
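
One way to make the comparison a little more robust is sketched below: a small, library-independent timing harness that uses `time.perf_counter`, does a warm-up pass, and reports the best of several runs. The name `time_render` and the stand-in `fake_render` workload are hypothetical; each library's per-page render call would be wrapped in a callable instead.

```python
import time


def time_render(render_page, page_count, repeats=3, warmup=1):
    """Return the best wall-clock time in seconds for rendering all pages."""
    for _ in range(warmup):
        for i in range(page_count):
            render_page(i)  # warm-up pass, not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for i in range(page_count):
            render_page(i)
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in workload so the sketch runs without any PDF library installed.
calls = []


def fake_render(i):
    calls.append(i)
    return bytes(1000)


elapsed = time_render(fake_render, page_count=20)
print(f"{elapsed:.4f} s for 20 pages")
```

The harness passes 0-based page indices, so for the PDFTron loop above the callable would be something like `lambda i: draw.GetBitmap(doc.GetPage(i + 1), PDFDraw.e_rgb)`.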
Thanks very much.

Thank you for contacting us about this. It looks like you are using the PDFDraw API correctly. Is this file-specific, or are you seeing this rate on all the files you are testing? If it does seem to be file-specific, are you able to share a document with us for further testing?

Thank you in advance.

As far as I have tested, all PDF files have the same problem.
I tested with a demo key, and a watermark was added to every image after rasterization. Maybe handling the watermark takes a lot of time? I don’t know.
I have attached the “20P.pdf” file used in my sample code. I created it with Adobe InDesign, and each page contains only a page number.
The Xpdf and PyMuPDF libraries I used were installed with:

  • pip install pyxpdf
  • pip install PyMuPDF

Thanks very much.

20P.pdf (399.5 KB)

Thank you for your response, and apologies for the delay. Based on some testing on my end, the slowdown only seems to be reproducible on macOS (Windows and Linux ran through the code in less than a second). We will be looking into this on our end. In my testing on macOS, I was able to render all the pages in about 8 seconds. What version of the SDK are you currently using?

If you are using macOS for development/testing, are you able to use the Linux SDK for the time being?
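
Since the slowdown appears to depend on the operating system, it may help to record the platform next to each timing when reporting results. The sketch below uses only Python's standard `platform` module; the helper name `environment_summary` is hypothetical, and it does not query the SDK version, which would have to be added separately.

```python
import platform


def environment_summary():
    """Collect basic OS and interpreter details to attach to a timing report."""
    return {
        "system": platform.system(),        # e.g. "Darwin", "Linux", "Windows"
        "release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "mac_ver": platform.mac_ver()[0],   # empty string when not on macOS
    }


info = environment_summary()
for key, value in info.items():
    print(f"{key}: {value}")
```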

I noticed this too. I’m using PDFDraw.Export() to render each page as a PNG. It took 7 seconds on a 30-page PDF. Any improvements on this would be awesome.