Find the file size of a PDF page without saving to disk



I’m having a problem with a particular PDF that is ~425MB in size. When a single page is retrieved (Page page = pdfDocument.getPage(1);) and then added to a blank document (newPDF.pagePushBack(page);), the new single-page PDF is almost the same size as the 30-page original. I know this isn’t a problem with your software, but rather with how the PDF was constructed (all other tools I tried give the same result: each extracted page is ~400MB).

My question is: when you have the PDFTRON:Page in memory, is there a way to find the size of the object/stream without adding the page to a PDFDoc and saving it to the file system?


To keep memory use under control, PDFNet (by default) stores temporary page data on disk, even when creating a new document. As a result, it is not possible to find the exact file size in advance (without serialization). However, you can calculate an estimate by traversing all streams referenced directly or indirectly from the page dictionary (page.GetSDFObj()) and adding up their sizes (stm_obj.Size()). The traversal should also cover all images and fonts listed in the resource dictionary (page.GetResources()).
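The traversal could look roughly like the sketch below. Note that it does not use the PDFNet API: Node is a hypothetical stand-in for an SDF object (dictionary entries, array items, and an optional stream size), and estimate is a made-up helper name. With the real library you would read the same information from the object returned by page.GetSDFObj(). Two details matter in practice: remember which objects you have already visited, so a shared image or font is counted once and cycles terminate, and do not follow the /Parent key, because it leads back into the page tree and from there to every other page in the document.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a PDF object graph; NOT the PDFNet API.
final class PageSizeEstimate {

    /** Minimal stand-in for a PDF object. */
    static final class Node {
        final Map<String, Node> entries = new LinkedHashMap<>(); // dictionary entries
        final List<Node> items = new ArrayList<>();              // array items
        final long streamBytes;                                  // -1 if not a stream

        Node() { this(-1); }
        Node(long streamBytes) { this.streamBytes = streamBytes; }

        Node put(String key, Node value) { entries.put(key, value); return this; }
        Node add(Node item) { items.add(item); return this; }
    }

    /** Sums the sizes of all streams reachable from the page dictionary. */
    static long estimate(Node page) {
        if (page == null) return 0;
        // Identity-based set: each object is counted once, and cycles terminate.
        Set<Node> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(page);
        long total = 0;
        while (!pending.isEmpty()) {
            Node obj = pending.pop();
            if (!visited.add(obj)) continue; // already counted
            if (obj.streamBytes >= 0) total += obj.streamBytes;
            for (Map.Entry<String, Node> e : obj.entries.entrySet()) {
                // Skip /Parent so the walk does not climb into the page tree.
                if (!"Parent".equals(e.getKey())) pending.push(e.getValue());
            }
            for (Node item : obj.items) pending.push(item);
        }
        return total;
    }
}
```

The result is only an estimate: it ignores the cross-reference table and the bytes taken by non-stream objects. It also counts resources that the page merely references, so if every page points at the same large resource, every page will report roughly the same large number.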

If you need the exact value without disk access, you can also serialize the document to a memory buffer (doc.Save() → byte[]) and take the length of the resulting array.