Why is PDFNet throwing 'startxref not found' on some of my PDFs (they seem to open fine in Acrobat)?

Q: I encountered a problem when handling digitally signed PDF
documents (CMS/PKCS #7). I would like to provide functions to preview
and/or print such documents but each time I try to do it I get the
following error:

Exception:
  Message: PDF startxref not found. The file is not a valid PDF document.

Adobe Acrobat Reader (version 7.0.9 and later) opens and prints these
documents without any problem.
-----
A: The most likely problem is that the application/library you used
to sign the PDF is saving corrupt PDF documents (i.e. PDF documents
with incorrect cross reference tables).

To verify if this is the case, you can try to open the document in
Acrobat Professional. Acrobat attempts to dynamically fix these files
during 'file open'. If you have Acrobat Pro you will be prompted to
save the file when closing the document (and the error will most
likely go away).

Using PDFNet you can also make an attempt to fix broken PDF documents
using pdftron::SDF::Doc::FixBrokenDoc(). Please keep in mind that if a
PDF document is corrupt there is no absolute guarantee that PDFNet (or
Acrobat) will be able to fix the problem. So the best option is to fix
the bug in the PDF generator or to switch to another PDF library.

A sample use case for FixBrokenDoc is as follows:

// Assuming C# pseudocode
PDFDoc doc = null;
try { // try to open the document...
  doc = new PDFDoc(input_file);
  doc.InitSecurityHandler();
  ....
}
catch (Exception) { // open failed, e.g. 'startxref not found'
  try { // try to fix the document
    StdFile file = new StdFile(input_file, StdFile.OpenMode.e_read_mode);
    // attempt to rebuild the broken document and open the result
    doc = new PDFDoc(pdftron.SDF.Doc.FixBrokenDoc(file));
  }
  catch (Exception) {
    // Error: Document rebuild failed.
    return false;
  }
}

Q: Thanks for the quick answer.
The problem is that we only consume PDF documents (invoices) coming
into our system from external systems. As I mentioned before, these
PDFs are digitally signed, so even if your solution works we are not
allowed to change even a single bit ;-)

But I will tell you about the simplest test I’ve made: I opened such
a signed PDF in a plain editor (e.g. Notepad), removed the signature
data manually (everything before %PDF and everything after %%EOF), and
then tried to open the document in PDFTron CosEdit (and in Acrobat
Reader too, of course). It then opens without any warning or error. In
my opinion this rules out your hypothesis about a corrupted PDF (but
indeed I have found one corrupted this way too ;-))
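
The manual trimming test described above can also be reproduced
without editing the file by hand. What follows is only a minimal
Python sketch, independent of PDFNet. The function name trim_to_pdf is
hypothetical, and the sketch assumes that the wrapper data lies
entirely before the first '%PDF-' header and after the last '%%EOF'
marker (a binary signature block that happens to contain either byte
sequence would defeat it). It works on an in-memory copy, so the
signed original is left untouched.

```python
def trim_to_pdf(data: bytes) -> bytes:
    """Return the bytes from the first '%PDF-' header through the last '%%EOF'.

    Assumption: everything outside that span is wrapper/signature data.
    """
    start = data.find(b"%PDF-")   # first PDF header
    end = data.rfind(b"%%EOF")    # last end-of-file marker
    if start == -1 or end == -1 or end < start:
        raise ValueError("no '%PDF-' ... '%%EOF' span found")
    return data[start:end + len(b"%%EOF")]


if __name__ == "__main__":
    import sys

    # Usage: python trim_to_pdf.py signed.pdf trimmed_copy.pdf
    with open(sys.argv[1], "rb") as src:
        trimmed = trim_to_pdf(src.read())
    with open(sys.argv[2], "wb") as dst:
        dst.write(trimmed)
```

The trimmed copy is suitable for viewing only; the signed original
remains the document of record.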

Moreover, at the moment we are interested only in the read-only
functionality offered by PDFNet SDK (Type 4, I suppose), just to view
and print PDFs without making any amendments. Even if I decided to fix
the problems, e.g. in memory, I’m not sure whether this type of the
SDK allows that.

I take the liberty of sending you one zipped test PDF document. It is
signed, so you can observe the effects of opening it in CosEdit or any
other diagnostic tool you use (I hope no firewall along the way
restricts this type of attachment). You can also repeat the test I
made and see the effects. Maybe it can help to solve the problem.

Looking forward to your opinion and thanks in advance,


A: The PDF file you sent is corrupt according to the PDF specification
and Acrobat. To verify this you can open the document in Acrobat
Professional. Acrobat will attempt to dynamically fix the file during
‘file open’. If you have Acrobat Pro you will be prompted to save the
file when closing the document. The saved document will no longer
contain the digital signature data following %%EOF.

The PDF Reference states the following in Appendix H (‘Compatibility
and Implementation Notes’), implementation note 18:

“Acrobat viewers require only that the %%EOF marker appear somewhere
within the last 1024 bytes of the file.”

Similarly PDFNet will search for startxref within the last kilobyte of
the file. If startxref is not found an exception will be thrown. This
is consistent with Acrobat’s behavior.
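
To check a given file against this rule before handing it to a
viewer, a small diagnostic along the following lines may help. It is a
hedged Python sketch of the behavior described above, not PDFNet's
actual implementation; the function names startxref_in_tail and
trailing_bytes and the window parameter are hypothetical.

```python
from typing import Optional


def startxref_in_tail(data: bytes, window: int = 1024) -> bool:
    """True if 'startxref' lies entirely within the last `window` bytes."""
    if window <= 0:
        return False
    return b"startxref" in data[-window:]


def trailing_bytes(data: bytes) -> Optional[int]:
    """Number of bytes after the last '%%EOF', or None if there is no marker."""
    end = data.rfind(b"%%EOF")
    if end == -1:
        return None
    return len(data) - (end + len(b"%%EOF"))


if __name__ == "__main__":
    import sys

    # Usage: python check_tail.py some.pdf
    with open(sys.argv[1], "rb") as f:
        content = f.read()
    print("startxref in last 1024 bytes:", startxref_in_tail(content))
    print("bytes after last %%EOF:", trailing_bytes(content))
```

A large count of bytes after the last %%EOF suggests that appended
data has pushed startxref out of the search window.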

Now we could extend the search to a larger data window, and this may
appear to work for your application. However, it is not a full
solution: another application may append an even larger or
variable-size data block to the PDF document, and it would face the
same problem.