Streaming PDF documents.

Aaron_Gravesdale · July 2, 2008, 7:32pm

Q: I am researching PDFTron (PDFNet SDK) for use within my company. I
am curious about some of the functionality of PDFTron however. It
appears that PDFTron can do highlighting within a PDF and then store
it to memory. I am curious if PDFTron has any support directly related
to streaming PDF documents. What we want is to be able to highlight a
PDF based on user search terms and then store it to memory for
streaming. Can PDFTron do all three, or just highlighting and storing
to memory?
------
A: Because saving a web-optimized PDF document requires random access
during serialization, it is not possible to 'stream' a PDF document
while it is being created. One could implement such functionality by
saving the PDF to memory and then adding a 'streaming' interface to
read the serialized data; however, there would be no real technical
advantage to this approach.
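
As a rough illustration of the 'save to memory, then stream' idea, here
is a minimal Java sketch. It assumes the document has already been
serialized into a byte array by some other means; the class name
InMemoryPdfStreamer and the method streamTo are hypothetical names used
only for this example, and no PDFNet calls are shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: exposes an in-memory, fully serialized document
// through a stream interface and copies it out in fixed-size chunks.
class InMemoryPdfStreamer {

    // Writes 'serializedPdf' to 'out' in chunks of at most 'chunkSize'
    // bytes and returns the total number of bytes written.
    static long streamTo(byte[] serializedPdf, OutputStream out, int chunkSize)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] chunk = new byte[chunkSize];
        long total = 0;
        try (InputStream in = new ByteArrayInputStream(serializedPdf)) {
            int n;
            // read() returns -1 once the buffer is exhausted.
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes standing in for a real serialized PDF.
        byte[] fake = "%PDF-1.4 placeholder bytes".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long written = streamTo(fake, sink, 8);
        System.out.println(written + " bytes streamed");
    }
}
```

Note that the whole document is already in memory before the first
chunk is sent, which is why this gives no real advantage over returning
the buffer directly.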

The PDF format does allow for streaming during document download (i.e.
during PDF reading). A typical use case would be to allow the user to
read the initial pages of a very large document without forcing the
user to download the entire document. This type of document streaming
is achieved through a process called 'linearization'. To enable
linearization using PDFNet, simply specify the
'SDFDoc.SaveOptions.e_linearized' flag in the call to
PDFDoc.Save(...).

Aaron_Gravesdale · July 2, 2008, 9:23pm

Q: I guess the main issue is not necessarily that we want to stream
the PDF documents. We were under the impression that we could protect
the PDF document from unauthorized viewing by streaming it. As it is
now, the user would probably be able to enter the link of the PDF
document and gain access to it when they are not supposed to. Do you
know of any way that PDFTron can secure against this kind of thing?
-----
A: You could encrypt all PDF files with custom security (e.g. using
AES, RC4, etc).
If you are developing under .NET you can use the standard encryption
functions from 'System.Security.Cryptography' (e.g.
http://www.codeproject.com/KB/security/SimpleEncryption.aspx). If you
are developing under Java you can use similar classes from the Java
security packages (e.g. 'javax.crypto').

In short, you would encrypt all PDF documents on the server using a
secret key. These documents would not be readable in any PDF viewer
(other than your own). After the encrypted document is downloaded
(e.g. into a memory buffer), you would decrypt the file into a memory
buffer (using 'System.Security.Cryptography' or similar) and pass the
result to the PDFDoc constructor.
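
The encrypt-on-server, decrypt-in-memory steps could be sketched in
Java roughly as follows. This is only an illustration under some
assumptions: it uses AES in GCM mode from 'javax.crypto' (one possible
choice, not something PDFNet requires), the class name PdfBufferCrypto
and its methods are hypothetical, and key distribution to the client is
not addressed. The decrypted buffer is what you would then hand to the
PDFDoc constructor; that call is left out so the example runs without
PDFNet.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper: encrypts and decrypts a PDF held in a byte buffer.
class PdfBufferCrypto {
    private static final int IV_BYTES = 12;   // nonce size commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    // Server side: returns IV followed by ciphertext (which includes the tag).
    static byte[] encrypt(byte[] plainPdf, SecretKey key) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);      // fresh IV for every document
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] cipherText = cipher.doFinal(plainPdf);
        byte[] out = new byte[IV_BYTES + cipherText.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(cipherText, 0, out, IV_BYTES, cipherText.length);
        return out;
    }

    // Client side: reads the IV from the front of the buffer and decrypts
    // the rest entirely in memory. Throws if the data was modified.
    static byte[] decrypt(byte[] encrypted, SecretKey key) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, encrypted, 0, IV_BYTES));
        return cipher.doFinal(encrypted, IV_BYTES, encrypted.length - IV_BYTES);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey key = keyGen.generateKey();

        // Placeholder bytes standing in for a real PDF file.
        byte[] pdf = "%PDF-1.4 placeholder".getBytes(StandardCharsets.US_ASCII);
        byte[] encrypted = encrypt(pdf, key);
        byte[] decrypted = decrypt(encrypted, key);

        // 'decrypted' is the buffer that would be passed on to PDFDoc.
        System.out.println("round trip ok: " + Arrays.equals(pdf, decrypted));
    }
}
```

An authenticated mode such as GCM is used here so that a tampered
download is rejected during decryption instead of producing a corrupt
PDF buffer.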

Using PDFNet it is also possible to implement custom security handlers
(based on the standard PDF security handler). Although this approach
has some advantages, many of our clients found that for web-based
solutions the above approach is more secure and easier to implement.