Tweaking PDFTron HTML to PDF Conversion to use only local resources

Q: We are using PDFNet with the HTML to PDF Conversion module (http://www.pdftron.com/pdfnet/samplecode.html#Html2Pdf) in our application.

When converting local HTML files to PDF, we need to ensure that the HTML2PDF module does NOT attempt to download images/CSS from the web. That is, it should only load images/CSS that are available locally on the file system. How can we do this, preferably without modifying the HTML? There is a WebPageSettings.SetLoadImages(…) method, but that stops all images from being displayed in the PDF.

A:

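There is no explicit way to do this, but you could use the following workaround: set error handling to ignore, and then set a bad proxy. With an unreachable proxy every network request fails, and with error handling set to ignore those failures are skipped silently instead of aborting the conversion. Local files are not requested through the proxy, so they still load.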
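To illustrate why the trick works, here is a minimal sketch of the same mechanism using only Python's standard library. It is not PDFNet code: `fetch_or_none` and `DEAD_PROXY` are hypothetical names, and the sketch assumes nothing is listening on the chosen local port. The PDFNet calls themselves are not shown; check the HTML2PDF WebPageSettings reference for the error-handling and proxy options.

```python
import urllib.error
import urllib.request

# Assumption: nothing listens on this local port, so connections are refused.
DEAD_PROXY = "http://127.0.0.1:9"


def fetch_or_none(url, proxy=DEAD_PROXY, timeout=5):
    """Try to fetch `url` through `proxy`; return None on any network failure."""
    # Route both http and https requests through the (unreachable) proxy.
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, OSError):
        # "Ignore" error handling: treat the resource as missing and carry on.
        return None
```

With the dead proxy in place, a call such as `fetch_or_none("http://www.example.com/image.png")` should come back as `None` rather than raising, which mirrors how the converter ends up skipping remote images while the rest of the page still renders.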

I tested this by saving the complete www.apple.ca main page (using Chrome). The saved local page includes a navigation.css file that tries to load images from the URL “/global/nav/images/globalnav_text.png”, which does not resolve locally. I prepended www.apple.com/ca to that URL, and converting the locally saved HTML then loaded the remote images. After applying the trick above and converting again, the images were not downloaded, and the output was the same as with the original bad relative URL.
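If you want to repeat a test like this on your own pages, it helps to know which references in the saved HTML/CSS actually point at the network. The following is a rough sketch for listing them, again using only the Python standard library. All names here (`is_remote`, `css_refs`, `remote_refs_in_html`) are hypothetical, the CSS handling is a simple regex rather than a full parser, and it only reports references; it does not change the files or interact with PDFNet.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Matches url(...) in CSS, with or without quotes around the target.
CSS_URL = re.compile(r"""url\(\s*['"]?([^'")]+?)['"]?\s*\)""", re.IGNORECASE)


def is_remote(ref):
    """True for http(s) and protocol-relative (//host/...) references."""
    ref = ref.strip()
    return ref.startswith("//") or urlparse(ref).scheme in ("http", "https")


def css_refs(css_text):
    """Return every url(...) target found in a piece of CSS."""
    return [m.group(1) for m in CSS_URL.finditer(css_text)]


class _RefCollector(HTMLParser):
    """Collects resource references that a renderer would try to load."""

    def __init__(self):
        super().__init__()
        self.refs = []
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("src"):                      # img, script, iframe, ...
            self.refs.append(attrs["src"])
        if tag == "link" and attrs.get("href"):   # stylesheets, icons
            self.refs.append(attrs["href"])
        if attrs.get("style"):                    # inline style="..."
            self.refs.extend(css_refs(attrs["style"]))
        if tag == "style":
            self._in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style:                        # contents of <style> blocks
            self.refs.extend(css_refs(data))


def remote_refs_in_html(html_text):
    """List the references in `html_text` that point at the network."""
    parser = _RefCollector()
    parser.feed(html_text)
    parser.close()
    return [r for r in parser.refs if is_remote(r)]
```

For a linked stylesheet such as navigation.css, read the file and filter `css_refs(text)` with `is_remote`. A relative path like “/global/nav/images/globalnav_text.png” is not reported, while the same path with a host in front of it is.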

Q: Is this still the case? We have the same issue. In our case we do not want to load any external content into the HTML, including content from the local file system. We use the API to convert HTML uploaded by our clients. Allowing our services to download any content referenced by the HTML is a major security flaw.

A:

Yes, blocking the network connections would be the surest way of preventing a security issue in your system.