Advanced diff, simplified example?

jason.saunders · June 15, 2022, 4:14pm

Product: PDFTron / Webviewer

Product Version: 8.5.0

Please give a brief summary of your issue:
How to do advanced diff, or even just how to show multiple viewers.

Please describe your issue and provide steps to reproduce it:
I’m trying to implement the advanced diff similar to your https://www.pdftron.com/samples/web/samples/advanced/diff/ but not finding it very easy and getting a little frustrated with it.

Firstly, at https://www.pdftron.com/documentation/web/samples/advanced/#diff-documents the “Source Code” button does not actually show the source code of the demo. It shows something else which calls the appendTextDiffDoc API, which I find confusing given that the sample is described as “shows pixel differences between the two documents”. Is appendTextDiffDoc supposed to be for pixel differences or for “text” differences? If it’s for pixel differences, then how does it differ from appendVisualDiff?

Anyway, ignoring the “Source Code” given that it seems to be for something else, trying to work out how the real demo works is not particularly obvious to the uninitiated.

Is it possible to show a simple piece of code that shows 2 viewers being loaded and the references to the documents in those viewers being obtained? Or vice-versa, 2 documents being loaded and then shown in 2 viewers? Either way, such that the documents could then be used with appendVisualDiff for example.

Also, wondering what the syncNamespaces function does? It is done in the demo, but it’s not clear to me what it’s for. However, it looks like it could be important for what I’m trying to do which involves multiple viewers.

Many thanks.

zserviss · June 15, 2022, 10:47pm

Hi Jason,

Thank you for your feedback! We are always looking for feedback regarding our samples and potential guides to help customers get started with features and implementations.

First off here’s the full source code for the demo of diff:

diff.zip (592.7 KB)

Here is the API documentation for appendTextDiffDoc and appendVisualDiff. The main difference between the two is that appendVisualDiff will look for differences in non-text elements.

https://www.pdftron.com/api/web/Core.PDFNet.PDFDoc.html#appendTextDiffDoc

https://www.pdftron.com/api/web/Core.PDFNet.PDFDoc.html#appendVisualDiff

As for syncNamespaces, it is used for reusing the same worker files for all the instances of WebViewer so you don’t have to reinstantiate them.

https://www.pdftron.com/api/web/Core.html#.syncNamespaces

Here is another demo and sample code for PDF comparison:

Best Regards,
Zach Serviss
Web Development Support Engineer
PDFTron Systems, Inc.
www.pdftron.com

jason.saunders · June 16, 2022, 11:56am

diff.zip is what I’m already looking at.

Thanks for the documentation links, but I know where the documentation is already.

https://www.pdftron.com/webviewer/demo/semantic-text-compare does help to clear up the difference between appendTextDiffDoc and appendVisualDiff, but also reinforces that the “Source Code” button against https://www.pdftron.com/documentation/web/samples/advanced/#diff-documents is wrong. To reiterate, the sample is described as “shows pixel differences between the two documents”, but the source code uses appendTextDiffDoc which is for semantic text differences.

I’m still not very clear on syncNamespaces, but I will carry on.

zserviss · June 20, 2022, 5:50pm

Great to hear that clears up the differences. Thank you for pointing out the use of appendTextDiffDoc in that example, I’ll look more deeply into that.

As for syncNamespaces is there a specific question that I could help clear up?

Here is another code example, this time correctly using appendVisualDiff in the sample code.

jason.saunders · June 21, 2022, 10:11am

You said “it is used for reusing the same worker files for all the instances of WebViewer so you don’t have to reinstantiate them.”, but I don’t actively instantiate them, and you also make it sound like it’s a performance or memory saving feature, whereas the doc link you supplied makes it sound like a requirement for interoperability.

Is it for cases of sharing objects between instances? E.g. if I got a reference to a doc shown in webviewer A and want to show it directly in webviewer B (without doing anything intermediate like serialising it to a byte array)?

zserviss · June 21, 2022, 5:40pm

Sorry for misleading you on that. I have confirmed with the author that it gets multiple instances of WebViewer and uses the same namespace across those multiple instances.

It doesn’t save memory but it is required for interoperability between different instances so type checks work for the same types.

jason.saunders · June 23, 2022, 11:07am

I must be misunderstanding what you mean by “…so type checks work for the same types.”.

If I do this:

  const doc = await Core.createDocument(url);

  const viewer = await WebViewer({
    path: '/js/libs/pdftron-8.6.0',
    fullAPI: true,
    licenseKey: licenseKey
  }, document.getElementById('viewer'));

  viewer.UI.syncNamespaces({ PDFNet, Core });

  console.log(doc instanceof Core.Document); // true
  console.log(doc instanceof viewer.Core.Document); // false. Why?

Then doc instanceof viewer.Core.Document is false, which is wrong.

If I then went on to try to load doc in the viewer e.g:

viewer.UI.loadDocument(doc);

it fails with error:

Extension of the document cannot be determined from URL/Path. WebViewer will assume the extension is "pdf".

because I assume it equally failed the instanceof test and so thought it was a url.

Have I misunderstood what syncNamespaces does or am I not using it correctly?

Andy_Huang · June 28, 2022, 7:08pm

Hi Jason,

When loading multiple instances of WebViewer, each WebViewer loads its own namespaces into its own iframes. Since they are each loading their own, each separate window will have different instances of objects and classes that may not align with one another despite having the same script.

This causes the instanceof checks to fail when checking an object created from one instance with a loaded class from another instance.
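The effect described above can be reproduced without WebViewer at all. The following is a minimal sketch that uses Node's built-in vm module as a stand-in for two iframes (an assumption made purely for illustration; the Document class here is a toy, not Core.Document). The same class source is evaluated in two separate contexts, which yields two distinct constructors:

```javascript
const vm = require('node:vm');

// The same "script" is evaluated once per context, much as each WebViewer
// iframe evaluates its own copy of the library. Toy class, not Core.Document.
const classSource =
  'class Document { constructor(name) { this.name = name; } }; Document';

// Each context plays the role of one iframe with its own global scope.
const DocumentA = vm.runInContext(classSource, vm.createContext({}));
const DocumentB = vm.runInContext(classSource, vm.createContext({}));

const doc = new DocumentA('a.pdf');

// Same context: the constructor's prototype is on doc's prototype chain.
const sameRealm = doc instanceof DocumentA;
// Other context: identical source, but a different constructor object.
const crossRealm = doc instanceof DocumentB;

console.log(sameRealm, crossRealm);
```

The object still has the same shape in both cases; only the identity of the constructor differs, which is all that instanceof looks at.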

From your code, I see you are passing Core into syncNamespaces, which doesn’t take Core as one of its options. From the source, it doesn’t seem to sync document types either. This is because the Document object is used quite early, and by the time you call syncNamespaces it would be too late. I am not sure about your use case, but you should keep each document within the instance that created it. If you need to share workers, you should look at using the workerTransportPromise instead: PDFTron Systems Inc. | Documentation.

jason.saunders · June 29, 2022, 12:04pm

What I’ve been tasked with is to create a UI which shows the pixel differences between two documents and which can toggle between showing just the differences document and showing the two source documents as well. i.e. to toggle between a single webviewer instance and 3 webviewer instances.

What I was trying to do was to not load webviewer instances until I needed them, to reduce overhead. So my plan was to load the two documents outside of a webviewer via Core.createDocument(url), generate the differences document, and then show that in the single webviewer instance.

Then, if the user chooses to switch to the 3 panel display, I would then create two more webviewer instances and just load the two documents in them, rather than making the webviewers load the original URLs again.

From what you’re saying, it sounds like this isn’t possible? You have to load documents in each webviewer instance independently?

Andy_Huang · June 29, 2022, 7:05pm

I think there are tradeoffs since you have to keep documents in memory. Although the Document objects may not be compatible between each of the instances, you can still get the file data (and perhaps cache that instead) to load in other instances.
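One way to cache the file data rather than the Document objects might look like the sketch below. createByteCache and fetchBytes are hypothetical names introduced here for illustration; they are not part of WebViewer. The loader is injected so the cache itself has no dependency on the library or on the browser.

```javascript
// Hypothetical helper: remembers the raw bytes for each URL so that several
// viewers can each build their own document from a single download.
function createByteCache(fetchBytes) {
  const cache = new Map(); // url -> Promise<ArrayBuffer>

  return function getBytes(url) {
    if (!cache.has(url)) {
      const pending = fetchBytes(url);
      // Forget failed downloads so a later call can retry.
      pending.catch(() => cache.delete(url));
      cache.set(url, pending);
    }
    // Give every caller its own copy, so one consumer cannot mutate or
    // detach the buffer that another consumer is still using.
    return cache.get(url).then((buffer) => buffer.slice(0));
  };
}

// Possible browser usage (assumption: `instance` is a WebViewer instance):
//   const getBytes = createByteCache((url) =>
//     fetch(url).then((response) => response.arrayBuffer()));
//   const blob = new Blob([await getBytes(url)], { type: 'application/pdf' });
//   instance.loadDocument(blob, { extension: 'pdf' });
```

The tradeoff mentioned above still applies: the bytes stay in memory for as long as the cache holds them, in exchange for not downloading the file again when more viewers are created.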

I had something like this:

Doc created outside of WebViewer

let mainDoc = null;

(async function() {
  Core.setWorkerPath('/lib/core');
  mainDoc = await Core.createDocument('http://myServer/files/doc.pdf');
})();

Later in WebViewer, I had to wait till the document loaded before actually using the mainDoc object. I just used a timeout here.

    const timeoutMs = 1000;
    let timeout;

    // Poll until mainDoc has been assigned by the createDocument call above.
    const loadExternalDoc = async () => {
      if (!mainDoc) {
        timeout = setTimeout(loadExternalDoc, timeoutMs);
        return;
      }
      // Collect every page number (1-based) of the externally created document.
      const pages = [];
      for (let i = 1; i <= mainDoc.getPageCount(); i++) {
        pages.push(i);
      }
      // Get the file data and hand it to this WebViewer instance as a Blob.
      const xfdfData = await mainDoc.extractXFDF(pages);
      const data = await mainDoc.getFileData(xfdfData);
      const blob = new Blob([data], { type: 'application/pdf' });
      instance.loadDocument(blob, { extension: 'pdf' });
    };

    timeout = setTimeout(loadExternalDoc, timeoutMs);
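As an alternative to the timeout polling above, the promise returned by the loader can be kept and awaited wherever the document is needed. This is only a sketch: createDocumentStub is a hypothetical stand-in for Core.createDocument so that the snippet runs on its own, and listPages is an illustrative name.

```javascript
// Hypothetical stand-in for Core.createDocument: resolves after a short
// delay with an object exposing getPageCount(), like the real Document.
function createDocumentStub(url) {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ url, getPageCount: () => 3 }), 10);
  });
}

// Keep the promise itself instead of a variable that starts out as null.
const mainDocPromise = createDocumentStub('http://myServer/files/doc.pdf');

// Any later consumer awaits the same promise. It continues as soon as the
// load finishes, with no retry interval to tune.
async function listPages(docPromise) {
  const doc = await docPromise;
  const pages = [];
  for (let i = 1; i <= doc.getPageCount(); i++) {
    pages.push(i);
  }
  return pages;
}

listPages(mainDocPromise).then((pages) => console.log(pages));
```

Awaiting an already settled promise is cheap, so the same mainDocPromise can be awaited from several places without triggering another load.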