Setting a Logical Page for a PDF

Hi Matt, sorry for resuming such an old thread, but the issue is still alive for us.
Today I managed to retrieve the PageLabel of the first page of the attached PDF using the PDFNetJS full build and the getPageLabel method. Printing the result in the dev tools console gave me this (PDF WebViewer version 5.1.0):

m_first_page: -1
m_last_page: -1
mp_obj: “0”
name: “PageLabel”

As you can see in the PDF, the first “logical” page number is 892. The PageLabel object above contains no such information. The first link you sent me last time states that WebViewer does not support that yet. Is this still the case? If so, are there any plans to support it in the future? Otherwise we’ll just retrieve the logical page from the server, send it along with the URL of the PDF, and adapt our UI accordingly.
Best regards
Mirko

AJPS_2018032817013332.pdf (775 KB)

Hi Mirko,

We have the instance.setPageLabels API now in the UI https://www.pdftron.com/api/web/WebViewerInstance.html#setPageLabels__anchor
As long as you’re not using the jQuery UI with version 5.1, you can use that.

You can read the page label information using the full API and then call instance.setPageLabels with the relevant data (the API link has an example).
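A minimal sketch of that idea, not an official sample: it assumes `doc` behaves like a `PDFNet.PDFDoc`, and that the returned label object exposes `isValid()` and `getLabelTitle(pageNum)` as described in the PDFNet `PageLabel` reference. The helper name `collectPageLabels` is hypothetical. Pages without a valid label fall back to their physical page number.

```javascript
// Sketch only: `doc` is expected to behave like a PDFNet.PDFDoc.
// Returns one label string per physical page (1-based page numbers).
async function collectPageLabels(doc, pageCount) {
  const labels = [];
  for (let pageNum = 1; pageNum <= pageCount; pageNum++) {
    const label = await doc.getPageLabel(pageNum);
    if (await label.isValid()) {
      // Assumed API: getLabelTitle returns the label text for this page.
      labels.push(await label.getLabelTitle(pageNum));
    } else {
      // No label defined in the PDF: use the physical page number.
      labels.push(String(pageNum));
    }
  }
  return labels;
}
```

In the viewer, the resulting array could then be passed to instance.setPageLabels, with the page count taken from the document (for example via getPageCount on the PDFDoc).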

Hope this helps.

Anthony Chen
Software Developer
PDFTron Systems Inc.

Hi Anthony
Thanks for your reply, although I’m not sure I understood you correctly; I think we’re talking at cross purposes here.
The main point is neither to set a page label nor to display information in the UI.
If I try to retrieve the currently displayed page of the PDF using either of

readerControl.getCurrentPageNumber()
readerControl.docViewer.getCurrentPage()

I get “1” as the result when the first “physical” page of the PDF is currently displayed and active.
My problem is that in the PDF I attached, the first page number is 892 (as you can see at the bottom center of the page). The “physical” page number is 1, as correctly returned by the functions mentioned above, but I don’t know whether (and how) I can retrieve this “logical” page number, which in this case would be 892.
If I had this information in advance I could use the setPageLabels method as you say, but since I don’t, I’d like to know whether WebViewer can retrieve it using vanilla JavaScript.
Best regards
Mirko

Hi Mirko,

Thanks for clarifying. I must have been on a different page.

There is a way to do it, assuming the information you are looking for is almost always in the same place on every page.

We can do it like so:

  • use text extraction on a certain part of the page to get the text you are looking for

Sample code snippet:

await PDFNet.initialize();
const doc = await docViewer.getDocument().getPDFDoc();
const firstPage = await doc.getPage(1);

const txt = await PDFNet.TextExtractor.create();
// Region that contains the printed page number, in PDF coordinates.
const rect = new PDFNet.Rect(279, 68, 309, 51);
txt.begin(firstPage, rect); // Read only the given region of the page.

// Extract words one by one.
let line = await txt.getFirstLine();
for (; (await line.isValid()); line = (await line.getNextLine())) {
  // Declare `word` locally so it does not leak as an implicit global.
  for (let word = await line.getFirstWord(); (await word.isValid()); word = (await word.getNextWord())) {
    console.log(await word.getString());
  }
}

Please note that the “Rect” is in PDF coordinates.
Please also note, I only did it for page 1.
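To go from the extracted words on page 1 to labels for the whole document, something like the following could work. This is a sketch under a loud assumption: the printed page numbers increase by one per physical page, so only the first page needs to be read. Both helper names (`parseLogicalPage`, `buildPageLabels`) are hypothetical and not part of WebViewer.

```javascript
// Returns the first purely numeric word as an integer, or null if none is found.
function parseLogicalPage(words) {
  for (const word of words) {
    const trimmed = word.trim();
    if (/^\d+$/.test(trimmed)) {
      return parseInt(trimmed, 10);
    }
  }
  return null;
}

// Assumption: logical page numbers are consecutive across physical pages.
function buildPageLabels(firstLogicalPage, pageCount) {
  return Array.from({ length: pageCount }, (_, i) => String(firstLogicalPage + i));
}
```

The words collected in the loop above would be passed to parseLogicalPage, and the array from buildPageLabels could then be handed to instance.setPageLabels. If the numbering is not consecutive, the extraction would have to be repeated for each page instead.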

How I got those coordinates:
I listened to the document viewer’s click event and converted the click coordinates to PDF coordinates.
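The conversion itself can be illustrated with plain arithmetic. The sketch below is not the WebViewer API (the coordinates guide linked below describes the library's own conversion helpers); it assumes an unrotated page whose crop box starts at (0, 0), where viewer page coordinates have their origin at the top-left with y pointing down and PDF coordinates have their origin at the bottom-left with y pointing up. The helper names and the page height of 792 used in the note afterwards are illustrative assumptions.

```javascript
// Hypothetical helpers, not part of WebViewer.
// Flips the y axis: viewer y grows downward, PDF y grows upward.
function viewerToPdfPoint(point, pageHeight) {
  return { x: point.x, y: pageHeight - point.y };
}

// Builds a normalized PDF rectangle (x1 <= x2, y1 <= y2) from two clicks
// given in viewer page coordinates.
function pdfRectFromClicks(clickA, clickB, pageHeight) {
  const a = viewerToPdfPoint(clickA, pageHeight);
  const b = viewerToPdfPoint(clickB, pageHeight);
  return {
    x1: Math.min(a.x, b.x),
    y1: Math.min(a.y, b.y),
    x2: Math.max(a.x, b.x),
    y2: Math.max(a.y, b.y),
  };
}
```

For example, on a page 792 units tall, clicks at (279, 724) and (309, 741) in viewer page coordinates map to the PDF rectangle x1 = 279, y1 = 51, x2 = 309, y2 = 68.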

Relevant guides used:
https://www.pdftron.com/documentation/web/guides/extraction/text-extract?searchTerm=text%20extraction

https://www.pdftron.com/documentation/web/guides/coordinates/#converting-between-pdf-and-viewer-coordinates

Anthony Chen

Hi Anthony
Thanks for your precious tip, it works. I’m not sure all PDFs are like that, but if they are not, it will just be a matter of adjusting the coordinates.
Best regards
Mirko

Hi Mirko,

Glad to hear your issue was resolved.

“Thanks for your precious tip, it works. I’m not sure all PDFs are like that, but if they are not, it will just be a matter of adjusting the coordinates.”

Correct. As long as the page numbers are in similar positions on every page of that PDF, this solution should suffice.

Anthony Chen