How do I extract hyperlinks from a PDF?

I have PDFs that contain internal links to other pages and links to external resources. How do I deal with them and extract those hyperlinks?

The easiest way would be to use the ContentReplacer class to replace the text underneath the hyperlink annotation.

You can iterate through the annotations to find the one you want, and then pass the Rect from that Annotation to the ContentReplacer class.
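For the extraction side of the question, iterating the annotations could look roughly like the sketch below. It assumes the PDFNetPython3 bindings and a valid license key; `list_links`, `pdf_path` and `license_key` are names made up for this example, not part of the SDK.

```python
def list_links(pdf_path, license_key):
    """Return (page_number, kind, target) tuples for the link annotations in a PDF."""
    # Imported inside the function so this file still loads where the SDK is not installed.
    from PDFNetPython3 import PDFNet, PDFDoc, Annot, Action, Link

    PDFNet.Initialize(license_key)  # assumption: a valid demo or commercial key
    doc = PDFDoc(pdf_path)
    doc.InitSecurityHandler()

    found = []
    itr = doc.GetPageIterator()
    while itr.HasNext():
        page = itr.Current()
        page_number = page.GetIndex()
        for i in range(page.GetNumAnnots()):
            annot = page.GetAnnot(i)
            # Skip broken annotations and anything that is not a link.
            if not annot.IsValid() or annot.GetType() != Annot.e_Link:
                continue
            action = Link(annot).GetAction()
            if not action.IsValid():
                continue
            if action.GetType() == Action.e_URI:
                # External link: the target is stored under the "URI" key of the action.
                uri = action.GetSDFObj().Get("URI").Value().GetAsPDFText()
                found.append((page_number, "external", uri))
            elif action.GetType() == Action.e_GoTo:
                # Internal link: the target is a destination page in the same document.
                dest = action.GetDest()
                if dest.IsValid():
                    found.append((page_number, "internal", dest.GetPage().GetIndex()))
        itr.Next()
    doc.Close()
    return found
```

Calling `list_links("input.pdf", key)` would return one tuple per link. Note that this sketch only looks at links that carry an action, so a link that stores its destination directly on the annotation would be skipped.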

This is exactly what I tried. I was able to find the annotations and modify them, but I have no clue how to get the visible text that the annotation covers on the page, or how to modify that text, since the ContentReplacer documentation clearly states that it only replaces text enclosed in “[” and “]”.
So I want to understand:

  1. Does PDFTron actually give me the visible text covered by the annotation?

  2. Can I update the visible text for the annotation?

  3. If not, can I update any text that is not enclosed in “[” and “]”?

I updated my StackOverflow post.

The AddText method is the one you want to use, as it takes in a Rect defining the area.

It is the AddString method that uses the delimiters like “[” and “]” to find and replace, which as you pointed out would not work.
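As a rough sketch of the AddText route, assuming the PDFNetPython3 bindings (`replace_link_text` is a hypothetical helper name, and `page` and `annot` are objects you already obtained while iterating the annotations):

```python
def replace_link_text(page, annot, new_text):
    """Replace whatever text sits inside the annotation's rectangle with new_text."""
    # Imported inside the function so this file still loads where the SDK is not installed.
    from PDFNetPython3 import ContentReplacer

    replacer = ContentReplacer()
    # AddText targets a region of the page, so no "[" / "]" delimiters are involved.
    replacer.AddText(annot.GetRect(), new_text)
    # Process applies the queued replacements to the page content.
    replacer.Process(page)
```

The change only lives in memory until you save the document afterwards, for example with `doc.Save(...)`.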

What about extracting the visible text from the annotation? Is that even possible with PDFTron?

Yes, of course. Using the Rect of the Annotation itself, you can extract the text underneath it with the TextExtractor class.
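Something along these lines should work in Python. This is a sketch that assumes the PDFNetPython3 bindings and that `TextExtractor.Begin` accepts the annotation's Rect as its optional clip argument, as the C++ signature does; `link_visible_text` is a made-up helper name.

```python
def link_visible_text(page, annot):
    """Return the page text that falls inside the annotation's rectangle."""
    # Imported inside the function so this file still loads where the SDK is not installed.
    from PDFNetPython3 import TextExtractor

    extractor = TextExtractor()
    # Assumption: the second argument is a clip region that restricts extraction
    # to the annotation's rectangle.
    extractor.Begin(page, annot.GetRect())
    # Trim the surrounding whitespace and newlines from the extracted text.
    return extractor.GetAsText().strip()
```

If the result includes bits of neighbouring words, the annotation rectangle is probably a little larger than the text it covers, so treat the output as approximate.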