How do I extract hyperlinks from a PDF?

I have PDFs that contain internal links to other pages and links to external resources. How do I handle them and extract those hyperlinks?

The easiest way would be to use the ContentReplacer class to replace the text underneath the hyperlink annotation.

https://www.pdftron.com/documentation/samples/#contentreplacer

You can iterate through the annotations to find the one you want, and then pass the Rect from that Annotation to the ContentReplacer class.

https://www.pdftron.com/documentation/samples/#annotation

This is exactly what I tried. I was able to find the annotations and modify them, but I have no clue how to get the visible text on the PDF that the annotation covers, or how to modify that text, since the ContentReplacer documentation (https://www.pdftron.com/api/PDFNet/html/M_pdftron_PDF_ContentReplacer_AddText.htm) clearly states that it only replaces text enclosed in “[” and “]”.
So I want to understand:

  1. Does PDFTron actually give me the visible text enclosed by the annotation?

  2. Can I update the visible text for the annotation?

  3. If not, can I update any text that is not enclosed in “[” and “]”?

I updated my StackOverflow post.

The AddText method is the one you want to use, as it takes in a Rect defining the area.

It is the AddString method that uses delimiters such as “[” and “]” to find and replace text, which, as you pointed out, would not work here.

https://www.pdftron.com/api/PDFNet/html/M_pdftron_PDF_ContentReplacer_AddText.htm

https://www.pdftron.com/api/PDFNet/html/M_pdftron_PDF_ContentReplacer_AddString.htm
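
As a rough illustration of the AddText route, here is a minimal Python sketch, assuming the PDFNetPython3 bindings. The helper names replace_text_under_links and replace_in_file, the file paths, and the license key argument are hypothetical and only there for the example. The page, replacer, and link type are passed in as arguments so the loop itself does not depend on the SDK being importable.

```python
def replace_text_under_links(page, replacer, link_type, new_text):
    """Queue new_text for the region of every link annotation on page.

    page, replacer and link_type are expected to be a PDFTron Page, a
    ContentReplacer and Annot.e_Link (assumption: supplied by the caller).
    Returns the number of link annotations that were queued.
    """
    count = 0
    for i in range(page.GetNumAnnots()):
        annot = page.GetAnnot(i)
        if annot.IsValid() and annot.GetType() == link_type:
            # AddText targets a rectangle, so no "[" / "]" delimiters are involved.
            replacer.AddText(annot.GetRect(), new_text)
            count += 1
    if count:
        replacer.Process(page)  # apply the queued replacements to this page
    return count


def replace_in_file(src, dst, new_text, license_key):
    """Hypothetical wrapper: run the helper above over every page of src."""
    # Imported here so the helper above can be read and tried without the SDK.
    from PDFNetPython3.PDFNetPython import (
        PDFNet, PDFDoc, SDFDoc, ContentReplacer, Annot)

    PDFNet.Init(license_key)  # assumption: your own license key
    doc = PDFDoc(src)
    doc.InitSecurityHandler()
    for n in range(1, doc.GetPageCount() + 1):  # pages are 1-based
        replace_text_under_links(
            doc.GetPage(n), ContentReplacer(), Annot.e_Link, new_text)
    doc.Save(dst, SDFDoc.e_linearized)
    doc.Close()
```

This sketch writes the same replacement text under every link, which is rarely what you want in practice; you would normally decide the new text per annotation before calling AddText.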

What about extracting the visible text from the annotation? Is that even possible with PDFTron?

Yes, of course. You can pass the annotation itself to the TextExtractor class.

TextExtractor.GetTextUnderAnnot(annot)
https://www.pdftron.com/api/PDFTronSDK/dotnet/pdftron.PDF.TextExtractor.html#pdftron_PDF_TextExtractor_GetTextUnderAnnot_pdftron_PDF_Annot_
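
Putting the pieces of this thread together, a minimal Python sketch for listing each link's visible text and target might look like the following. It assumes the PDFNetPython3 bindings; links_on_page and links_in_file are hypothetical helper names, and the license key argument is a placeholder. Links that carry no action, or an action type other than URI or GoTo, are simply skipped here.

```python
def links_on_page(page, extractor, sdk):
    """Return (visible_text, target) pairs for the link annotations on page.

    extractor is expected to be a TextExtractor and sdk the PDFNetPython
    module providing Annot, Link and Action (assumption: passed in by caller).
    """
    extractor.Begin(page)  # the extractor has to be pointed at the page first
    found = []
    for i in range(page.GetNumAnnots()):
        annot = page.GetAnnot(i)
        if not annot.IsValid() or annot.GetType() != sdk.Annot.e_Link:
            continue
        action = sdk.Link(annot).GetAction()
        if not action.IsValid():
            continue
        if action.GetType() == sdk.Action.e_URI:
            # External link: the URI is stored in the action dictionary.
            target = action.GetSDFObj().Get("URI").Value().GetAsPDFText()
        elif action.GetType() == sdk.Action.e_GoTo and action.GetDest().IsValid():
            # Internal link: report the destination page number.
            target = "page %d" % action.GetDest().GetPage().GetIndex()
        else:
            continue
        found.append((extractor.GetTextUnderAnnot(annot).strip(), target))
    return found


def links_in_file(path, license_key):
    """Hypothetical wrapper: collect the links from every page of path."""
    # Imported here so the helper above can be read and tried without the SDK.
    from PDFNetPython3 import PDFNetPython as sdk

    sdk.PDFNet.Init(license_key)  # assumption: your own license key
    doc = sdk.PDFDoc(path)
    doc.InitSecurityHandler()
    extractor = sdk.TextExtractor()
    result = []
    for n in range(1, doc.GetPageCount() + 1):  # pages are 1-based
        result.extend(links_on_page(doc.GetPage(n), extractor, sdk))
    doc.Close()
    return result
```

Note that GetTextUnderAnnot is called on a TextExtractor instance after Begin(page), not as a static method.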