I have PDFs that contain internal links to other pages as well as links to external resources. How do I handle them and extract those hyperlinks?
The easiest way would be to use the ContentReplacer class to replace the text underneath the hyperlink annotation.
https://www.pdftron.com/documentation/samples/#contentreplacer
You can iterate through the annotations to find the one you want, and then pass the Rect from that Annotation to the ContentReplacer class.
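Here is a rough sketch of the iteration step. It also shows where each link points, which covers the original extraction question. It assumes the PDFTron Python bindings (package `PDFNetPython3`; newer Apryse releases ship the same classes as `apryse_sdk`). The method names mirror the .NET API linked in this thread, but please check them against your SDK version. The import sits inside the function only so the helper loads without the SDK. You still need to call `PDFNet.Initialize(<your license key>)` and open the document yourself.

```python
# Sketch only: assumes the PDFTron/Apryse Python bindings ("PDFNetPython3").


def find_links(page):
    """Return (rect, kind, target) for every link annotation on one page.

    kind is "external" (target is the URI string) or "internal"
    (target is the 1-based number of the destination page).
    """
    # Imported here so this module can be loaded without the SDK installed.
    from PDFNetPython3.PDFNetPython import Action, Annot, Link

    links = []
    for i in range(page.GetNumAnnots()):
        annot = page.GetAnnot(i)
        if not annot.IsValid() or annot.GetType() != Annot.e_Link:
            continue
        action = Link(annot).GetAction()
        if not action.IsValid():
            continue
        if action.GetType() == Action.e_URI:
            # External link: the URI is stored under the "URI" key of the action dictionary.
            uri = action.GetSDFObj().Get("URI").Value().GetAsPDFText()
            links.append((annot.GetRect(), "external", uri))
        elif action.GetType() == Action.e_GoTo:
            # Internal link: resolve the destination to a page number.
            dest = action.GetDest()
            if dest.IsValid():
                links.append((annot.GetRect(), "internal", dest.GetPage().GetIndex()))
    return links
```

To scan a whole file, open it with `PDFDoc(path)` and call `InitSecurityHandler()`. Then walk `doc.GetPageIterator()` and call `find_links(itr.Current())` on each page. The returned `rect` is the value you would hand to ContentReplacer.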
This is exactly what I tried. I was able to find the annotations and modify them, but I have no idea how to get the text embedded in the annotation (the text that is visible on the PDF), or how to modify that text. The ContentReplacer documentation (https://www.pdftron.com/api/PDFNet/html/M_pdftron_PDF_ContentReplacer_AddText.htm) clearly states that it only replaces text enclosed in “[” and “]”.
So I want to understand the following about PDFTron:
- Does it actually give me the visible text for the enclosed annotation?
- Can I update the visible text for an annotation?
- Otherwise, can I update any text that is not enclosed in “[” and “]”?
I updated my StackOverflow post.
The AddText method is the one you want to use, as it takes a Rect defining the target area.
It is the AddString method that uses delimiters such as “[” and “]” to find and replace text, which, as you pointed out, would not work in your case.
https://www.pdftron.com/api/PDFNet/html/M_pdftron_PDF_ContentReplacer_AddText.htm
https://www.pdftron.com/api/PDFNet/html/M_pdftron_PDF_ContentReplacer_AddString.htm
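A minimal sketch of the AddText route, under the same assumptions (Python `PDFNetPython3` bindings, SDK already initialized). `rect` would be the Rect taken from the link annotation. The helper name `replace_text_in_rect` is just for illustration. The comment contrasts this with AddString, which only matches placeholders wrapped in the delimiters.

```python
# Sketch only: assumes the PDFTron/Apryse Python bindings ("PDFNetPython3").


def replace_text_in_rect(page, rect, new_text):
    """Replace whatever text is drawn inside rect (e.g. a link annotation's Rect)."""
    # Imported here so this module can be loaded without the SDK installed.
    from PDFNetPython3.PDFNetPython import ContentReplacer

    replacer = ContentReplacer()
    # AddText needs no delimiters: text found inside rect is replaced with new_text.
    # By contrast, AddString("NAME", "John Smith") would only match the literal
    # placeholder "[NAME]" in the page content (the default "[" / "]" delimiters).
    replacer.AddText(rect, new_text)
    # Nothing changes until Process is called on the page.
    replacer.Process(page)
```

Save the document afterwards (for example `doc.Save(output_path, SDFDoc.e_linearized)`), as the ContentReplacer sample does.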
What about extracting the visible text from the annotation? Is that even possible with PDFTron?
Yes, of course. Given the annotation itself, you can use the TextExtractor class:
TextExtractor.GetTextUnderAnnot(annot)
https://www.pdftron.com/api/PDFTronSDK/dotnet/pdftron.PDF.TextExtractor.html#pdftron_PDF_TextExtractor_GetTextUnderAnnot_pdftron_PDF_Annot_
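GetTextUnderAnnot is an instance method, so the extractor has to be pointed at the page with Begin first. Here is a rough sketch that collects the visible text of every link on a page. It makes the same assumptions as above: Python `PDFNetPython3` bindings, SDK already initialized. The helper name `link_texts` is illustrative.

```python
# Sketch only: assumes the PDFTron/Apryse Python bindings ("PDFNetPython3").


def link_texts(page):
    """Return (visible_text, rect) for each link annotation on one page."""
    # Imported here so this module can be loaded without the SDK installed.
    from PDFNetPython3.PDFNetPython import Annot, TextExtractor

    extractor = TextExtractor()
    extractor.Begin(page)  # analyse the page's text before querying it
    results = []
    for i in range(page.GetNumAnnots()):
        annot = page.GetAnnot(i)
        if annot.IsValid() and annot.GetType() == Annot.e_Link:
            # Text that lies under the annotation's Rect on the page.
            results.append((extractor.GetTextUnderAnnot(annot), annot.GetRect()))
    return results
```

Paired with the link targets, this gives you both what the reader sees and where the link goes.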