TextExtractor to Element

Product: PDFNet

Product Version: 8.1

Please give a brief summary of your issue:
When using a TextExtractor object to read lines and then each word in a line, is it possible to get the Element object(s) for that Word object? I don't know which class methods to call to get from a TextExtractor word to an Element.

Hi Kenneth,

To get a better understanding of your requirements, can I ask why you are looking to find the underlying elements for the words you are retrieving from TextExtractor? What information are you looking to retrieve from the elements?

Thanks. I want to access the marked content tied to that element.

Thank you for your response. In that case, the best way to do this is to use the ElementReader class to traverse the elements and locate the matching text by comparing element positions against the bounding boxes returned by the TextExtractor class.
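The bounding-box comparison can be sketched in plain Python. This is a hypothetical sketch, not PDFNet code: it assumes the word boxes (for example from `Word.GetBBox()`) and the text-element boxes (for example from `Element.GetBBox()`) have already been copied into plain `(x1, y1, x2, y2)` tuples in the same page coordinate space. The names `TextElementInfo`, `overlap_area` and `match_words_to_elements`, and the 50% overlap threshold, are illustrative choices. Because one text element can cover several words, it matches by overlap rather than by equal boxes.

```python
from dataclasses import dataclass, field

# (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2, assumed to be in page coordinates.
Rect = tuple[float, float, float, float]


@dataclass
class TextElementInfo:
    """Hypothetical record of one text element seen while walking ElementReader."""
    bbox: Rect
    marked_content: list[str] = field(default_factory=list)  # enclosing tags, outermost first


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles (0.0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def match_words_to_elements(words, elements, min_fraction=0.5):
    """Pair each (text, bbox) word, in the order given, with the elements that
    cover at least min_fraction of the word's area."""
    matches = []
    for text, wbox in words:
        area = (wbox[2] - wbox[0]) * (wbox[3] - wbox[1])
        # The area > 0 check short-circuits, so degenerate boxes never divide by zero.
        hits = [e for e in elements
                if area > 0 and overlap_area(wbox, e.bbox) / area >= min_fraction]
        matches.append((text, hits))
    return matches


# Usage with made-up coordinates: words as a TextExtractor walk might list them.
words = [("Hello", (10, 10, 40, 20)), ("World", (45, 10, 80, 20))]
elements = [TextElementInfo((44, 9, 81, 21), ["P"]),
            TextElementInfo((9, 9, 41, 21), ["P", "Span"])]
matches = match_words_to_elements(words, elements)
```

Matching by overlap fraction rather than exact box equality tolerates small differences in how the two classes compute boxes; a word that straddles two elements can match both.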

For reference, you can also take a look at the logical structures sample if you haven’t already.

I considered that, but the challenge is that iterating with the ElementReader class does not return elements in the same sequence as the text returned by the TextExtractor class. In our use case, we use TextExtractor to read the text and, ideally, also read its marked content in the same sequence that TextExtractor returns.
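One way to keep TextExtractor's sequence is to split the work into two passes. First, walk the elements once, recording for each text element the marked-content tags that are open around it. Then walk TextExtractor's lines and words and look each word up by position. The sketch below models this with a hypothetical event list rather than real PDFNet calls. In PDFNet, the events would presumably correspond to ElementReader elements of type `e_marked_content_begin`, `e_marked_content_end` and `e_text`, and form XObjects would also need to be descended into. The helper names and the center-point lookup are assumptions for illustration.

```python
def collect_text_with_marked_content(events):
    """Walk hypothetical content-stream events and snapshot the open marked-content tags.

    Events: ("begin", tag) opens marked content, ("end",) closes it,
    ("text", bbox) is a text element with bbox = (x1, y1, x2, y2).
    """
    stack = []      # currently open marked-content tags, outermost first
    collected = []  # (bbox, tags) per text element, in content-stream order
    for event in events:
        kind = event[0]
        if kind == "begin":
            stack.append(event[1])
        elif kind == "end":
            if stack:  # tolerate an unbalanced end marker
                stack.pop()
        elif kind == "text":
            collected.append((event[1], list(stack)))  # copy, not a live reference
    return collected


def tags_for_word(word_bbox, collected):
    """Return the tags of the first text element whose box contains the word's center."""
    cx = (word_bbox[0] + word_bbox[2]) / 2
    cy = (word_bbox[1] + word_bbox[3]) / 2
    for bbox, tags in collected:
        if bbox[0] <= cx <= bbox[2] and bbox[1] <= cy <= bbox[3]:
            return tags
    return []


# Made-up page: the content stream draws "World" before "Hello".
events = [("begin", "P"),
          ("text", (50, 10, 90, 20)),   # "World"
          ("begin", "Span"),
          ("text", (10, 10, 40, 20)),   # "Hello"
          ("end",),
          ("end",)]
collected = collect_text_with_marked_content(events)

# Words in TextExtractor (reading) order; the lookup preserves that order.
words = [("Hello", (12, 11, 38, 19)), ("World", (52, 11, 88, 19))]
ordered = [(text, tags_for_word(bbox, collected)) for text, bbox in words]
```

The lookup runs inside the word loop, so `ordered` follows the reading order ("Hello" first) even though the content stream draws "World" first.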