Better Text Selection to/from clipboard

andyr · August 13, 2019, 8:35pm

I am trying to improve our PDF text selection method. When I select columnized text in Acrobat viewer and paste it into Word, etc., the layout is better preserved than what I get with GetSelection().GetAsUnicode()…
Is there an example of using GetAsHtml() somewhere? Or any suggestions on preserving some semblance of the original layout?

Thanks.

Ryan · October 1, 2019, 6:19pm

The PDF standard does not define exactly how text should be extracted, so each vendor is left to its own design. One vendor may handle a particular file “better” than another, and vice versa, but “better” can be very subjective: different people may read the same PDF in different reading orders (e.g. a magazine or newspaper layout).
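
To illustrate why reading order is a design choice, here is a minimal sketch of one simple heuristic: split the selected words into columns at wide horizontal gaps, then read each column top to bottom. This is not the method used by any particular viewer or SDK. The `Word` record, the `min_gap` and `y_tol` thresholds, and the function names are all hypothetical, and the sketch assumes you already have a bounding box for each word from whatever extraction API you use.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word record; fill it from your own extraction API.
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # baseline, measured downward from the top of the page


def split_columns(words, min_gap=20.0):
    """Group words into columns wherever a horizontal gap wider than min_gap appears."""
    columns, current, right = [], [], None
    # Sweep left to right, tracking the right-most edge covered so far.
    for w in sorted(words, key=lambda w: w.x0):
        if right is not None and w.x0 - right > min_gap:
            columns.append(current)
            current, right = [], None
        current.append(w)
        right = w.x1 if right is None else max(right, w.x1)
    if current:
        columns.append(current)
    return columns


def lines_in_column(column, y_tol=2.0):
    """Group a column's words into text lines by baseline, top to bottom."""
    lines = []
    for w in sorted(column, key=lambda w: (w.y, w.x0)):
        if lines and abs(w.y - lines[-1][0].y) <= y_tol:
            lines[-1].append(w)
        else:
            lines.append([w])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x0))
            for line in lines]


def column_text(words, min_gap=20.0):
    """Return text column by column, with a blank line between columns."""
    return "\n\n".join("\n".join(lines_in_column(col))
                       for col in split_columns(words, min_gap))
```

A plain row-by-row read of a two-column page interleaves the columns on every line, while the sketch above keeps each column together. It will misfire on pages where a heading spans both columns or where the gutter is narrower than `min_gap`, which is part of why results differ between vendors.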

For more advanced column detection, please see our PDFGenie tool:
https://www.pdftron.com/pdf-tools/pdf-table-extraction