Is there a better way to identify tables in a PDF document in order to tag them as tables?

maneesharajaratne · August 8, 2022, 11:33am

Product: PDFNet Windows

Product Version: 9.2.0

Hello,
I am trying to identify table and list elements in the PDF so that I can tag them appropriately and build a logical structure tree for the document. I saw that TextExtractor can be used to extract text, but I am not sure how to use it to extract tables. Is there a way to do this? Could you provide an example?

P.S. I am trying to turn an HTML document into a PDF/UA-compliant PDF using PDFTron. So far I have been able to identify text and images, tag them accordingly, and add them to the logical structure tree using ElementReader. I am stuck on tables and lists. I am coding in C#.
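The thread itself contains no code, so here is a minimal, SDK-independent sketch of one common heuristic for spotting a table in extracted text: group words into rows by vertical position, merge nearby words into cells, and treat the block as a table when several rows have two or more cells that line up on shared column left edges. It assumes you already have each word's text and bounding box (for example, from TextExtractor's word-level output; that wiring is not shown). The `WordBox` record, the `TableHeuristics` class, and every tolerance value are hypothetical names and numbers chosen for this illustration; none of them are part of the PDFTron API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input type: one word and its bounding box in PDF user space
// (y grows upward). Populating it from the SDK is not shown.
public record WordBox(string Text, double X1, double Y1, double X2, double Y2)
{
    public double CenterY => (Y1 + Y2) / 2.0;
}

public static class TableHeuristics
{
    // Groups words into rows, top to bottom: a word joins the current row when its
    // vertical center is within rowTol of that row's first word. Rows are sorted left to right.
    public static List<List<WordBox>> GroupIntoRows(IEnumerable<WordBox> words, double rowTol)
    {
        var rows = new List<List<WordBox>>();
        foreach (var word in words.OrderByDescending(b => b.CenterY))
        {
            var current = rows.LastOrDefault();
            if (current != null && Math.Abs(current[0].CenterY - word.CenterY) <= rowTol)
                current.Add(word);
            else
                rows.Add(new List<WordBox> { word });
        }
        return rows.Select(r => r.OrderBy(b => b.X1).ToList()).ToList();
    }

    // Merges neighbouring words of one row into a single cell when the horizontal
    // gap between them is at most cellGap (so "New" + "York" becomes one cell).
    public static List<WordBox> MergeIntoCells(List<WordBox> row, double cellGap)
    {
        var cells = new List<WordBox>();
        foreach (var word in row)
        {
            int last = cells.Count - 1;
            if (last >= 0 && word.X1 - cells[last].X2 <= cellGap)
            {
                var c = cells[last];
                cells[last] = new WordBox(c.Text + " " + word.Text, c.X1,
                    Math.Min(c.Y1, word.Y1), Math.Max(c.X2, word.X2), Math.Max(c.Y2, word.Y2));
            }
            else
            {
                cells.Add(word);
            }
        }
        return cells;
    }

    // Clusters cell left edges: an edge more than colTol to the right of the
    // latest anchor starts a new column.
    public static List<double> ColumnAnchors(IEnumerable<List<WordBox>> cellRows, double colTol)
    {
        var anchors = new List<double>();
        foreach (double left in cellRows.SelectMany(r => r).Select(c => c.X1).OrderBy(v => v))
        {
            if (anchors.Count == 0 || left - anchors[anchors.Count - 1] > colTol)
                anchors.Add(left);
        }
        return anchors;
    }

    // Returns a rows x columns text grid, or null when fewer than minRows rows
    // contain at least two cells (i.e. the words do not look like a table).
    public static List<string[]> TryBuildGrid(IEnumerable<WordBox> words,
        double rowTol = 3, double cellGap = 5, double colTol = 10, int minRows = 2)
    {
        var cellRows = GroupIntoRows(words, rowTol)
            .Select(r => MergeIntoCells(r, cellGap))
            .Where(r => r.Count >= 2)
            .ToList();
        if (cellRows.Count < minRows) return null;

        var anchors = ColumnAnchors(cellRows, colTol);
        var grid = new List<string[]>();
        foreach (var row in cellRows)
        {
            var line = Enumerable.Repeat("", anchors.Count).ToArray();
            foreach (var cell in row)
            {
                // Snap the cell to the column anchor nearest its left edge.
                int col = Enumerable.Range(0, anchors.Count)
                    .OrderBy(i => Math.Abs(anchors[i] - cell.X1)).First();
                line[col] = line[col].Length == 0 ? cell.Text : line[col] + " " + cell.Text;
            }
            grid.Add(line);
        }
        return grid;
    }
}
```

With the default tolerances, three rows such as `Name | City`, `Ann | New York`, `Bob | Paris` come back as a 3 x 2 grid, while a single line of prose returns null. The thresholds are in PDF points and would need tuning for real documents. The sketch also assumes the words passed in belong to one candidate region, and it ignores ruling lines drawn as vector paths, which are another common table cue. A detected grid would then map onto the standard `Table`/`TR`/`TH`/`TD` structure types, and lists onto `L`/`LI`/`Lbl`/`LBody`.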

system · August 8, 2022, 11:33am

Hello, I’m Ron, an automated tech support bot.

While you wait for one of our customer support representatives to get back to you, please check out some of these documentation pages:

kmirsalehi · August 9, 2022, 5:34pm

Hi Maneesha,

You can read our page on document understanding, which covers table extraction:

We also have a tool where you can upload a file; the software detects the tables and lets you download the result as a tagged PDF:

If you want to learn more about this product, fill in this form:

maneesharajaratne · August 10, 2022, 4:36am

Is there a way to do this with the PDFTron SDK?

kmirsalehi · August 10, 2022, 5:33pm

You can try our online API endpoint to demo the software in your own project: