How can I obtain the contents of Alt Text and logical structure using PDFNet SDK?

Aaron_Gravesdale · November 3, 2010, 10:09pm

Q: I have a PDF that was produced by creating the document in
PowerPoint 2007 and then saving it as a PDF. There was an image in the
original PPT document with some Alt Text attached to it. After saving
it to PDF, you can hover over the image, and the Alt Text will render
as a tooltip. How would I best obtain the contents of that Alt Text
using PDFNet SDK?
---------------
A: In this case the file uses logical structure to mark the image as
a 'Figure' and to associate alternate text with it. To extract this
information you can use the high-level logical structure API in PDFNet,
as shown in the LogicalStructure sample project:
http://www.pdftron.com/pdfnet/samplecode.html#LogicalStructure

Given a 'pdftron.PDF.Struct.SElement', you can check whether it has
associated alternate text using selement.HasAlt() and obtain the
string value using selement.GetAlt().
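
As a rough illustration of the check above, here is a minimal Python
sketch that walks the structure tree and collects every alternate text
string it finds. It is not taken from the sample project. It assumes
the PDFNet Python binding exposes the same method names as the
LogicalStructure sample (GetStructTree, GetKid, GetNumKids,
IsContentItem, GetAsStructElem, GetType), and the helper names
collect_alt_text and alt_text_in_document are made up for this example.

```python
def collect_alt_text(element, found=None):
    """Depth-first walk over one structure element, gathering alt text.

    'element' is assumed to behave like a PDFNet SElement. Returns a list
    of (structure type, alternate text) pairs, e.g. ("Figure", "...").
    """
    if found is None:
        found = []
    if not element.IsValid():
        return found
    # HasAlt()/GetAlt() are the calls described in the answer above.
    if element.HasAlt():
        found.append((element.GetType(), element.GetAlt()))
    for i in range(element.GetNumKids()):
        # A kid is either a content item (actual page content) or a
        # nested structure element; only the latter can carry alt text.
        if not element.IsContentItem(i):
            collect_alt_text(element.GetAsStructElem(i), found)
    return found


def alt_text_in_document(doc):
    """Collect alt text from every top-level element of the structure tree.

    'doc' is assumed to behave like a PDFNet PDFDoc.
    """
    found = []
    tree = doc.GetStructTree()
    if tree.IsValid():  # untagged PDFs have no structure tree
        for i in range(tree.GetNumKids()):
            collect_alt_text(tree.GetKid(i), found)
    return found
```

With PDFNet installed and initialized, alt_text_in_document would be
called on an opened PDFDoc. For the file described in the question, the
result should include one entry for the 'Figure' element along with its
tooltip text.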

Alternatively, you can use ElementReader and process elements of the
following types: e_marked_content_begin, e_marked_content_end, and
e_marked_content_point. Depending on your requirements, you may still
need to use the logical structure API.