Adding logical structure / tagging PDF

Ivanho · October 15, 2012, 7:24pm

Q:

We would like to use PDFNet to define logical structure and its corresponding marked content in an existing PDF. It appears that there is support for reading this information (pdftron.PDF.Struct namespace and Element.GetStructMCID), but is there any high-level support for easily adding logical structure and its associated marked content to a PDF?

A:

The structure API includes some methods for creating logical structure. The main difficulty is connecting the logical structure with its physical representation (i.e. the page-level Elements).

If you are generating PDF content from scratch, this can be very straightforward: use Form XObjects to group content, so there is no need to insert BMC/EMC tags, etc.

If you are dealing with existing documents, or you do not want to use the Form XObject marked content feature, you can insert marked content using ‘ElementWriter.WriteString’, as described in the following articles:

https://groups.google.com/d/topic/pdfnet-sdk/5zlpz49aAd0/discussion

https://groups.google.com/d/topic/pdfnet-sdk/mVyq1mFj0nI/discussion
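
As a rough illustration of what gets written into the content stream, here is a minimal Python sketch that builds the marked-content operator strings. The operator syntax (BDC ... EMC with an MCID property) comes from the PDF specification; the helper names (begin_marked_content, end_marked_content, wrap_marked_content) are hypothetical and are not part of PDFNet. The sketch also assumes simple alphanumeric tag names such as P or H1.

```python
def begin_marked_content(tag: str, mcid: int) -> str:
    """Return the BDC operator line that opens a marked-content sequence.

    Hypothetical helper, not a PDFNet API. 'tag' is a structure type such
    as "P" or "H1"; 'mcid' is the marked-content identifier that the
    structure element will later refer to.
    """
    if not tag.isalnum():
        # Simplification: only plain alphanumeric names are accepted here.
        raise ValueError("tag must be a simple alphanumeric name")
    if mcid < 0:
        raise ValueError("mcid must be non-negative")
    return f"/{tag} <</MCID {mcid}>> BDC\n"


def end_marked_content() -> str:
    """Return the EMC operator line that closes a marked-content sequence."""
    return "EMC\n"


def wrap_marked_content(tag: str, mcid: int, content_ops: str) -> str:
    """Wrap raw content-stream operators in a BDC/EMC pair."""
    body = content_ops.rstrip("\n") + "\n"
    return begin_marked_content(tag, mcid) + body + end_marked_content()


if __name__ == "__main__":
    # Example: a text-showing sequence tagged as paragraph content, MCID 0.
    print(wrap_marked_content("P", 0, "BT /F1 12 Tf 72 720 Td (Hello) Tj ET"))
```

With PDFNet, the opening string would presumably be passed to ‘ElementWriter.WriteString’ before the page Elements are written and the closing string afterwards; the exact call sequence should be checked against the threads linked above.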

This approach is not very high-level or easy to use; however, the API should give you full control over the tagging process.

In an older C/C++ version of PDFNet we implemented a high-level API for tagging using visual regions (e.g. you would specify one or more rectangular regions, then associate them with tags). This API (\Headers\C\PDF\PDFDoc.h) is currently not exposed by default, but could be enabled via a custom build request:

DispList DispListCreate(Page page)

DispList.DispListTag(double rects[], string tag, Obj prop_dict, bool intersect_mode, bool reshuffle)

DispList.DispListSave(Page page)

For a bit more info see:

‘Is it possible to convert Untagged PDF to Tagged PDF?’

https://groups.google.com/d/topic/pdfnet-sdk/vWJhSlF_JuQ/discussion