Extract doc info and other metadata properties from PDF documents

Q:

We are using the PDFNet SDK in a .NET application to check the properties of PDF documents. For several documents, the document properties appear empty when we check them in Adobe Acrobat (we are using Acrobat 9 and 10), but not when we check them with our application.

In our code, we read the properties of the PDF document as follows (doc is a PDFDoc object representing our document):

PDFDocInfo info = doc.GetDocInfo();
ourCustomObject.MetaData.Author = info.GetAuthor();
ourCustomObject.MetaData.Keywords = info.GetKeywords();
ourCustomObject.MetaData.Subject = info.GetSubject();
ourCustomObject.MetaData.Title = info.GetTitle();

Could you tell us how to get, through PDFNet, the same information that Adobe Acrobat shows in its document properties?

A:

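The most likely cause is that the file contains both a PDF doc info dictionary and an XMP metadata stream. A file can have one, the other, or both, and when it has both they are not always in sync. In that case the values returned by GetDocInfo() (which reads the doc info dictionary) may not match what Acrobat displays. For more info please see: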

https://groups.google.com/d/topic/pdfnet-sdk/Jm04ped89ig/discussion

If required, you can also extract the XMP (XML) metadata stream as follows (this is VB.NET, but you can easily translate it to whatever language you need):

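Private Function ExtractXMLMetadata() As System.Xml.Linq.XElement
    Dim bufferSize As Integer = 256
    Dim xMeta As System.Xml.Linq.XElement = Nothing
    Dim finalStr As String = ""

    ' The XMP packet is a stream referenced by the /Metadata key of the document catalog.
    Dim xmpStream As pdftron.SDF.Obj = Me.p_doc.GetRoot().FindObj("Metadata")

    If xmpStream IsNot Nothing Then
        Dim oStream As pdftron.Filters.Filter = xmpStream.GetDecodedStream()
        Dim oReader As New pdftron.Filters.FilterReader(oStream)
        ' VB array bounds are inclusive, so this allocates exactly bufferSize bytes.
        Dim buffer As Byte() = New Byte(bufferSize - 1) {}

        ' Collect all bytes first and decode once, so a multi-byte UTF-8
        ' character split across two reads is not corrupted.
        Using bytes As New System.IO.MemoryStream()
            Dim bytesRead As Integer = oReader.Read(buffer)
            While bytesRead > 0
                ' Copy only the bytes actually read in this pass.
                bytes.Write(buffer, 0, bytesRead)
                bytesRead = oReader.Read(buffer)
            End While
            finalStr = System.Text.Encoding.UTF8.GetString(bytes.ToArray())
        End Using
    End If

    If finalStr <> "" Then
        Try
            ' XElement.Parse skips the <?xpacket ...?> instructions around the root element.
            xMeta = System.Xml.Linq.XElement.Parse(finalStr)
        Catch ex As System.Xml.XmlException
            ' Malformed XMP: return Nothing so the caller can fall back to GetDocInfo().
        End Try
    End If

    Return xMeta
End Function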
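Once you have the XMP packet as XML, you can read the fields Acrobat shows and compare them with the GetDocInfo() values. Below is a minimal C# sketch (the question's language) that uses only System.Xml.Linq, no PDFNet calls. It assumes the packet uses the common XMP mappings: Title in dc:title, Author in dc:creator, Subject in dc:description, Keywords in pdf:Keywords. The class name XmpProps and the dictionary keys are hypothetical, and the sketch does not look at other places keywords can live (for example dc:subject).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: maps an XMP packet onto Title/Author/Subject/Keywords.
// Assumes the common XMP mappings (dc:title, dc:creator, dc:description, pdf:Keywords).
public static class XmpProps
{
    static readonly XNamespace Rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";
    static readonly XNamespace Pdf = "http://ns.adobe.com/pdf/1.3/";

    // Returns the four fields; a value is null when the packet lacks it.
    public static Dictionary<string, string> Read(string xmp)
    {
        // XElement.Parse skips processing instructions such as <?xpacket ...?>
        // before and after the root element.
        XElement root = XElement.Parse(xmp);

        // dc:creator is an ordered list (rdf:Seq); multiple authors are joined with "; ".
        List<string> creators = root.Descendants(Dc + "creator")
                                    .Descendants(Rdf + "li")
                                    .Select(li => li.Value)
                                    .ToList();

        return new Dictionary<string, string>
        {
            ["Title"] = AltText(root, Dc + "title"),
            ["Author"] = creators.Count > 0 ? string.Join("; ", creators) : null,
            ["Subject"] = AltText(root, Dc + "description"),
            ["Keywords"] = SimpleText(root, Pdf + "Keywords"),
        };
    }

    // Language-alternative properties (rdf:Alt): prefer the x-default entry,
    // otherwise take the first entry.
    static string AltText(XElement root, XName property)
    {
        List<XElement> items = root.Descendants(property).Descendants(Rdf + "li").ToList();
        XElement preferred = items.FirstOrDefault(
            li => (string)li.Attribute(XNamespace.Xml + "lang") == "x-default");
        return (preferred ?? items.FirstOrDefault())?.Value;
    }

    // Simple properties may be written as a child element or, in RDF shorthand,
    // as an attribute on rdf:Description.
    static string SimpleText(XElement root, XName property)
    {
        XElement element = root.Descendants(property).FirstOrDefault();
        if (element != null) return element.Value;
        return root.Descendants(Rdf + "Description")
                   .Select(d => (string)d.Attribute(property))
                   .FirstOrDefault(v => v != null);
    }
}
```

For example, XmpProps.Read(xMeta.ToString()) on the element returned by ExtractXMLMetadata gives values you can compare field by field with info.GetTitle(), info.GetAuthor() and so on; a mismatch (such as an empty dc:title next to a non-empty doc info Title) points to the out-of-sync case described above.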