Editing PDF XMP metadata.

Aaron_Gravesdale · January 6, 2011, 11:30pm

Q: I’m looking for some information about editing advanced metadata in
PDF.

I have a PDF with advanced XMP metadata and I would like to add/modify
or delete this with a VB.Net application.
I use PdfNet SDK 5.0.2.0 with .Net Framework 4.

I tried this code:

For writing the metadata XML node:
    Private Sub SaveMetadata(ByVal xMetadata As System.Xml.XmlElement)
        ' Serialize the XMP node and encode it as UTF-8 bytes.
        Dim xMetadataStr As String = xMetadata.OuterXml
        Dim metadataByte As Byte() = System.Text.Encoding.UTF8.GetBytes(xMetadataStr)

        ' Create the new metadata stream and mark it as XMP.
        Dim xmp_stm As pdftron.SDF.Obj = Me.p_pdfDoc.doc.CreateIndirectStream(metadataByte)
        xmp_stm.PutName("Subtype", "XML")
        xmp_stm.PutName("Type", "Metadata")

        ' Replace the catalog's Metadata entry with the new stream.
        Me.p_pdfDoc.doc.GetRoot().Erase("Metadata")
        Me.p_pdfDoc.doc.GetRoot().Put("Metadata", xmp_stm)
        ' Combine the save flags with Or rather than +.
        Me.p_pdfDoc.doc.Save("test.pdf", pdftron.SDF.SDFDoc.SaveOptions.e_linearized Or pdftron.SDF.SDFDoc.SaveOptions.e_remove_unused)
    End Sub

For extracting/reading the metadata XML node:
    Private Function ExtractMetaData() As XElement
        Const bufferSize As Integer = 256
        Dim xMeta As XElement = Nothing
        Dim finalStr As String = ""

        Dim xmpStream As pdftron.SDF.Obj = Me.p_doc.GetRoot().FindObj("Metadata")
        If (xmpStream IsNot Nothing) Then
            Dim oStream As pdftron.Filters.Filter = xmpStream.GetDecodedStream()
            Dim oReader As New pdftron.Filters.FilterReader(oStream)

            ' New Byte(n) {} allocates n + 1 elements, hence bufferSize - 1.
            Dim buffer As Byte() = New Byte(bufferSize - 1) {}
            Dim allBytes As New System.IO.MemoryStream()
            Dim bytesRead As Integer = oReader.Read(buffer)
            While (bytesRead > 0)
                ' Keep only the bytes actually read, and decode once at the end
                ' so that multi-byte UTF-8 characters are not split across chunks.
                allBytes.Write(buffer, 0, bytesRead)
                bytesRead = oReader.Read(buffer)
            End While
            finalStr = System.Text.Encoding.UTF8.GetString(allBytes.ToArray())
        End If

        If (finalStr <> "") Then
            finalStr &= vbCrLf
            Try
                Dim xDoc As New System.Xml.XmlDocument()
                xDoc.LoadXml(finalStr)
                xMeta = XElement.Parse(xDoc.OuterXml)
            Catch ex As Exception
                ' Parse errors are swallowed here; the function then returns Nothing.
            End Try
        End If
        Return xMeta
    End Function
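
A note on the read loop: when a stream is read in fixed-size chunks, only the bytes actually returned by each read should be kept, and the text is safest decoded once at the end, because a multi-byte UTF-8 character can straddle two chunks. The following is a minimal sketch of that pattern, not PdfNet code: it uses System.IO.MemoryStream as a stand-in for the PdfNet FilterReader, and ReadAllUtf8 is a hypothetical helper name.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ChunkedReadSketch
    ' Hypothetical helper: reads a stream to the end in fixed-size chunks
    ' and decodes the collected bytes once as UTF-8.
    Function ReadAllUtf8(ByVal source As Stream, ByVal chunkSize As Integer) As String
        ' New Byte(n) {} allocates n + 1 elements, so subtract 1 for an exact size.
        Dim buffer As Byte() = New Byte(chunkSize - 1) {}
        Dim collected As New MemoryStream()
        Dim bytesRead As Integer = source.Read(buffer, 0, buffer.Length)
        While bytesRead > 0
            ' Append only the bytes that this read actually returned.
            collected.Write(buffer, 0, bytesRead)
            bytesRead = source.Read(buffer, 0, buffer.Length)
        End While
        Return Encoding.UTF8.GetString(collected.ToArray())
    End Function

    Sub Main()
        Dim original As String = "<dc:title>é</dc:title>"
        Dim source As New MemoryStream(Encoding.UTF8.GetBytes(original))
        ' A chunk size of 11 deliberately splits the two-byte "é" across two reads.
        Dim decoded As String = ReadAllUtf8(source, 11)
        Console.WriteLine(decoded = original)   ' True
    End Sub
End Module
```

Decoding each chunk separately, or decoding the whole buffer regardless of how many bytes were read, would instead produce replacement characters or trailing NUL characters that can make the XML parser fail.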

My problem is:
When I use a PDF with advanced metadata, the metadata node is stored
at the end of the document like this (the PDF is opened in a simple
text editor):

1683 0 obj
<</Length 3587/Subtype/XML/Type/Metadata>>stream
<?xpacket begin="ï»¿" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta
xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.2.1-c043 52.372728,
2009/01/18-15:08:04 ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:xmp="http://ns.adobe.com/xap/1.0/">
         <xmp:ModifyDate>2010-12-22T16:25:45+01:00</xmp:ModifyDate>
         <xmp:CreateDate>2010-06-10T16:03:48+02:00</xmp:CreateDate>
         <xmp:MetadataDate>2010-12-22T16:25:45+01:00</xmp:MetadataDate>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/">
         <dc:format>application/pdf</dc:format>
         <dc:title>
            <rdf:Alt>
               <rdf:li xml:lang="x-default">tetete</rdf:li>
            </rdf:Alt>
         </dc:title>
         <dc:creator>
            <rdf:Bag/>
         </dc:creator>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
         <xmpMM:DocumentID>uuid:c4978727-4b6e-5b49-9040-ed94427de250</xmpMM:DocumentID>
         <xmpMM:InstanceID>uuid:992fd930-caeb-40e7-a6dc-5de01c7906ba</xmpMM:InstanceID>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
         <pdfx:clientSiteHref>www.bar.com</pdfx:clientSiteHref>
         <pdfx:clientSiteTitle>bar</pdfx:clientSiteTitle>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
endstream
endobj

My method extracts the metadata node. I then adjust the pdfx section
(in VB.Net/WinForm) with my own metadata, like this:
      <rdf:Description rdf:about="" xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
         <pdfx:clientSiteHref>www.tata.com</pdfx:clientSiteHref>
         <pdfx:clientSiteTitle>tata</pdfx:clientSiteTitle>
         <pdfx:pod>good</pdfx:pod>
      </rdf:Description>

I replace the old XML node with the new one to get a node like:
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.2.1-c043
52.372728, 2009/01/18-15:08:04 ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:xmp="http://ns.adobe.com/xap/1.0/">
         <xmp:ModifyDate>2010-12-22T16:25:45+01:00</xmp:ModifyDate>
         <xmp:CreateDate>2010-06-10T16:03:48+02:00</xmp:CreateDate>
         <xmp:MetadataDate>2010-12-22T16:25:45+01:00</xmp:MetadataDate>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/">
         <dc:format>application/pdf</dc:format>
         <dc:title>
            <rdf:Alt>
               <rdf:li xml:lang="x-default">tetete</rdf:li>
            </rdf:Alt>
         </dc:title>
         <dc:creator>
            <rdf:Bag/>
         </dc:creator>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
         <xmpMM:DocumentID>uuid:c4978727-4b6e-5b49-9040-ed94427de250</xmpMM:DocumentID>
         <xmpMM:InstanceID>uuid:992fd930-caeb-40e7-a6dc-5de01c7906ba</xmpMM:InstanceID>
      </rdf:Description>
      <rdf:Description rdf:about=""
      xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
         <pdfx:clientSiteHref>www.bar.com</pdfx:clientSiteHref>
         <pdfx:clientSiteTitle>bar</pdfx:clientSiteTitle>
         <pdfx:pod>good</pdfx:pod>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>
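
The edit described above (changing or adding properties in the pdfx section) can be sketched with LINQ to XML. This is only an illustration under stated assumptions: SetPdfxProperty is a hypothetical helper name, the namespace URIs are the ones from the packet shown in this post, and the sample packet in Main is shortened.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module PdfxEditSketch
    ' Namespace URIs as they appear in the XMP packet quoted in the post.
    Private ReadOnly Rdf As XNamespace = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    Private ReadOnly Pdfx As XNamespace = "http://ns.adobe.com/pdfx/1.3/"

    ' Hypothetical helper: sets (or adds) one pdfx property as a child element.
    Sub SetPdfxProperty(ByVal xmpMeta As XElement, ByVal name As String, ByVal value As String)
        Dim rdfRoot As XElement = xmpMeta.Descendants(Rdf + "RDF").FirstOrDefault()
        If rdfRoot Is Nothing Then Throw New ArgumentException("No rdf:RDF element found.")

        ' Reuse the rdf:Description that already holds pdfx properties, if any.
        Dim target As XElement = Nothing
        For Each desc As XElement In rdfRoot.Elements(Rdf + "Description")
            If desc.Elements().Any(Function(child) child.Name.Namespace = Pdfx) Then
                target = desc
                Exit For
            End If
        Next

        ' Otherwise create a new rdf:Description that declares the pdfx prefix.
        If target Is Nothing Then
            target = New XElement(Rdf + "Description",
                                  New XAttribute(Rdf + "about", ""),
                                  New XAttribute(XNamespace.Xmlns + "pdfx", Pdfx.NamespaceName))
            rdfRoot.Add(target)
        End If

        ' SetElementValue replaces the value of an existing child or appends a new one.
        target.SetElementValue(Pdfx + name, value)
    End Sub

    Sub Main()
        Dim xmp As XElement = XElement.Parse(
            "<x:xmpmeta xmlns:x=""adobe:ns:meta/"">" &
            "<rdf:RDF xmlns:rdf=""http://www.w3.org/1999/02/22-rdf-syntax-ns#"">" &
            "<rdf:Description rdf:about="""" xmlns:pdfx=""http://ns.adobe.com/pdfx/1.3/"">" &
            "<pdfx:clientSiteTitle>bar</pdfx:clientSiteTitle>" &
            "</rdf:Description></rdf:RDF></x:xmpmeta>")

        SetPdfxProperty(xmp, "clientSiteTitle", "tata")  ' modifies the existing property
        SetPdfxProperty(xmp, "pod", "good")              ' adds a new property
        Console.WriteLine(xmp)
    End Sub
End Module
```

Working on the parsed tree rather than on the raw string keeps the namespace declarations intact and avoids producing malformed XML when a value contains characters such as & or <.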

I then use my writing method, and the resulting PDF contains two
objects: one with the modified metadata XML node and one with the old
metadata information, in a format like this:
4747 0 obj
<</CreationDate (D:20100610160348+02'00')/ModDate (D:20101222162545+01'00')
/Title (Rapport 2009)/clientSiteHref (www.foobar.com)/clientSiteTitle (toto)>>
endobj

If I open the new PDF with my application, I get the new metadata XML
node as expected. But when I open the new PDF with Adobe Acrobat Pro,
it still shows the old metadata.
-----------------------
A: PDF documents can contain metadata stored in the document
information dictionary as well as in an XMP stream (i.e. the Metadata
key in the document catalog). Your output still contains the old
document information dictionary, which is likely what Acrobat is
displaying. Did you try erasing it? For example:
doc.GetTrailer().Erase("Info");
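
If the document information dictionary is kept rather than erased, its entries presumably need to be updated together with the XMP packet so that both agree. The thread shows the same date in two notations: D:20100610160348+02'00' in the information dictionary and 2010-06-10T16:03:48+02:00 in XMP. Below is a minimal sketch of that conversion using only the .NET base class library. PdfDateToXmp is a hypothetical helper name, and it handles only the full form with an explicit UTC offset, as in the values quoted above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PdfDateSketch
    ' Hypothetical helper: converts D:YYYYMMDDHHmmSS+HH'mm' to YYYY-MM-DDTHH:mm:ss+HH:mm.
    ' Shorter PDF date forms (no time, no offset, "Z") are deliberately not handled.
    Function PdfDateToXmp(ByVal pdfDate As String) As String
        Dim m As Match = Regex.Match(pdfDate,
            "^D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})([+-])(\d{2})'(\d{2})'?$")
        If Not m.Success Then Throw New FormatException("Unsupported PDF date: " & pdfDate)

        ' Groups 1-6 are the date and time fields, 7 is the sign, 8-9 the offset.
        Return String.Format("{0}-{1}-{2}T{3}:{4}:{5}{6}{7}:{8}",
                             m.Groups(1).Value, m.Groups(2).Value, m.Groups(3).Value,
                             m.Groups(4).Value, m.Groups(5).Value, m.Groups(6).Value,
                             m.Groups(7).Value, m.Groups(8).Value, m.Groups(9).Value)
    End Function

    Sub Main()
        ' CreationDate value from the information dictionary quoted in the question.
        Console.WriteLine(PdfDateToXmp("D:20100610160348+02'00'"))
        ' prints 2010-06-10T16:03:48+02:00
    End Sub
End Module
```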

ponsharan · June 12, 2014, 12:24pm

Hi,

I am developing a tool to import XMP information into a PDF file. I used the following line, which throws an UNKNOWN EXCEPTION error while compiling:

Dim xmp_stm As pdftron.SDF.Obj = oInputDoc.CreateIndirectStream(reader)

So I changed the code slightly, like this:

Dim xmp_stm As pdftron.SDF.Obj = oInputDoc.CreateIndirectStream(reader, New pdftron.Filters.FlateEncode(Nothing))

Still, the issue is not resolved. Could you please tell me what is wrong with this line?

Best regards
Sharan

agravesdale · June 12, 2014, 6:28pm

Hello Sharan,

Thank you for letting us know that you’re seeing this behaviour. So that we’re on the same page, could you forward a more complete code sample to support@pdftron.com? Thank you for your help.