Q: I'm looking for information about editing advanced metadata in a
PDF.
I have a PDF with advanced XMP metadata, and I would like to add, modify,
or delete it from a VB.Net application.
I use PDFNet SDK 5.0.2.0 with .NET Framework 4.
I tried this code:
For writing the metadata XML node:
Private Sub SaveMetadata(ByVal xMetadata As System.Xml.XmlElement)
    ' Serialize the XMP packet and wrap it in a new indirect stream.
    Dim xMetadataStr As String = xMetadata.OuterXml
    Dim metadataByte As Byte() =
        System.Text.Encoding.UTF8.GetBytes(xMetadataStr)
    Dim xmp_stm As pdftron.SDF.Obj =
        Me.p_pdfDoc.doc.CreateIndirectStream(metadataByte)
    xmp_stm.PutName("Subtype", "XML")
    xmp_stm.PutName("Type", "Metadata")
    ' Replace the catalog's Metadata entry with the new stream.
    Me.p_pdfDoc.doc.GetRoot().Erase("Metadata")
    Me.p_pdfDoc.doc.GetRoot().Put("Metadata", xmp_stm)
    ' Combine the save flags with Or rather than +.
    Me.p_pdfDoc.doc.Save("test.pdf",
        pdftron.SDF.SDFDoc.SaveOptions.e_linearized Or
        pdftron.SDF.SDFDoc.SaveOptions.e_remove_unused)
End Sub
For extracting/reading the metadata XML node:
Private Function ExtractMetaData() As XElement
    Dim bufferSize As Integer = 256
    Dim xMeta As XElement = Nothing
    Dim finalStr As String = ""
    Dim xmpStream As pdftron.SDF.Obj =
        Me.p_doc.GetRoot().FindObj("Metadata")
    If (xmpStream IsNot Nothing) Then
        Dim oStream As pdftron.Filters.Filter =
            xmpStream.GetDecodedStream()
        Dim oReader As New pdftron.Filters.FilterReader(oStream)
        ' New Byte(n) allocates n + 1 elements, hence bufferSize - 1.
        Dim buffer As Byte() = New Byte(bufferSize - 1) {}
        Using allBytes As New System.IO.MemoryStream()
            ' Keep only the bytes actually read and decode once at the
            ' end, so trailing zero padding and UTF-8 sequences split
            ' across reads cannot corrupt the XML.
            Dim bytesRead As Integer = oReader.Read(buffer)
            While (bytesRead > 0)
                allBytes.Write(buffer, 0, bytesRead)
                bytesRead = oReader.Read(buffer)
            End While
            finalStr =
                System.Text.Encoding.UTF8.GetString(allBytes.ToArray())
        End Using
    End If
    If (finalStr <> "") Then
        finalStr &= vbCrLf
        Try
            Dim xDoc As New System.Xml.XmlDocument()
            xDoc.LoadXml(finalStr)
            xMeta = XElement.Parse(xDoc.OuterXml)
        Catch ex As Exception
            ' Parsing failed; xMeta stays Nothing.
        End Try
    End If
    Return xMeta
End Function
My problem is:
When I use a PDF with advanced metadata, the metadata node is stored
at the end of the document like this (the PDF is opened in a simple text
editor):
1683 0 obj
<</Length 3587/Subtype/XML/Type/Metadata>>stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta
xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.2.1-c043 52.372728,
2009/01/18-15:08:04 ">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:xmp="http://ns.adobe.com/xap/1.0/">
<xmp:ModifyDate>2010-12-22T16:25:45+01:00</xmp:ModifyDate>
<xmp:CreateDate>2010-06-10T16:03:48+02:00</xmp:CreateDate>
<xmp:MetadataDate>2010-12-22T16:25:45+01:00</xmp:MetadataDate>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:format>application/pdf</dc:format>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">tetete</rdf:li>
</rdf:Alt>
</dc:title>
<dc:creator>
<rdf:Bag/>
</dc:creator>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
<xmpMM:DocumentID>uuid:c4978727-4b6e-5b49-9040-ed94427de250</xmpMM:DocumentID>
<xmpMM:InstanceID>uuid:992fd930-caeb-40e7-a6dc-5de01c7906ba</xmpMM:InstanceID>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
<pdfx:clientSiteHref>www.bar.com</pdfx:clientSiteHref>
<pdfx:clientSiteTitle>bar</pdfx:clientSiteTitle>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
endstream
endobj
My method extracts the metadata node. I then adjust the pdfx section (in
VB.Net/WinForms) with my own metadata, like this:
<rdf:Description rdf:about="" xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
<pdfx:clientSiteHref>www.tata.com</pdfx:clientSiteHref>
<pdfx:clientSiteTitle>tata</pdfx:clientSiteTitle>
<pdfx:pod>good</pdfx:pod>
</rdf:Description>
I replace the old XML node with the new one to get a node like this:
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.2.1-c043
52.372728, 2009/01/18-15:08:04 ">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:xmp="http://ns.adobe.com/xap/1.0/">
<xmp:ModifyDate>2010-12-22T16:25:45+01:00</xmp:ModifyDate>
<xmp:CreateDate>2010-06-10T16:03:48+02:00</xmp:CreateDate>
<xmp:MetadataDate>2010-12-22T16:25:45+01:00</xmp:MetadataDate>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:format>application/pdf</dc:format>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">tetete</rdf:li>
</rdf:Alt>
</dc:title>
<dc:creator>
<rdf:Bag/>
</dc:creator>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
<xmpMM:DocumentID>uuid:c4978727-4b6e-5b49-9040-ed94427de250</xmpMM:DocumentID>
<xmpMM:InstanceID>uuid:992fd930-caeb-40e7-a6dc-5de01c7906ba</xmpMM:InstanceID>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
<pdfx:clientSiteHref>www.bar.com</pdfx:clientSiteHref>
<pdfx:clientSiteTitle>bar</pdfx:clientSiteTitle>
<pdfx:pod>good</pdfx:pod>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
I then call my writing method, and the resulting PDF contains two sets of
information: one object with the modified metadata XML node, and one
object with the old metadata, in a format like this:
4747 0 obj
<</CreationDate (D:20100610160348+02'00')
/ModDate (D:20101222162545+01'00')
/Title (Rapport 2009)/clientSiteHref (www.foobar.com)
/clientSiteTitle (toto)>> endobj
If I open the new PDF with my application, I get the correct new
metadata XML node. But when I open the new PDF with Adobe Acrobat Pro,
it shows the old version of the metadata.
-----------------------
A: PDF documents can store metadata in the document information
dictionary (the Info entry in the trailer) as well as in XMP (i.e. the
Metadata key in the document catalog). The second object in your output
looks like the old document information dictionary. Did you try erasing
it? For example: doc.GetTrailer().Erase("Info");
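
In VB.Net, that suggestion could be sketched roughly as follows. This is a minimal sketch, not a tested implementation: the module name InfoCleanup, the helper RemoveDocInfo and its outputPath parameter are hypothetical, and it assumes the document is available as a pdftron.PDF.PDFDoc and that PDFNet has already been initialized. It only uses calls that appear in the question or in the answer above (GetTrailer, FindObj, Erase, Save).

```vbnet
Imports pdftron.PDF
Imports pdftron.SDF

Module InfoCleanup
    ' Hypothetical helper: drops the Info entry from the trailer so that
    ' only the XMP stream in the catalog describes the document.
    Sub RemoveDocInfo(ByVal doc As PDFDoc, ByVal outputPath As String)
        Dim trailer As Obj = doc.GetTrailer()
        ' Erase the reference only if the trailer actually has one.
        If trailer.FindObj("Info") IsNot Nothing Then
            trailer.Erase("Info")
        End If
        ' e_remove_unused is requested so the dictionary that is no
        ' longer referenced is not written back to the file.
        doc.Save(outputPath, SDFDoc.SaveOptions.e_remove_unused)
    End Sub
End Module
```

If the Info values should be kept rather than removed, the alternative is to update them so they match the XMP packet; which of the two sources a given viewer prefers is not covered here.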