Reduce XFDF file size

Hello.

I save annotations to an XFDF file:

     PDFDoc doc = mPdfView.getDoc();
     FDFDoc fdf = doc.fdfExtract(PDFDoc.e_annots_only);
     fdf.saveAsXFDF(annotationsPath);

This works, but when the PDF document is very large (1000 pages, for example), the XFDF file contains unnecessary information for each page:

<link page="2" rect="36.000000,657.000000,558.000000,673.000000">
  <OnActivation>
    <Action Trigger="U">
      <GoTo>
        <Dest>
          <XYZ Zoom="" Top="727" Left="92" Page="16">
          </XYZ>
        </Dest>
      </GoTo>
    </Action>
  </OnActivation>
</link>

I read the XFDF spec, but didn't understand the role of these lines.

As a result, the XFDF file is very large, and when I try to load annotations from this file I get an OutOfMemoryError.

Is there a way to reduce the file size using the PDFNet SDK?

This XFDF data means that each page has a link annotation that, when clicked, takes the user to another page, in this case from page 3 to page 17 (XFDF page numbers are zero-based).

You could preprocess the PDF and remove the link annotations.

A 32-bit process would need a very large number of annotations to run out of memory, but it is possible.

Is it possible for you to keep the annotations in FDF format (which is binary and much smaller)?
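
If neither changing the PDF nor switching to FDF is an option, another possibility is to filter the exported XFDF itself and drop every link element before loading it. The following is a minimal sketch of that idea, not a PDFNet feature: the class name XfdfLinkFilter and the method stripLinks are hypothetical, and it uses the JDK's streaming StAX API so the whole file is never held in memory. On Android, where javax.xml.stream is not part of the platform, the same loop would have to be written with XmlPullParser instead.

```java
import java.io.Reader;
import java.io.Writer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, not part of PDFNet.
final class XfdfLinkFilter {
    private XfdfLinkFilter() {}

    /** Copies XFDF from in to out, skipping every link element and its children. */
    static void stripLinks(Reader in, Writer out) throws XMLStreamException {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        // XFDF needs no DTD or external entities, so switch both off.
        inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        inputFactory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLEventReader reader = inputFactory.createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);

        int skipDepth = 0; // > 0 while inside a link element
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (skipDepth > 0) {
                // Track nesting so the skip ends at the matching </link>.
                if (event.isStartElement()) {
                    skipDepth++;
                } else if (event.isEndElement()) {
                    skipDepth--;
                }
                continue;
            }
            if (event.isStartElement()
                    && "link".equals(event.asStartElement().getName().getLocalPart())) {
                skipDepth = 1;
                continue;
            }
            writer.add(event);
        }
        writer.flush();
        writer.close();
        reader.close();
    }
}
```

It could be called with a FileReader on the exported file and a FileWriter on a second file, and the second file loaded in place of the first. Whitespace that surrounded the removed elements is left behind, which is harmless but could be trimmed as well.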

Thanks for the response.

I made some measurements:
pdf - 6.5 mb
fdf - 1.0 mb
xfdf - 3.8 mb
xfdf without ‘link’ annotations - 0.03 mb

FDF looks better, but the file is still too big.

So I decided to write the XFDF manually (or use my own format):

// Walk every page and visit each annotation except links.
for (PageIterator itr = doc.getPageIterator(); itr.hasNext(); ) {
    Page page = (Page) (itr.next());
    int num_annots = page.getNumAnnots();
    for (int i = 0; i < num_annots; ++i) {
        Annot annot = page.getAnnot(i);
        if (!annot.isValid()) {
            continue;
        }
        // Read the annotation subtype from the underlying SDF dictionary.
        Obj sdf = annot.getSDFObj();
        String subtype = sdf.get("Subtype").value().getName();
        if (!subtype.equals("Link")) {
            // TODO convert Annot to xfdf
        }
    }
}

Is there a utility class that can help me convert an Annot to XFDF format like this?

<highlight color="#FFFF00" opacity="1" creationdate="D:20160212092604Z00'00'" flags="print" date="D:20160212092604Z00'00'" page="0" coords="256.035004,524.870002,374.325004,524.870002,256.035004,498.530002,374.325004,498.530002" rect="256.035004,498.530002,374.325004,524.870002" title="">
</highlight>

I didn’t find documentation that describes how to extract the required fields from the Annot class for each type of annotation (I need only highlight, strikeout, underline, ink, and text).

On Friday, February 12, 2016 at 03:26:57 UTC+2, Ryan wrote:

This xfdf data means that each page has an annotation, that when clicked, takes the user to another page. In this case from page 3 to page 17 (xfdf is zero based page numbering).

You could preprocess the PDF and remove the link annotations.

There would need to be a lot of annotations for a 32bit process to run out of memory, but possible.

Is it possible for you to keep the annotations in FDF format (which is binary and much smaller)?

This sample code shows how to parse annotations, in particular
https://www.pdftron.com/pdfnet/samplecode/AnnotationTest.java.html
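
Once the fields have been read from each annotation as in that sample, writing the element by hand is mostly string formatting. Below is a minimal sketch for the three text markup types (highlight, strikeout, underline), which share the coords and rect attributes seen in the example element above. The class XfdfMarkupWriter and its methods are hypothetical helpers, not part of PDFNet, and the sketch assumes the page index, color, rectangle, and quad points have already been extracted. It emits only a reduced set of attributes, and whether the importer accepts that reduced set is an assumption to verify. Ink and text annotations carry different data and are not covered.

```java
import java.util.Locale;

// Hypothetical helper, not part of PDFNet.
final class XfdfMarkupWriter {
    private XfdfMarkupWriter() {}

    /** Joins numbers as comma-separated values with six decimals, as in the exported file. */
    static String joinNumbers(double... values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            // Locale.ROOT keeps '.' as the decimal separator on every device.
            sb.append(String.format(Locale.ROOT, "%.6f", values[i]));
        }
        return sb.toString();
    }

    /** Escapes the characters that are not allowed verbatim in an XML attribute value. */
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /**
     * Builds one text markup element.
     * type is "highlight", "strikeout" or "underline"; pageIndex is zero-based;
     * rect holds 4 numbers and coords holds 8 numbers per quad.
     */
    static String textMarkup(String type, int pageIndex, String colorHex,
                             double[] rect, double[] coords) {
        return "<" + type
                + " color=\"" + escapeAttr(colorHex) + "\""
                + " page=\"" + pageIndex + "\""
                + " coords=\"" + joinNumbers(coords) + "\""
                + " rect=\"" + joinNumbers(rect) + "\""
                + "></" + type + ">";
    }
}
```

For the highlight shown earlier, the call would be XfdfMarkupWriter.textMarkup("highlight", 0, "#FFFF00", rect, coords), with rect and coords holding the same numbers as in that element. The resulting elements still need to be wrapped in the xfdf and annots elements of a complete file.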