Reduce XFDF file size

Mysochenko_Yuriy · February 10, 2016, 4:15pm

Hello.

I save annotations to an XFDF file:

     PDFDoc doc = mPdfView.getDoc();
     FDFDoc fdf = doc.fdfExtract(PDFDoc.e_annots_only);
     fdf.saveAsXFDF(annotationsPath);

This works, but when the PDF document is very large (1000 pages, for example), the XFDF file contains unnecessary information for each page:

I read the XFDF spec, but didn't understand the role of these lines.

As a result, the XFDF file is very large, and when I try to load annotations from it I get an OutOfMemoryError.

Is there a way to reduce the file size using the PDFNet SDK?

Ryan · February 12, 2016, 1:26am

This XFDF data means that each page has a link annotation that, when clicked, takes the user to another page, in this case from page 3 to page 17 (XFDF page numbering is zero-based).

You could preprocess the PDF and remove the link annotations.
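If changing the PDF is not an option, a similar effect could be had by filtering the exported XFDF instead. The following is only a sketch using the JDK's StAX API, not a PDFNet feature. It assumes link annotations are serialized as `link` elements, and the class name `XfdfLinkFilter` is made up for illustration. Because it streams events, it does not need to hold the whole file in memory.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: copies XFDF while dropping every <link> element.
class XfdfLinkFilter {

    /** Streams XFDF from in to out, skipping <link> elements and everything inside them. */
    static void filter(Reader in, Writer out) throws XMLStreamException {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        // The input is plain annotation data, so DTD processing is not needed.
        inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLEventReader reader = inputFactory.createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);

        int skipDepth = 0; // greater than 0 while inside a <link> element
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                String name = event.asStartElement().getName().getLocalPart();
                if (skipDepth > 0 || name.equals("link")) {
                    skipDepth++; // entering a skipped element (or a child of one)
                    continue;
                }
            } else if (event.isEndElement() && skipDepth > 0) {
                skipDepth--; // leaving a skipped element
                continue;
            } else if (skipDepth > 0) {
                continue; // text or comments inside a skipped element
            }
            writer.add(event);
        }
        writer.close();
        reader.close();
    }

    /** Convenience wrapper for small inputs held in a String. */
    static String filterString(String xfdf) {
        try {
            StringWriter out = new StringWriter();
            filter(new StringReader(xfdf), out);
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not filter XFDF", e);
        }
    }
}
```

For real files, `filter` would be called with a `FileReader` and `FileWriter` (or buffered equivalents) so that only one event is in memory at a time.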

A 32-bit process would need a lot of annotations to run out of memory, but it is possible.

Is it possible for you to keep the annotations in FDF format (which is binary and much smaller)?

Mysochenko_Yuriy · February 12, 2016, 9:53am

Thanks for the response.

I did some investigation:
PDF - 6.5 MB
FDF - 1.0 MB
XFDF - 3.8 MB
XFDF without ‘line’ annot - 0.03 MB

FDF looks better, but the file is still too big.

So I decided to write the XFDF manually (or use my own format):

// Walk every page and visit each annotation that is not a link.
for (PageIterator itr = doc.getPageIterator(); itr.hasNext(); ) {
    Page page = (Page) (itr.next());
    int num_annots = page.getNumAnnots();
    for (int i = 0; i < num_annots; ++i) {
        Annot annot = page.getAnnot(i);
        if (!annot.isValid()) {
            continue; // skip broken annotation entries
        }
        // Read the /Subtype name from the annotation's underlying dictionary.
        Obj sdf = annot.getSDFObj();
        String subtype = sdf.get("Subtype").value().getName();
        if (!subtype.equals("Link")) {
            // TODO convert Annot to xfdf
        }
    }
}

Is there a utility class that can help me convert an Annot to XFDF like this?

<highlight color="#FFFF00" opacity="1" creationdate="D:20160212092604Z00'00'" flags="print" date="D:20160212092604Z00'00'" page="0" coords="256.035004,524.870002,374.325004,524.870002,256.035004,498.530002,374.325004,498.530002" rect="256.035004,498.530002,374.325004,524.870002" title="">
</highlight>

I couldn't find documentation that describes how to extract the required fields from the Annot class for each annotation type (I only need highlight, strikeout, underline, ink, and text).
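To make the goal concrete, here is a rough sketch of the writing half only, using the JDK's `XMLStreamWriter`. It takes the values as plain arguments and does not touch the PDFNet API at all; how to read them from an Annot is exactly the open question. The class `XfdfHighlightWriter` and its methods are hypothetical names, and it assumes `coords` holds the eight quad-point values in the order shown in the example element.

```java
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: serializes one highlight annotation as an XFDF element.
class XfdfHighlightWriter {

    /** Joins numbers with commas, six decimals each, matching the example element. */
    static String join(double[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            // Locale.ROOT keeps '.' as the decimal separator on every device.
            sb.append(String.format(Locale.ROOT, "%.6f", values[i]));
        }
        return sb.toString();
    }

    /**
     * Builds a <highlight> element.
     * page is zero-based; rect has 4 values; coords has 8 values (assumed quad points).
     */
    static String highlight(int page, double[] rect, double[] coords,
                            String color, String date) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("highlight");
            w.writeAttribute("color", color);
            w.writeAttribute("opacity", "1");
            w.writeAttribute("creationdate", date);
            w.writeAttribute("flags", "print");
            w.writeAttribute("date", date);
            w.writeAttribute("page", Integer.toString(page));
            w.writeAttribute("coords", join(coords));
            w.writeAttribute("rect", join(rect));
            w.writeAttribute("title", "");
            w.writeEndElement();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write highlight element", e);
        }
    }
}
```

Strikeout and underline would presumably follow the same pattern with a different element name, while ink and text need extra child data, so this covers only the simplest case.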

Ryan · February 13, 2016, 12:33am

This sample code shows how to parse annotations, in particular
https://www.pdftron.com/pdfnet/samplecode/AnnotationTest.java.html