Saving/Reading PDF/A into memory buffer

I’ve been trying to use the Python wrapper of the PDFTron SDK to convert a “normal” PDF to PDF/A format.
The problem is that on my system I can only open and write files through a file buffer; I cannot provide a file path.

Is there a way to read a file buffer into PDFACompliance and save the result back to a PDF without providing the file path?

I can successfully save the file created by PDFACompliance like this:

# pdf_a is a PDFACompliance object 
t = pdf_a.SaveAs(False)
with open(pdffile, 'wb') as f:
    f.write(t)

but I cannot figure out how to use PDFACompliance to READ a bytes object. From the documentation (PDFTron PDFNet API Reference, C++ API) it seems that the constructor also accepts a bytes-like object, but if I try to do this

# Read the whole PDF once; the buffer length is the file size
with open(pdffile, 'rb') as f:
    b = bytearray(f.read())
file_sz = len(b)

pdf_a = PDFACompliance(True, b, file_sz,
                       None, PDFACompliance.e_Level2B, 0, 0, 10, False)

I get a TypeError. I tried every combination of parameters I could think of, but I cannot work out how to read the buffer directly.
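For reference, the buffer and size used in the attempt above can be taken from any binary file object in one pass. This is only a sketch: `read_pdf_buffer` is a hypothetical helper, the bytes are a stand-in rather than a valid PDF, and it does not by itself avoid the TypeError.

```python
import io


def read_pdf_buffer(stream):
    """Return (bytearray, size) for a binary file-like object."""
    data = bytearray(stream.read())
    return data, len(data)


# Stand-in bytes for illustration only; this is not a valid PDF.
stream = io.BytesIO(b"%PDF-1.7 stand-in bytes")
buf, size = read_pdf_buffer(stream)
```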

Is there a way to read a file buffer into PDFACompliance

Yes, this is possible, though it looks like a cast is required. Our PDFDocMemory Python sample shows how to create a document from memory.

In particular, this line of code:
doc = PDFDoc(bytearray(mem), file_sz)
So for PDFACompliance, try casting to bytearray like this:
pdf_a = PDFACompliance(True, bytearray(b), file_sz, None, PDFACompliance.e_Level2B, 0, 0, 10, False)

and save the result back to a PDF without providing the file path?

As for saving the converted PDF/A output to a buffer: no, that is not currently possible. It certainly could be, but we would have to update our API, and I am unsure at this time how much work that would involve.

Are you just starting your evaluation of our SDK?
Can you save to disk in the meantime to continue your evaluation?
What is your deadline for your evaluation/proof-of-concept?

Thanks for the reply.

As I already wrote

with open(pdffile,'rb') as f:
    b = bytearray(f.read())

Here b is already a bytearray holding the PDF document, so bytearray(b) only produces a copy with the same type and contents. For the same reason the call still fails; you can try it yourself:

TypeError: Wrong number or type of arguments for overloaded function 'new_PDFACompliance'.
  Possible C/C++ prototypes are:
    pdftron::PDF::PDFA::PDFACompliance::PDFACompliance(bool,pdftron::UString const &,char const *,pdftron::PDF::PDFA::PDFACompliance::Conformance,pdftron::PDF::PDFA::PDFACompliance::ErrorCode *,int,int,bool)
    pdftron::PDF::PDFA::PDFACompliance::PDFACompliance(bool,pdftron::UString const &,char const *,pdftron::PDF::PDFA::PDFACompliance::Conformance,pdftron::PDF::PDFA::PDFACompliance::ErrorCode *,int,int)
    pdftron::PDF::PDFA::PDFACompliance::PDFACompliance(bool,pdftron::UString const &,char const *,pdftron::PDF::PDFA::PDFACompliance::Conformance,pdftron::PDF::PDFA::PDFACompliance::ErrorCode *,int)
    pdftron::PDF::PDFA::PDFACompliance::PDFACompliance(bool,pdftron::UString const &,char const *,pdftron::PDF::PDFA::PDFACompliance::Conformance,pdftron::PDF::PDFA::PDFACompliance::ErrorCode *)
    pdftron::PDF::PDFA::PDFACompliance::PDFACompliance(bool,pdftron::UString const &,char const *,pdftron::PDF::PDFA::PDFACompliance::Conformance)
    pdftron::PDF::PDFA::PDFACompliance::PDFACompliance(bool,pdftron::UString const &,char const *)
    pdftron::PDF::PDFA::PDFACompliance::PDFACompliance(bool,pdftron::UString const &)
    pdftron::PDF::PDFA::PDFACompliance::PDFACompliance(bool,char const *,size_t,char const *,pdftron::PDF::PDFA::PDFACompliance::Conformance,pdftron::PDF::PDFA::PDFACompliance::ErrorCode *,int,int,bool)
    pdftron::PDF::PDFA::PDFACompliance::PDFACompliance(bool,char const *,size_t,char const *,pdftron::PDF::PDFA::PDFACompliance::Conformance,pdftron::PDF::PDFA::PDFACompliance::ErrorCode *,int,int)
    pdftron::PDF::PDFA::PDFACompliance::PDFACompliance(bool,char const *,size_t,char const *,pdftron::PDF::PDFA::PDFACompliance::Conformance,pdftron::PDF::PDFA::PDFACompliance::ErrorCode *,int)
    pdftron::PDF::PDFA::PDFACompliance::PDFACompliance(bool,char const *,size_t,char const *,pdftron::PDF::PDFA::PDFACompliance::Conformance,pdftron::PDF::PDFA::PDFACompliance::ErrorCode *)
    pdftron::PDF::PDFA::PDFACompliance::PDFACompliance(bool,char const *,size_t,char const *,pdftron::PDF::PDFA::PDFACompliance::Conformance)
    pdftron::PDF::PDFA::PDFACompliance::PDFACompliance(bool,char const *,size_t,char const *)
    pdftron::PDF::PDFA::PDFACompliance::PDFACompliance(bool,char const *,size_t)
    pdftron::PDF::PDFA::PDFACompliance::PDFACompliance(TRN_PDFACompliance)
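The point about bytearray(b) can be checked without the SDK. A minimal sketch with stand-in bytes (not a real PDF): the extra cast yields a new object, but its type and contents are unchanged, so it cannot affect which overload is selected.

```python
b = bytearray(b"%PDF-1.7 stand-in bytes")
copy = bytearray(b)

same_type = type(copy) is type(b)   # both are bytearray
same_contents = copy == b           # byte-for-byte equal
same_object = copy is b             # a fresh copy, not the original

print(same_type, same_contents, same_object)  # prints: True True False
```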

For the write part, it is in fact possible. As I also wrote in my post, once pdf_a has been created you can call SaveAs to get the binary data:

t = pdf_a.SaveAs(False)
with open(pdffile, 'wb') as f:
    f.write(t)

So for the moment the problem is only the read. To be fair, I think there is a bug in the implementation of the PDFACompliance constructor in the .so, as it should accept bytes-like objects.
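Until the buffer overload works, one possible stopgap, if temporary files are allowed on the system at all, is to spill the buffer to a temporary file and hand its path to the path-based overload listed in the traceback. This is a sketch only: `with_temp_pdf` is a hypothetical helper, and the PDFACompliance call in the comment is an untested assumption based on the prototypes above.

```python
import os
import tempfile


def with_temp_pdf(data, use_path):
    """Write data to a temporary .pdf file, call use_path(path), then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return use_path(path)
    finally:
        os.remove(path)


# With the SDK installed, the callback might look like this (assumption, untested):
#   lambda p: PDFACompliance(True, p, None, PDFACompliance.e_Level2B,
#                            0, 0, 10, False).SaveAs(False)
# Stand-in callback so the sketch runs without the SDK:
result = with_temp_pdf(b"%PDF-1.7 stand-in", lambda p: os.path.getsize(p))
```

The temporary file exists only for the duration of the callback, so the rest of the pipeline can keep working with buffers.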

We’re trying to evaluate whether the SDK fits our needs. For the moment I’m struggling to see whether it will really improve our workflow.

Thank you for the update. It appears there is an issue with how this API is converted by SWIG. We are looking into it.

In the meantime, what Python version are you targeting?
Also, are you using pip?
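Both questions can be answered from the interpreter. A small sketch, assuming Python 3.8+ (for `importlib.metadata`) and assuming the pip package is named `PDFNetPython3`; adjust the name if your install differs. `environment_report` is a hypothetical helper.

```python
import sys
from importlib import metadata


def environment_report(package="PDFNetPython3"):
    """Describe the running Python version and the pip-installed package version."""
    py = "{}.{}.{}".format(*sys.version_info[:3])
    try:
        pkg = metadata.version(package)
    except metadata.PackageNotFoundError:
        # No pip metadata found; the SDK may have been installed another way.
        pkg = "(no pip metadata found)"
    return "Python {}; {} {}".format(py, package, pkg)


report = environment_report()
print(report)
```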