Q: We need to be able to extract the text of the page, a PNG representation, an SVG representation and a PDF representation of every page in a given PDF document using Ruby.
I see lots of sample code for various scenarios using Ruby, but the specific PDF2SVG and PDF2Image sections don’t seem to cover Ruby samples.
Also, we need this to read a file from Amazon S3 and process it in memory. Will that even be possible using code from your PDFDocMemory sample? Perhaps something like this:
PDFNet.Initialize
# Read a PDF document in a memory buffer.
file = StdFile.new((url_to_document_on_amazon), StdFile::E_read_mode)
file_sz = file.FileSize
file_reader = FilterReader.new(file)
mem = file_reader.Read(file_sz)
doc = PDFDoc.new(mem, file_sz)
doc.InitSecurityHandler
A:
> # Read a PDF document in a memory buffer.
> file = StdFile.new((url_to_document_on_amazon), StdFile::E_read_mode)
This most likely won’t work for downloading the data from an online source, because StdFile is meant for opening a file by its path on the local file system rather than by URL. Instead, you will need to use a Ruby-specific API to download the document into a memory buffer, and then use that buffer to create a PDFDoc. One way to implement this is as follows:
- Use a technique similar to the PDFDocMemoryTest sample (http://www.pdftron.com/pdfnet/samplecode/PDFDocMemoryTest.rb) to create a PDFDoc from a memory buffer.
- Call Convert::ToSVG on the document to convert the PDFDoc to SVG. (A somewhat more involved sample is in http://www.pdftron.com/pdfnet/samplecode/ConvertTest.rb.)
- Use PDFDraw as in the PDFDraw sample (http://www.pdftron.com/pdfnet/samplecode/PDFDrawTest.rb) to create a PNG file for each page. (Iterate through the pages as in example 2, but omit the “JPEG” and encoder_param arguments so that the output is PNG.)
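The steps above could be put together roughly as in the following sketch. Only the download part uses APIs that ship with Ruby (Net::HTTP); the PDFNet part is untested and simply follows the calls in the question and the linked samples. The helper names (to_memory_buffer, fetch_pdf, convert_from_url), the output naming scheme, and the PDFNetRuby require path are hypothetical. It is also an assumption that PDFDoc.new accepts a plain Ruby String as the buffer, and that the converter is spelled Convert.ToSvg in your PDFNet version, so please check both against your copy of the samples.

```ruby
require 'net/http'
require 'uri'

# Return [buffer, size]: a binary copy of +data+ and its length in BYTES,
# i.e. the (mem, file_sz) pair used by PDFDoc.new in the question's snippet.
def to_memory_buffer(data)
  buffer = data.to_s.dup.force_encoding(Encoding::BINARY)
  [buffer, buffer.bytesize]
end

# Download +url+ into memory; nothing is written to disk.
# This does not follow redirects, and a private S3 object would need
# a pre-signed URL (or the AWS SDK) instead of a plain GET.
def fetch_pdf(url)
  response = Net::HTTP.get_response(URI.parse(url))
  raise "download failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  to_memory_buffer(response.body)
end

# UNTESTED sketch of the three steps in the list above.
# +pdfnet_lib+ is the installation-specific path to PDFNetRuby (assumption).
def convert_from_url(url, out_prefix, pdfnet_lib = 'PDFNetRuby')
  require pdfnet_lib
  mem, file_sz = fetch_pdf(url)

  PDFNetRuby::PDFNet.Initialize
  # Assumption: a Ruby String is accepted here, as the value returned by
  # FilterReader#Read is in PDFDocMemoryTest.rb.
  doc = PDFNetRuby::PDFDoc.new(mem, file_sz)
  doc.InitSecurityHandler

  # SVG output (method spelling assumed from ConvertTest.rb).
  PDFNetRuby::Convert.ToSvg(doc, "#{out_prefix}.svg")

  # One PNG per page: Export without a format argument, as described above.
  draw = PDFNetRuby::PDFDraw.new
  page_number = 1
  itr = doc.GetPageIterator
  while itr.HasNext
    draw.Export(itr.Current, "#{out_prefix}_#{page_number}.png")
    page_number += 1
    itr.Next
  end
end
```

A call might look like convert_from_url(presigned_url, 'output/doc', '/path/to/PDFNetC/Lib/PDFNetRuby'), where presigned_url and both paths are placeholders. The size is taken with bytesize rather than length because the buffer size must be a byte count, and the two differ for strings that are not in a single-byte encoding.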