Replacing text in PDF - missing letters

Aaron_Gravesdale · November 3, 2011, 1:48am

Q: I would like to use PDFNet in a webapp where the user should be able to make minor text changes to a PDF file uploaded to the website. It’s only the text that needs to change, not the color, layout, images, etc.; just the text.

So I wrote the code below to do just that. However, there’s a problem: the fonts are embedded in the document, but it seems I can only use letters that are already present in the document; any letter not yet present is simply skipped. For instance, I replaced an email address in a test doc with mine (f...@bar.com), but some characters (e.g. f and c) were left out, and I noticed that quite a few other letters won’t be printed either. Is there any way around this? Is the problem that the application that created the document did not include enough of the font for your library to work correctly? I can’t embed a new font instead, because these documents use a huge number of exotic fonts that aren’t installed on just any machine.

byte[] inputData = Session["inputdoc"] as byte[];
using (PDFDoc inputDoc = new PDFDoc(inputData, inputData.Length))
{
    int pageCount = inputDoc.GetPageCount(), elementCounter = 0;

    // Copy every original page to a new page appended at the end,
    // replacing the data of each text element along the way.
    for (int pageCounter = 1; pageCounter <= pageCount; pageCounter++)
    {
        pdftron.PDF.Page currentPage = inputDoc.GetPage(pageCounter);
        pdftron.PDF.Page newPage = inputDoc.PageCreate();
        using (ElementReader reader = new ElementReader())
        {
            using (ElementWriter writer = new ElementWriter())
            {
                Element element;
                inputDoc.PagePushBack(newPage);
                reader.Begin(currentPage);
                writer.Begin(newPage);
                while ((element = reader.Next()) != null)
                {
                    if (element.GetType() == Element.Type.e_text)
                    {
                        // Each text element maps to a form field named PDF_1, PDF_2, ...
                        string text = string.IsNullOrEmpty(Request.Form["PDF_" + (++elementCounter).ToString()]) ? string.Empty : Request.Form["PDF_" + elementCounter.ToString()];
                        byte[] textData = text.Length > 0 ? System.Text.Encoding.ASCII.GetBytes(text) : new byte[0];
                        element.SetTextData(textData, textData.Length);
                    }
                    writer.WriteElement(element);
                }
                reader.End();
                writer.End();
            }
        }
        newPage.SetMediaBox(currentPage.GetCropBox());
    }

    // Remove the original pages, leaving only the rewritten copies.
    for (int pageCounter = 1; pageCounter <= pageCount; pageCounter++)
    {
        inputDoc.PageRemove(inputDoc.GetPageIterator(1));
    }

    // Save to memory and send the result to the browser.
    int outputSize = 0;
    byte[] outputData = null;
    inputDoc.Save(ref outputData, ref outputSize, SDFDoc.SaveOptions.e_remove_unused);
    Response.Clear();
    Response.ContentType = System.Net.Mime.MediaTypeNames.Application.Pdf;
    Response.AddHeader("content-disposition", "attachment;filename=" + "icprint_pdf_test.pdf");
    Response.AddHeader("content-length", outputSize.ToString());
    Response.BinaryWrite(outputData);
    Response.End();
    inputDoc.Close();
}

A: Most likely the problem is that the font is subsetted (i.e. the application that created the PDF removed the glyphs that the document does not use). The only way around this is to make sure that the fonts are fully embedded.
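As a rough illustration of the subsetting issue, here is a minimal sketch in plain C# that does not call PDFNet at all. It assumes you have already obtained a font's name (for example from the BaseFont entry of its font dictionary) and the text currently drawn with that font. The class and method names (SubsetFontCheck, LooksSubsetted, MissingCharacters) are made up for this example. The first check relies on the PDF convention that a subset font's name starts with six uppercase letters and a plus sign; the second only approximates the subset's glyph coverage from the characters already in use, so treat its result as a hint rather than proof.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class; none of these names come from PDFNet.
public static class SubsetFontCheck
{
    // PDF subset fonts carry a tag of six uppercase letters and a '+'
    // in front of the real font name, e.g. "ABCDEF+Helvetica".
    private static readonly Regex SubsetTag = new Regex("^[A-Z]{6}\\+");

    // True if the font name carries a subset tag.
    public static bool LooksSubsetted(string fontName)
    {
        return !string.IsNullOrEmpty(fontName) && SubsetTag.IsMatch(fontName);
    }

    // Returns the distinct characters of 'replacement' that never occur in
    // 'existingText', in order of first appearance. Assumption: the text
    // already drawn with a subsetted font approximates the glyphs it contains.
    public static string MissingCharacters(string existingText, string replacement)
    {
        var available = new HashSet<char>(existingText ?? string.Empty);
        var reported = new HashSet<char>();
        var missing = new StringBuilder();
        foreach (char c in replacement ?? string.Empty)
        {
            // Add() returns false for characters that were already reported.
            if (!available.Contains(c) && reported.Add(c))
            {
                missing.Append(c);
            }
        }
        return missing.ToString();
    }
}
```

A web form could run MissingCharacters on the user's input before rewriting the page and warn the user when the result is not empty, instead of silently producing text with dropped letters.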

On a potentially relevant note, PDFTron has recently implemented a new utility API called ContentReplacer that can be used to search for and replace text and images in a PDF.

For example:

ContentReplacer replacer = new ContentReplacer();
replacer.Add("NAME", "John Smith");
replacer.Add("QUALIFICATIONS", "Philosophy Doctor");
replacer.Add("JOB_TITLE", "Software Developer");
replacer.Add("ADDRESS_LINE1", "#100 123 Software Rd");
replacer.Add("ADDRESS_LINE2", "Vancouver, BC");
replacer.Add("PHONE_OFFICE", "604-730-8111");
replacer.Add("PHONE_MOBILE", "604-765-4321");
replacer.Add("EMAIL", "in...@pdftron.com");
replacer.Add("WEBSITE_URL", "http://www.pdftron.com");
replacer.Process(doc.GetPage(1));

This replaces the given placeholders in a PDF template with variable text. ContentReplacer does not use AcroForm and is not limited to static rectangular annotation regions.

A full sample project (C#, C++, VB, etc) can be downloaded from:

http://www.pdftron.com/pdfnet/samplecode/data/PDFContentReplacer.zip

You can download the last unofficial build that includes PDF ContentReplacer using the following link(s):

.NET 1.1-3.5, 32: http://www.pdftron.com/IDR49Z9-B31B/PDFNet.zip

.NET 4, 32: http://www.pdftron.com/IDR49Z9-B31B/PDFNetDotNet4.zip

.NET 4, 64: http://www.pdftron.com/IDR49Z9-B31B/PDFNet64DotNet4.zip