Convert text in the PDF to vector lines

Q:

I would like to convert the text in a PDF to vector lines. Here’s what I’d like to have happen:

  1. For some PDFs, I want to convert the text in the PDF to vector lines (called glyphs).

  2. To do this, use PDFNet, per the code sample in this link: http://groups.google.com/group/pdfnet-sdk/browse_thread/thread/5f2b758d169f595a/99832f25f99957cd?lnk=gst&q=glyph#99832f25f99957cd

  3. I want the new PDFs returned to me with the text converted to glyphs.

I was reviewing your link, but I don’t know what I have to do after this line of code:

font.GetGlyphPath(itr.Current().char_code, path_oprs, path_data, true, (ref) path_mtx);

Can you help me with that, please?

A:

The next steps for text to path conversion are also described in the provided link.

Basically, you would take the path data returned from GetGlyphPath() and use it to build a new path with ElementBuilder, for example using ElementBuilder.CreatePath(double[] points, int point_count, byte[] seg_types, int seg_types_count).

Then you can output the created path element instead of the glyph/text run.

We could write custom code for you as part of a consulting service (http://www.pdftron.com/support/professionalservices.html); however, we would be happy to assist you if you have more specific API questions or issues.
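To make the pairing of a flat points array with a list of segment types more concrete, here is a minimal, self-contained sketch. It does not use PDFNet at all: the SegKind enum and the OutlineSketch class are hypothetical names invented for this illustration, and the enum values are not PDFNet's segment constants. It only shows the general idea that each segment type consumes a fixed number of coordinates from the points array, by printing the outline as standard PDF path operators (m, l, c, h).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical segment kinds for illustration; these are NOT PDFNet's constants.
public enum SegKind : byte { MoveTo, LineTo, CubicTo, Close }

public static class OutlineSketch
{
    // Number of coordinates each segment consumes from the flat points array.
    private static int CoordCount(SegKind kind)
    {
        switch (kind)
        {
            case SegKind.MoveTo: return 2;   // x y
            case SegKind.LineTo: return 2;   // x y
            case SegKind.CubicTo: return 6;  // x1 y1 x2 y2 x3 y3
            default: return 0;               // Close takes no coordinates
        }
    }

    // PDF content stream operator that corresponds to each segment kind.
    private static string OperatorFor(SegKind kind)
    {
        switch (kind)
        {
            case SegKind.MoveTo: return "m";
            case SegKind.LineTo: return "l";
            case SegKind.CubicTo: return "c";
            default: return "h";
        }
    }

    // Walks both arrays in step and emits one operator line per segment.
    public static List<string> ToPdfOperators(double[] points, SegKind[] segs)
    {
        var lines = new List<string>();
        int next = 0; // index of the next unread coordinate
        foreach (SegKind kind in segs)
        {
            int n = CoordCount(kind);
            if (next + n > points.Length)
                throw new ArgumentException("points array is too short for the segment list");

            var parts = new List<string>();
            for (int j = 0; j < n; j++)
                parts.Add(points[next + j].ToString("0.###", CultureInfo.InvariantCulture));
            next += n;

            parts.Add(OperatorFor(kind));
            lines.Add(string.Join(" ", parts));
        }
        return lines;
    }
}
```

For a triangle given as points {0, 0, 10, 0, 5, 8} with segments MoveTo, LineTo, LineTo, Close, ToPdfOperators returns the four lines "0 0 m", "10 0 l", "5 8 l" and "h". With PDFNet you would not build these strings yourself; the arrays returned by GetGlyphPath() are handed to CreatePath() directly.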

Q:

Thanks for the quick response. We really appreciate that.

We had already seen the links you suggested, and we developed the code below. The problem is that the output PDF we generate is empty: wherever text should have been replaced by a glyph, nothing appears.

To sum up the code below, here is what we are doing:

For each element in the document, we get the vectors of path operators and path data like this: font.GetGlyphPath(itr.Current().char_code, ref path_oprs, ref path_data, true);

Then, we iterate through each operator in the vector in order to create an element, using an ElementBuilder. Finally we write this element with the document writer.

Below, I also highlighted the most important lines to show you what I just explained to you.

using pdftron;

using pdftron.PDF;

namespace ConsoleApplication2

{
class Program {

snip

A:

There were a number of issues in the provided code. To help you get on the right track, we have resolved many of them and provided a new sample that shows how to convert text to paths. Please note that the program is not complete, since the matrix calculation used to place the path (with SetTransform()) is not completely correct. Many other pieces are also missing for a production-ready solution, such as preserving text color, handling stroke/fill/clip, and testing.

using System;
using pdftron;
using pdftron.PDF;
using pdftron.SDF;
using pdftron.Common;

namespace ConsoleApplication2
{
    class Program
    {
        /// <summary>
        /// Pseudocode for PDF text to path conversion
        /// </summary>
        static void Main(string[] args)
        {
            PDFNet.Initialize();

            // Folders and file names for the test input and output.
            string input_path = @"d:\";
            string output_path = @"d:\";
            string input_filename = "iPhoneFP.pdf";
            string output_filename = "out.pdf";

            try
            {
                Console.WriteLine("-------------------------------------------------");

                // Open the test file
                Console.WriteLine("Opening the input file...");
                PDFDoc doc = new PDFDoc(input_path + input_filename);
                doc.InitSecurityHandler();

                int num_pages = doc.GetPageCount();

                ElementWriter writer = new ElementWriter();
                ElementReader reader = new ElementReader();
                ElementBuilder bld = new ElementBuilder();

                // Rewrite the content of every page, replacing text with paths.
                for (int i = 1; i <= num_pages; ++i)
                {
                    Page page = doc.GetPage(i);
                    reader.Begin(page);
                    writer.Begin(page, ElementWriter.WriteMode.e_replacement, false);
                    ProcessElements(reader, writer, bld);
                    writer.End();
                    reader.End();
                }

                writer.Dispose();
                reader.Dispose();
                bld.Dispose();

                doc.Save(output_path + output_filename, SDFDoc.SaveOptions.e_remove_unused);
                doc.Close();
                Console.WriteLine("Done. Result saved in {0}...", output_filename);
            }
            catch (PDFNetException e)
            {
                Console.WriteLine(e.Message);
            }
        }

        public static void ProcessElements(ElementReader reader, ElementWriter writer, ElementBuilder bld)
        {
            Element element;
            while ((element = reader.Next()) != null) // Read page contents
            {
                switch (element.GetType())
                {
                    case Element.Type.e_path:
                        break;

                    case Element.Type.e_text_begin:
                    case Element.Type.e_text_end:
                        continue; // drop BT/ET since no text is written

                    case Element.Type.e_text:
                    {
                        CharIterator itr = element.GetCharIterator();
                        GState gs = element.GetGState();
                        pdftron.PDF.Font font = gs.GetFont();
                        double font_size = gs.GetFontSize();
                        double horiz_spacing = gs.GetHorizontalScale() / 100.0;
                        Matrix2D text_mtx = element.GetTextMatrix();
                        Matrix2D pos = new Matrix2D(1, 0, 0, 1, 0, 0);

                        // Scale glyph units to text space: font size, horizontal scale, units per em.
                        Matrix2D font_mtx = new Matrix2D(font_size * horiz_spacing, 0, 0, font_size, 0, 0);
                        double units_per_em = font.GetUnitsPerEm();
                        font_mtx = font_mtx * new Matrix2D(1.0 / units_per_em, 0, 0, 1.0 / units_per_em, 0, 0);

                        byte[] path_oprs = null;
                        double[] path_data = null;

                        for (; itr.HasNext(); itr.Next())
                        {
                            if (font.GetGlyphPath(itr.Current().char_code, ref path_oprs, ref path_data, true) && path_data.Length > 0)
                            {
                                Element path = bld.CreatePath(path_data, path_data.Length, path_oprs, path_oprs.Length);

                                // Place the glyph outline: text matrix, then glyph offset, then font scale.
                                pos.m_h = itr.Current().x;
                                pos.m_v = itr.Current().y;
                                Matrix2D path_mtx = new Matrix2D(text_mtx);
                                path_mtx *= pos;
                                path_mtx *= font_mtx;
                                path.GetGState().SetTransform(path_mtx);

                                // TODO: Set fill/stroke, color space, colorants, etc. based on the text GState.
                                // GState.TextRenderingMode tr = element.GetGState();
                                path.SetPathFill(true);
                                path.SetPathStroke(false);
                                writer.WritePlacedElement(path);
                            }
                        }
                        continue; // skip writing the text element
                    }

                    case Element.Type.e_form:
                    {
                        // Recurse into the form XObject's content.
                        reader.FormBegin();
                        ProcessElements(reader, writer, bld);
                        reader.End();
                        break;
                    }
                }
                writer.WriteElement(element);
            }
        }
    }
}
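Since the answer notes that the placement matrix in the sample is not completely correct, it may help to see the composition it attempts written out on its own. The sketch below is self-contained and does not use PDFNet: Affine and GlyphPlacement are hypothetical names, Affine is only a stand-in for a 2D matrix type, and the convention that (m * n) applies n first is an assumption chosen to match the order in which the sample multiplies text_mtx, pos and font_mtx. It reproduces the sample's calculation as written and does not attempt to correct it.

```csharp
using System;

// Minimal 2D affine transform: x' = a*x + c*y + h, y' = b*x + d*y + v.
// Illustrative stand-in only; this is not PDFNet's Matrix2D.
public readonly struct Affine
{
    public readonly double A, B, C, D, H, V;

    public Affine(double a, double b, double c, double d, double h, double v)
    {
        A = a; B = b; C = c; D = d; H = h; V = v;
    }

    // Assumed convention: (m * n) applies n first, then m.
    public static Affine operator *(Affine m, Affine n) => new Affine(
        m.A * n.A + m.C * n.B,
        m.B * n.A + m.D * n.B,
        m.A * n.C + m.C * n.D,
        m.B * n.C + m.D * n.D,
        m.A * n.H + m.C * n.V + m.H,
        m.B * n.H + m.D * n.V + m.V);

    // Maps a point through the transform.
    public (double X, double Y) Apply(double x, double y) =>
        (A * x + C * y + H, B * x + D * y + V);
}

public static class GlyphPlacement
{
    // Mirrors the sample's order: text matrix * glyph offset * font scale.
    public static Affine Compose(Affine textMtx, double glyphX, double glyphY,
                                 double fontSize, double horizScalePercent,
                                 double unitsPerEm)
    {
        // Glyph outlines are in font units; scale them to text space.
        double scale = fontSize / unitsPerEm;
        var fontMtx = new Affine(scale * horizScalePercent / 100.0, 0, 0, scale, 0, 0);

        // Translate by the glyph's offset within the text run.
        var pos = new Affine(1, 0, 0, 1, glyphX, glyphY);

        return textMtx * pos * fontMtx;
    }
}
```

As a worked example under these assumptions: with a text matrix that only translates by (100, 700), a glyph offset of (12, 0), a font size of 10, 100% horizontal scaling and 1000 units per em, the outline point (500, 700) is first scaled to (5, 7), then shifted to (17, 7), and finally lands at approximately (117, 707). Character spacing, word spacing, text rise and the current transformation matrix are not modeled here.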

On Wednesday, March 14, 2012 2:06:21 PM UTC-7, Support wrote:
