Unexpected characters when trying to create text dynamically

alanoe · May 4, 2022, 11:04pm

Product:
PDFTron SDK

Product Version: 9.1.0

OS: Ubuntu 20.04

Please give a brief summary of your issue:
Unexpected characters when trying to create text dynamically

Please describe your issue and provide steps to reproduce it:
I’m trying to add text dynamically to an existing PDF with a font available in the system. I’m using the following Python code:

# assumed import: from PDFNetPython3.PDFNetPython import ElementBuilder, ElementWriter, Font
page_texts = ["test1", "test2"]
ewriter = ElementWriter()
ebuilder = ElementBuilder()
# not shown here: create the PDFDoc object (pdf_doc) from a PDF on disk
page = pdf_doc.GetPage(3)
ewriter.Begin(page)  # start writing to the page
# load the font by name, with an empty string as char_set
element = ebuilder.CreateTextBegin(Font.Create(pdf_doc.GetSDFDoc(), "Inter", ""), 11)
ewriter.WriteElement(element)
for i, text in enumerate(page_texts):
    element = ebuilder.CreateTextRun(text)
    element.SetTextMatrix(1, 0, 0, 1, 0, 20*i)  # place each string 20 units above the previous one
    ewriter.WriteElement(element)
element = ebuilder.CreateTextEnd()
ewriter.WriteElement(element)
ewriter.End()  # save changes to the current page

However, the text shows up as Japanese characters in the saved PDF. Am I loading the font incorrectly by passing its name and an empty string as char_set? I’ve followed the sample in the PDFTron documentation that loads the Helvetica font. I’ve also tried loading Helvetica, which is one of the 14 standard fonts in any PDF reader and is surely supported by name in PDFTron, and got the same result. I have also tried calling PDFNet.AddFontSubst() with the Inter font as a parameter before rendering the strings, with the same result.

I’m only able to load the font correctly if I provide a Font.StandardType1Font (e.g. Font.e_times_roman) instead of a font name to Font.Create().

Please provide a link to a minimal sample where the issue is reproducible: not available

Ryan · May 6, 2022, 6:27pm

Please see this forum post for code.

Note that you should call CreateUnicodeTextRun, not CreateTextRun.

empty string as char_set

You should pass in the text you want to display to char_set, so that our SDK can select a Font that covers the desired unicode ranges.
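
One way to follow this advice when the strings are generated at run time is to collect every distinct character that will be drawn and pass the result as char_set. The helper below is a minimal sketch in plain Python. `build_char_set` is a hypothetical name, and the assumption that char_set is simply a string containing the characters to cover comes from the reply above, not from a verified API reference.

```python
def build_char_set(texts):
    """Return each distinct character in texts once, in first-seen order."""
    seen = {}
    for text in texts:
        for ch in text:
            seen.setdefault(ch, None)  # dicts preserve insertion order
    return "".join(seen)


page_texts = ["test1", "test2"]
char_set = build_char_set(page_texts)
print(char_set)  # tes12
```

The resulting string could then take the place of the empty string in the Font.Create(pdf_doc.GetSDFDoc(), "Inter", "") call from the original post.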

alanoe · May 6, 2022, 7:36pm

Note that you should call CreateUnicodeTextRun, not CreateTextRun.

Even if I’m only using ASCII characters? When should CreateTextRun be used, then?

You should pass in the text you want to display to char_set, so that our SDK can select a Font that covers the desired unicode ranges.

Why does the sample I linked to in my original post pass an empty string as the char_set parameter? Is it incorrect?

I want to display several dynamic ASCII strings with this font. I assume I should pass in all the ASCII characters then?

Ryan · May 6, 2022, 10:25pm

When should CreateTextRun be used, then?

It is actually tied to the Font, and to whether the PDF Font is Simple (single-byte encoding) or CID (multi-byte). You can call font.IsSimple() at runtime to find out which. CreateTextRun is for simple fonts, and CreateUnicodeTextRun is for CID fonts.
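
The rule above can be sketched as a small dispatch helper. This is only an illustration under stated assumptions: `create_text_run` is a hypothetical name, `builder` is expected to be an ElementBuilder, `font` is the Font object that was passed to CreateTextBegin, and CreateUnicodeTextRun is assumed to accept a Python string the same way CreateTextRun does in the original post.

```python
def create_text_run(builder, font, text):
    """Create a text-run element using the call that matches the font type."""
    if font.IsSimple():
        # Simple font: single-byte character codes.
        return builder.CreateTextRun(text)
    # CID (composite) font: multi-byte character codes.
    return builder.CreateUnicodeTextRun(text)
```

In the loop from the original post, this would mean keeping the result of Font.Create() in a variable and replacing ebuilder.CreateTextRun(text) with create_text_run(ebuilder, font, text).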

What happens if you run our ElementBuilder sample? Do you see the correct text? What happens if you edit the ElementBuilder sample to load your “Inter” font?

If you are still stuck, could you provide your Font file and your generated PDF? If the files are confidential, you can submit them using the ticket form on pdftron.com.

alanoe · May 13, 2022, 12:58pm

It is actually tied to the Font, and to whether the PDF Font is Simple (single-byte encoding) or CID (multi-byte). You can call font.IsSimple() at runtime to find out which. CreateTextRun is for simple fonts, and CreateUnicodeTextRun is for CID fonts.

So the choice between CreateTextRun() and CreateUnicodeTextRun() is based on the font, not on the text content. Also, Font.Create() may return either a simple or composite font. Thanks for clarifying it.
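
A rough illustration of why the font type, rather than the text, decides the outcome: if single-byte codes are written for a font whose codes are two bytes wide, each pair of ASCII bytes is read as one code. The sketch below uses UTF-16BE purely as a stand-in for a two-byte encoding. A real CID font maps codes through its own CMap, so this is an analogy for the symptom in the original post, not a description of what PDFTron does internally. `misread_as_two_byte` is a hypothetical name.

```python
def misread_as_two_byte(text):
    """Reinterpret single-byte ASCII codes as big-endian two-byte codes."""
    raw = text.encode("ascii")
    raw = raw[: len(raw) // 2 * 2]  # drop a trailing unpaired byte
    return raw.decode("utf-16-be")


garbled = misread_as_two_byte("test1")
print([f"U+{ord(ch):04X}" for ch in garbled])  # ['U+7465', 'U+7374']
```

Both code points fall in the CJK Unified Ideographs block (U+4E00 to U+9FFF), which is consistent with plain ASCII input showing up as East Asian characters.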

What happens if you run our ElementBuilder sample? Do you see the correct text? What happens if you edit the ElementBuilder sample to load your “Inter” font?

Yes, I see the correct text and I was able to load my “Inter” font.

I changed my code to use CreateUnicodeTextRun() instead of CreateTextRun() and it worked.

It seems that Font.Create(pdf_doc, font_name, charset) loads the font’s entire character set if an empty string is passed as charset. I tried it and was able to print English text, numbers, and also some non-ASCII characters.

Some pending questions:

My code is currently using Font.CreateCIDTrueTypeFont(). Could I use Font.CreateTrueTypeFont() instead to create a simple font if I only want to use ASCII characters, even though the font has more than that?
Can OpenType fonts be loaded or only TrueType ones?
Will Font.Create(pdf_doc, font_name, <charset>) look for all fonts in the OS default fonts directory with that full name? Is the loaded character set the only difference between Font.CreateCIDTrueTypeFont(pdf_doc, "/usr/share/fonts/truetype/inter/Inter-Regular.ttf") and Font.Create(pdf_doc, "Inter", <charset>) on Linux? What happens if more than one font with the same full name is installed on the OS? Is the first one found used?

Ryan · May 27, 2022, 7:20pm

Could I use Font.CreateTrueTypeFont() instead to create a simple font

Yes, see our ElementBuilder sample.

Can OpenType fonts be loaded or only TrueType ones?

Yes, Font.Create can open OpenType fonts. Font.CreateTrueTypeFont possibly works on OpenType fonts as well.

Will Font.Create(pdf_doc, font_name, <charset>) look for all fonts in the OS default fonts directory with that full name?

It will look at charset coverage first, and then possibly at any other information that is available, such as kerning and widths. And yes, it also matches on the name.

Perhaps it is best to focus on what you actually want to do.
What is your overall objective?
Why is using ElementBuilder important for you?
Why not convert from HTML, DOCX, or some other format, to PDF?