How to insert non-ASCII text with OpenSans font (PDFTron.Net.x64)

amenzies · August 31, 2022, 6:43pm

Product: PDFTron.NETCore.Windows.x64 (NuGet package for C#)

Product Version: 9.3.0

Please give a brief summary of your issue: How do I programmatically insert text with non-ASCII characters (like the Japanese character あ) and have it show up properly in the rendered PDF, using the OpenSans font if at all possible?

Please describe your issue and provide steps to reproduce it:
I am building a PDF from two components. The first, which makes up most of the PDF, is a webpage that I convert from HTML to PDF using the “HTML2PDF” module. The webpage includes text in Open Sans (the font is set in CSS and is obtained from a CDN: https://fonts.googleapis.com/css?family=Open+Sans:300,400,600,700), and that text may include international, non-ASCII characters. This part works fine: I have tried it with a webpage containing an あ character in Open Sans, and it renders correctly in the output PDF.

The second component consists of adding headers and footers to each page. These can also contain non-ASCII characters, and this is where I’m running into problems: I’m trying to use the text-insertion methods in the PDFTron library to add a footer, but no matter what I try, I can’t get the あ character to show up.

Here’s the relevant snippet from my approach. Note that the “html”, “converter”, and “settings” variables are set by prior code which I am omitting for brevity because the HTML2PDF part works correctly. The part within the using statements for ElementBuilder and ElementWriter is where the problem is occurring.

string htmlString = html.ToString();

converter.InsertFromHtmlString(htmlString, settings);
using (var doc = new PDFDoc())
	{
	converter.Convert(doc);
	using (ElementBuilder eb = new ElementBuilder())
	using (var writer = new ElementWriter())
		{
		for (int i = 1; i <= doc.GetPageCount(); i++) // doc.GetPage uses 1-based instead of 0-based indexing.
			{
			var page = doc.GetPage(i);
			writer.Begin(page);
			/* 1A */ var font = pdftron.PDF.Font.CreateTrueTypeFont(doc, "Resources/OpenSans-Regular.ttf");
			/* 1B */ //var font = pdftron.PDF.Font.CreateCIDTrueTypeFont(doc, "Resources/OpenSans-Regular.ttf");
			/* 1C */ //var font = pdftron.PDF.Font.CreateType1Font(doc, "Resources/OpenSans-Regular.ttf");
			writer.WriteElement(eb.CreateTextBegin());
			/* 2A */ var textRun = eb.CreateTextRun($"test page あ {i}", font, 14);
			/* 2B */ //var textRun = eb.CreateUnicodeTextRun($"test page あ {i}", font, 14);
			textRun.SetTextMatrix(1, 0, 0, 1, 5, 10);
			writer.WriteElement(textRun);
			writer.WriteElement(eb.CreateTextEnd());
			writer.End();
			}
		}
	documents.Add(doc.Save(SDFDoc.SaveOptions.e_compatibility));
	}

As you can see from the 1A, 1B, 1C, 2A, and 2B comments, I have tried numerous combinations of ways to load the Open Sans font and insert the needed text, but none of them work.

1A + 2A: Produces “test page ? 1” instead of “test page あ 1”
1B + 2A: Produces characters that render as empty squares when viewing the PDF but turn into “瑥獴⁰慧攠㼠” when copied and pasted here.
1A + 2B, 1B + 2B: Produce no visible text.
1C: This line throws an exception: “Failed to select charmap.”
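
One observation about the 1B + 2A output: “瑥獴⁰慧攠㼠” is exactly what you get when the single-byte string “test page ? ” is read two bytes at a time as big-endian UTF-16. That suggests, though it does not prove, that the あ had already been replaced by ? before the text reached the page, and that the CID font then treated each pair of bytes as one character code. The sketch below reproduces the effect using only System.Text. It makes no PDFTron calls, and the class and method names are illustrative, not part of any library.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encode text as ASCII. .NET's ASCII encoder replaces every
    // character it cannot represent (such as あ) with '?'.
    public static byte[] ToSingleByte(string text) => Encoding.ASCII.GetBytes(text);

    // Reinterpret the bytes as big-endian UTF-16 code units,
    // dropping a trailing odd byte if there is one.
    public static string ReadAsUtf16BigEndian(byte[] bytes)
    {
        int evenLength = bytes.Length - (bytes.Length % 2);
        return Encoding.BigEndianUnicode.GetString(bytes, 0, evenLength);
    }

    public static void Main()
    {
        byte[] bytes = ToSingleByte("test page あ 1");

        // Matches the 1A + 2A result: the あ has become '?'.
        Console.WriteLine(Encoding.ASCII.GetString(bytes) == "test page ? 1");

        // Matches the 1B + 2A result: the same bytes read in pairs.
        Console.WriteLine(ReadAsUtf16BigEndian(bytes) == "瑥獴⁰慧攠㼠");
    }
}
```

Both comparisons evaluate to true. The final “1” is missing from the pasted string because the 13-byte input leaves one unpaired byte at the end.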

Note that the font file “OpenSans-Regular.ttf” was downloaded from https://www.opensans.com/ and, when installed in Windows and used in other applications like Word, displays the “あ” character correctly.

Since the HTML2PDF module is able to use this font to display international characters, it should be possible to get the main part of the PDFTron library to work with it…do you have any recommendations on how?

kmirsalehi · September 1, 2022, 11:01pm

Hi Andrew,

Please take a look at the following forum posts which should answer your questions:

Please let me know how this works for you, and if you have any further questions.

amenzies · September 30, 2022, 3:24pm

Hi, Kmashei,

I tried using Font.Create and passing in the string I was about to render as the third parameter, as suggested in the second of the threads you linked. Unfortunately, this didn’t solve the issue. Calling CreateTextRun with the font still produces a ? instead of the Japanese character, and calling CreateUnicodeTextRun produces the string “瑥獴⁰慧攠㼠” instead of the expected “test page あ 1”.

For now, I’ve decided to use PDFTron only for HTML-to-PDF conversion and to stick with the PDFSharp library for programmatically inserting text, since PDFSharp lets me mix ASCII and non-ASCII characters in a single string without any problems or complicated font setup.