Some Chinese characters are lost when converting HTML to PDF with HTML2PDF


When I convert the attached HTML file to PDF with HTML2PDF, some of the characters are missing from the resulting PDF. Below is the Java code I am using. We run PDFTron on a Linux server; if I run the server on Windows, the characters display correctly. I tried embedding a Unicode font using Font.create, as shown below, but it made no difference. Any suggestions? Thanks.

File localHtmlFile = File.createTempFile("dwnldCvrt" + UUID.randomUUID(), ".html", new File(path));

IOUtils.copy(in, new FileOutputStream(localHtmlFile));

doc = new PDFDoc();

Font.create(doc.getSDFDoc(), "Arial Unicode", "UTF-8");

HTML2PDF.convert(doc, localHtmlFile.getPath());

The HTML2PDF module doesn’t use the same font loading logic as PDFNet, which is likely why the Font.create call made no difference. The module does, however, search the OS for fonts, so on a Linux system you would typically put a font that covers the missing characters in the /usr/share/fonts folder or one of its sub-folders (TTF, truetype, etc.).
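
To confirm that a font actually landed where the OS font search can see it, a quick check like the one below may help. It is only a sketch using the standard Java file API, not part of PDFNet: the class name FontDirCheck and the method listFontFiles are made up for illustration, and it looks at file extensions only, so it cannot tell you whether a font contains the Chinese glyphs you need.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a PDFNet class: lists font files under a directory.
public class FontDirCheck {

    // Recursively collect regular files whose extension suggests a font file
    // (.ttf, .ttc, .otf). Returns an empty list if root is not a directory.
    static List<Path> listFontFiles(Path root) throws IOException {
        if (!Files.isDirectory(root)) {
            return Collections.emptyList();
        }
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> {
                    String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
                    return name.endsWith(".ttf") || name.endsWith(".ttc") || name.endsWith(".otf");
                })
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the folder mentioned above; pass another path as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "/usr/share/fonts");
        List<Path> fonts = listFontFiles(root);
        if (fonts.isEmpty()) {
            System.out.println("No font files found under " + root);
        }
        fonts.forEach(System.out::println);
    }
}
```

Run it as the same user as the server process so that permission problems show up too. If the font is listed but characters are still missing, it may be worth refreshing the system font cache (on distributions that use fontconfig, fc-cache -f) and restarting the server; whether HTML2PDF depends on that cache is an assumption on my part.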