Font and vector image issues when converting docx to xod

hammer_ren · September 19, 2017, 11:38pm

We converted a docx file to xod using Convert.toXod() for WebViewer, and noticed the following issues:

1. The vector image has been transformed incorrectly.
2. The original font has not been recognised, and a fallback font has been applied instead.
3. The list item icons (e.g. bullet points) are missing.

Our application server is running on an AWS Linux EC2 instance. I found a post that mentions using setResourcesPath to refer to a template docx with all fonts embedded. Is there any alternative way?

The template docx seems to have solved issue 3 as well, but does that mean we have to include every possible format in that file to get the right output?

Any suggestions on issue 1, the vector image?

Ryan · September 21, 2017, 8:51pm

The best way to resolve this would be if you could send the DOCX file to support@pdftron.com and we can investigate further.

I found a post that mentions using setResourcesPath to refer to a template docx with all fonts embedded. Is there any alternative way?

That post suggests the following:

The file should have all the required fonts embedded within it (you can create a file like this by checking the “embed fonts in the file” option from the “Save” preferences within Word).

Would this solution not work for you? Why are you looking for an alternative?

The template docx seems to have solved issue 3 as well, but does that mean we have to include every possible format in that file to get the right output?

The idea is that this template contains the fonts that are needed for the target DOCX file. So if two fonts listed in the DOCX are on your Windows machine but not on the AWS EC2 instance, you would create the template DOCX file with those fonts embedded and put it on the EC2 instance in the correct folder, as described in the post.
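
As a rough way to work out which fonts the template needs, the sketch below lists the font names a DOCX declares and reports the ones that are not in a list of fonts known to be on the server. It is only an illustration and is not part of PDFNet: it assumes the file uses the standard (transitional) WordprocessingML namespace, and `fonts_declared`, `fonts_missing`, and the `available` list are hypothetical names. Note that the font table may also declare fonts the document never actually uses.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the DOCX uses the transitional WordprocessingML namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
FONT_TABLE = "word/fontTable.xml"


def fonts_declared(docx):
    """Return the sorted font names declared in the DOCX font table."""
    with zipfile.ZipFile(docx) as zf:
        if FONT_TABLE not in zf.namelist():
            return []  # the font table part is optional
        root = ET.fromstring(zf.read(FONT_TABLE))
    names = {font.get(f"{{{W_NS}}}name") for font in root.iter(f"{{{W_NS}}}font")}
    return sorted(name for name in names if name)


def fonts_missing(docx, available):
    """Return declared fonts that are not in `available` (case-insensitive)."""
    available_lower = {name.lower() for name in available}
    return [n for n in fonts_declared(docx) if n.lower() not in available_lower]
```

For example, `fonts_missing("target.docx", ["DejaVu Sans", "Liberation Serif"])` (the file name and font list are placeholders) would return the declared fonts that the template would have to supply.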

What kind of EC2 instance are you running? Windows or Linux?

Any suggestions on issue 1, the vector image?

We would need the source file, please post here or send to support.

hammer_ren · September 21, 2017, 11:17pm

Please see attached docx file.

The template docx did not solve the problem for bold and italic fonts: even though I embedded the bold and italic fonts in the template, all content was converted to the regular style. How can I solve it?
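
One thing that may be worth checking first is whether the bold and italic faces were actually written into the template. The sketch below is a minimal check under stated assumptions, not anything PDFNet provides: it reads the template's font table and reports which embedded faces each font carries, assuming the transitional WordprocessingML namespace. The function name `embedded_styles` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the DOCX uses the transitional WordprocessingML namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Child elements of <w:font> that point at an embedded font face.
EMBED_TAGS = {
    "embedRegular": "regular",
    "embedBold": "bold",
    "embedItalic": "italic",
    "embedBoldItalic": "bold italic",
}


def embedded_styles(docx):
    """Map each declared font name to the list of faces embedded for it."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/fontTable.xml"))
    result = {}
    for font in root.iter(f"{{{W_NS}}}font"):
        name = font.get(f"{{{W_NS}}}name")
        result[name] = [
            label
            for tag, label in EMBED_TAGS.items()
            if font.find(f"{{{W_NS}}}{tag}") is not None
        ]
    return result
```

If a font shows only `regular` in the result, the bold and italic faces are not in the template file, whatever the converter does with them afterwards.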

I just cannot believe that the converter could not even recognise the very common Arial font, or fall back to a similar font, without an extra template. Can I have a list of the supported common fonts or the possible similar fallback fonts? I’ve tried to find documentation about the font dictionary used by PDFNet, but had no luck.

OmTrak - security and infrastructure.docx (434 KB)

Ryan · September 27, 2017, 7:41pm

Thank you for the extra information. While I gather this info for you, would it be possible to get a copy of the XOD (and/or PDF) you generated?
Also a screenshot of what you see would help a lot, clearly showing the browser/application you are using to view the document.

Please also confirm whether you are doing the conversions on Linux.

hammer_ren · October 5, 2017, 10:20pm

Yes, our testing server is AWS Linux. Please see the attached screenshot and the XOD file (generated by Convert.toXod()).

OmTrak - security and infrastructure.xod (247 KB)

Ryan · October 17, 2017, 12:50am

Thank you for your patience while we improve the conversion of this file.

While the fix is not yet available in a production build, you can preview our developer nightly.
Developer channel: http://www.pdftron.com/nightly/?p=experimental/
The latest official builds and the Release channel are ready for production usage; however, the developer channel builds do not get the same amount of testing and can be in a state of change.

Attached is the latest output. I will let you know when these fixes are available in production.

OmTrak - security and infrastructure.pdf (228 KB)

hammer_ren · October 17, 2017, 1:46am

The sample output shows a great improvement in the conversion of the Word doc we supplied. However, we note that it still did not convert the background spider's web properly (it just shows a solid light grey background), and a link on page 3 (http://en.wikipedia.org/wiki/Netflix) has been incorrectly split across 2 lines.

hammer_ren · October 17, 2017, 3:00am

Here is another docx that is totally screwed up after being converted to pdf. The table layout is slightly off on page 4, and from there on all the images are missing.

I’ve attached the original docx and the converted pdf. Please advise.

3.docx (4.88 MB)

hammer_ren · October 17, 2017, 3:03am

Please see the attached converted pdf for the previous post; I had to post it separately due to the attachment size limit.

3.pdf (2.39 MB)

Ryan · October 17, 2017, 4:32pm

Thank you for the report on the first file, and the follow up document. I have assigned these to the Office conversion team.

Going forward it would be best to report any new issues using our report a problem form. That will be better for tracking.
https://www.pdftron.com/support/reportproblem.html

Ryan · November 17, 2017, 10:09pm

Thank you again for your patience and assistance with improving our Office Converter.

Attached is an updated file showing progress so far.

I will provide additional information via direct email to your ticket 59f6a68713dec.

3_progress.pdf (4.77 MB)

hammer_ren · November 19, 2017, 10:15pm

Thanks Ryan, looking forward to the final fix and release.

hammer_ren · January 17, 2018, 5:38am

Hi Ryan,

We’ve tested the latest build you sent via email. It showed some gradual improvement for the sample documents we provided above, but made others worse.

Firstly, 3.pdf is still not converted closely enough to the original docx, and is not even close to the progress output you provided previously. We ran PDFNet.getVersion() on our server and got 6.716303 as the output. Please see the attached converted pdf.

I will attach a couple of other Word documents that failed our tests in the following post because of the size limitation.

3.pdf (4.71 MB)

hammer_ren · January 17, 2018, 5:42am

Please see the 2 attached Word documents that failed our tests under PDFNet version 6.716303.

1.docx (210 KB)

2.docx (1.92 MB)

Ryan · January 17, 2018, 9:27pm

Thank you for the updates, I have assigned these to the team, and I will keep you posted on progress.

Ryan · January 17, 2018, 9:41pm

This file, 3.docx, is particularly problematic, as even MS Word is having difficulties with this file, and we see different results in different versions.

What are you using as a reference? MS Word 2016?

Do you have other files like this that appear differently in different versions of MS Word? Or is this the only one?

hammer_ren · January 23, 2018, 1:11am

Hi Ryan,

That’s the only file we noticed that appears differently in different versions of MS Word. But the rest of the samples still have all sorts of problems after conversion.

Ryan · January 23, 2018, 5:52pm

That’s the only file we noticed that appears differently in different versions of MS Word.

Even though it is just the one file, it would help us a lot with this particular file if we knew what version of MS Word you are using to evaluate.

hammer_ren · January 23, 2018, 11:11pm

We’re using MS Office 2016 (Word 2016) to evaluate.