Can I convert PDF to UTF-8 compliant xml files using PDFNet SDK?

Q: I would like to know whether your PDF converter can take PDFs and
convert them to UTF-8 compliant XML files.
Specifically, I want to create an XML source file (type xmlpipe2)
from PDF to be indexed by the Sphinx search engine.
A: You could use either PDFNet SDK or PDF2Text to convert PDF to
UTF-8 encoded text or XML files.

If you are looking for a more programmatic solution, you may want to
take a look at the TextExtract sample project (
pdfnet/samplecode.html#TextExtract) in PDFNet. In this sample,
pdftron.PDF.TextExtractor is used to extract Unicode strings, which can
then be encoded using any desired encoding (e.g. UTF8, UTF16BE/LE, MBC,
etc.).
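
For the Sphinx case in the question, the extracted Unicode strings still
need to be wrapped in xmlpipe2 markup and written out as UTF-8. Below is a
minimal sketch in Python, assuming the text has already been extracted
from each PDF (the extraction call itself is not shown). The helper names
clean_xml_text and build_xmlpipe2 are hypothetical, and the single
"content" field is an assumed schema, not something PDFNet or Sphinx
requires.

```python
import re
import sys
from xml.sax.saxutils import escape

# Characters that XML 1.0 does not allow: C0 controls other than
# tab/LF/CR, lone surrogates, and U+FFFE/U+FFFF. Text extracted from
# PDFs can contain some of these (for example form feeds).
_INVALID_XML_CHARS = re.compile(
    "[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def clean_xml_text(text):
    """Drop characters illegal in XML 1.0, then escape &, < and >."""
    return escape(_INVALID_XML_CHARS.sub("", text))


def build_xmlpipe2(docs, field="content"):
    """Build a UTF-8 encoded xmlpipe2 stream from (doc_id, text) pairs.

    `field` is an assumed full-text field name; adjust the schema to
    match your own index definition.
    """
    parts = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        '<sphinx:field name="%s"/>' % field,
        "</sphinx:schema>",
    ]
    for doc_id, text in docs:
        # Sphinx document ids must be unique positive integers.
        parts.append('<sphinx:document id="%d">' % int(doc_id))
        parts.append("<%s>%s</%s>" % (field, clean_xml_text(text), field))
        parts.append("</sphinx:document>")
    parts.append("</sphinx:docset>")
    # Encoding the final string is what makes the output UTF-8.
    return "\n".join(parts).encode("utf-8")


if __name__ == "__main__":
    # Placeholder strings standing in for text extracted from two PDFs.
    extracted = [(1, "Caf\u00e9 & <PDF> text\x0c"), (2, "second document")]
    sys.stdout.buffer.write(build_xmlpipe2(extracted))
```

Since an xmlpipe2 source reads the XML from the standard output of a
command, a script along these lines could be used as that command once
the placeholder strings are replaced with real extracted text.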

If you are looking for a command-line utility or an SDK with a very
simple interface, you may want to take a look at PDF2Text.