How can I get letter-spacing and word-spacing using TextExtractor?

Hey all,

Working with what I learned from the TextExtractorTest program, I am building a tech demo for my boss. I have figured out how to get almost all the information I need, except letter-spacing and word-spacing, which are not available through the TextExtractor class. If I can get the font, size, weight, and serif flag, why can I not also access the character and word spacing?

The Ruby code below all works great:

def PrintStyle(style)
  # Font name, size, serif flag and color all come from TextExtractor::Style
  puts " style=\"font-family:" + style.GetFontName + "; font-size:" +
       style.GetFontSize.to_s + "; serif: " + style.IsSerif.to_s +
       "; color:" + style.GetColor.to_s + "\""
end

But this fails:

puts style.GetCharSpacing.to_s

`PrintStyle': undefined method `GetCharSpacing' for #<PDFNetRuby::Style:0x007ff5f1863690> (NoMethodError)

Am I going to have to abandon TextExtractor and roll my own version using ElementReader, or am I missing something?

Any help would be appreciated.

zonker

I have the same problem! Did you find an alternative?

Hello,

This information is only accessible through the ElementReader interface.

I guess you could use the information TextExtractor does provide (namely, glyph and word bounding boxes) to estimate the spacing you are looking for.

So:

- Compute the average character spacing from consecutive glyph bounding boxes on the same line: each gap is the x1 (left edge) of a glyph minus the x2 (right edge) of the glyph before it.
- Compute the average word spacing the same way, from consecutive word bounding boxes on the same line.
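The two steps above can be sketched in plain Ruby. This is a minimal sketch, assuming each bounding box has already been copied out of TextExtractor into a plain array [x1, y1, x2, y2] with x1 as the left edge and x2 as the right edge, and that the text runs horizontally left to right. The helper names (`gaps`, `mean`, `average_char_gap`, `average_word_gap`) and the coordinates are hypothetical. Glyph gaps are measured only inside each word (GetGlyphQuad is a Word method, so glyphs naturally arrive grouped per word), which keeps the wider gaps between words from inflating the character-spacing estimate.

```ruby
# Sketch: estimate character and word spacing from bounding boxes.
# Each box is a plain array [x1, y1, x2, y2], x1 = left edge, x2 = right edge.
# Assumes horizontal, left-to-right text.

# Horizontal gaps between consecutive boxes: left edge of the next box
# minus right edge of the previous one.
def gaps(boxes)
  boxes.each_cons(2).map { |left, right| right[0] - left[2] }
end

# Arithmetic mean, or nil when there is nothing to average.
def mean(values)
  values.empty? ? nil : values.sum / values.length.to_f
end

# line_glyphs: one array of glyph boxes per word on the same line.
# Gaps are measured only inside each word.
def average_char_gap(line_glyphs)
  mean(line_glyphs.flat_map { |word_glyphs| gaps(word_glyphs) })
end

# word_boxes: the word boxes of one line, in reading order.
def average_word_gap(word_boxes)
  mean(gaps(word_boxes))
end

# Hypothetical coordinates for one line holding two words.
line_glyphs = [
  [[100.0, 700.0, 106.0, 710.0], [106.5, 700.0, 112.0, 710.0], [112.5, 700.0, 118.0, 710.0]],
  [[121.0, 700.0, 127.0, 710.0], [127.5, 700.0, 133.0, 710.0]]
]
word_boxes = [[100.0, 700.0, 118.0, 710.0], [121.0, 700.0, 133.0, 710.0]]

puts "average char gap: #{average_char_gap(line_glyphs)}" # prints "average char gap: 0.5"
puts "average word gap: #{average_word_gap(word_boxes)}"  # prints "average word gap: 3.0"
```

The result is a visual gap in page units, not the character/word spacing values stored in the PDF content stream. Glyph boxes may not line up exactly with the font's advance widths, so treat the numbers as an approximation.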

For words:

pdftron::PDF::TextExtractor::Word::GetBBox() or GetQuad()
https://www.pdftron.com/pdfnet/docs/PDFNetC/de/d3a/classpdftron_1_1_p_d_f_1_1_text_extractor_1_1_word.html#a76e7f7a086e3c44d8a261bd77995f9ad

For glyphs:

pdftron::PDF::TextExtractor::Word::GetGlyphQuad
https://www.pdftron.com/pdfnet/docs/PDFNetC/de/d3a/classpdftron_1_1_p_d_f_1_1_text_extractor_1_1_word.html#a8610b0e1e7c1949c336c637d443ebbf0
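GetQuad and GetGlyphQuad describe four corner points rather than a box, so the x1/x2 comparisons described above need the quad reduced to its horizontal and vertical extent first. A minimal sketch, assuming the quad has been copied into a flat Ruby array of eight numbers [x1, y1, x2, y2, x3, y3, x4, y4] (check what the Ruby binding actually returns before relying on this layout); `quad_to_bbox` is a hypothetical helper name.

```ruby
# Turn a quad, given as a flat array of eight numbers
# [x1, y1, x2, y2, x3, y3, x4, y4], into an axis-aligned box
# [min_x, min_y, max_x, max_y]. Corner order does not matter.
def quad_to_bbox(quad)
  unless quad.length == 8
    raise ArgumentError, "expected 8 numbers, got #{quad.length}"
  end

  xs = quad.values_at(0, 2, 4, 6) # every x coordinate
  ys = quad.values_at(1, 3, 5, 7) # every y coordinate
  [xs.min, ys.min, xs.max, ys.max]
end

# Hypothetical glyph quad (corners listed counter-clockwise from bottom-left).
quad = [100.0, 700.0, 106.0, 700.0, 106.0, 710.0, 100.0, 710.0]
p quad_to_bbox(quad) # prints [100.0, 700.0, 106.0, 710.0]
```

Taking the min and max of the corners makes the result independent of the order in which the corners are listed.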