When using TextExtractor, is there any way to determine the baseline of a TextExtractor.Word?

The PDF spec states that the quadrilaterals enclose the glyphs, including ascenders and descenders. I realize quads represent:

I take it TextExtractor.Word.BBox is the smallest axis-aligned rectangle that contains all 4 points above.

I am looking for the baseline. My text is typically not slanted as shown above.

The only approach I can see is to open an output page, render a full character set, and then either subtract Ascent from the top of the sample's BBox or add Descent (which is negative) to the bottom of the sample's BBox. Perhaps there is an easier way in the context of line and word extraction using TextExtractor?

Unfortunately, the baseline cannot be accessed through the TextExtractor interface.

However, you can get it through the ElementReader interface. You might want to take a look at the DumpText function in the following example:

For every e_text element, this is how you access the baseline:


CharIterator itr = element.GetCharIterator();
double baseline = itr.Current().y;
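
One thing to watch for: as far as I can tell, the character position above is relative to the text run rather than to the page, so to compare it with a word BBox you would most likely need to apply the element's text matrix and then its CTM. That is an assumption on my part, so please check it against the example. The sketch below only shows the arithmetic in plain Java and does not call PDFNet at all. The class name BaselineSketch, the helpers apply and toPageSpace, and the matrix values are all made up for illustration. Matrices use the PDF layout {a, b, c, d, h, v}.

```java
public class BaselineSketch {

    /** Applies a PDF-style affine matrix {a, b, c, d, h, v} to the point (x, y). */
    static double[] apply(double[] m, double x, double y) {
        return new double[] {
            m[0] * x + m[2] * y + m[4],
            m[1] * x + m[3] * y + m[5]
        };
    }

    /**
     * Maps a character origin to page space by applying the text matrix first
     * and the CTM second. ASSUMPTION: the input position is in text space.
     */
    static double[] toPageSpace(double[] ctm, double[] textMtx, double x, double y) {
        double[] p = apply(textMtx, x, y);
        return apply(ctm, p[0], p[1]);
    }

    public static void main(String[] args) {
        // Hypothetical values: identity CTM, text scaled by 12 and placed at (72, 700).
        double[] ctm = {1, 0, 0, 1, 0, 0};
        double[] textMtx = {12, 0, 0, 12, 72, 700};

        // A character whose origin is (0, 0) in text space maps to (72, 700).
        double[] origin = toPageSpace(ctm, textMtx, 0.0, 0.0);
        System.out.println("baseline y = " + origin[1]);
    }
}
```

For unrotated text the resulting y value is the baseline height on the page. For rotated text, mapping two character origins on the same line would give you two points on the baseline.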


Let me know if that helps.