How do I find out if a Stream Obj is a Font, Image, Form ...?

Q:

I’m making a ‘generic’ low-level object viewer for PDFs, to make it easier to find out what is wrong with a somewhat corrupted file (and, at the same time, to learn more about the details of PDFTron).

In that context, I’m missing a way to find out the TYPE of the content in an object with IsStream=true.

Is the only way to build a list of the one or more key/value pairs that identify each type?

As an example, I know that if I find the pairs

Type=“XObject” and Subtype=“Image”

or

Type=“Page” and Subtype=“Thumb”

it is an image, but I would much prefer to have something like a .GetStreamDataType method.

If I do something like this:

    Dim image As New pdftron.PDF.Image(CurObj)

or

    Dim elm_reader As New ElementReader
    elm_reader.Begin(CurObj)

the call succeeds on any stream data, so it tells me nothing about what the stream contains.

Currently I simply try one of them and check whether using the result throws an exception, and I certainly do not like that approach.

Especially since the viewer should also handle XML metadata, font files, et cetera.

Is there a ‘clean’ way to do this?

Is there a finite list of possible types of stream content in a PDF, and well-known ways to retrieve them in PDFTron like the two just mentioned? I have tried to find this with Google, to no avail. There is not much out there on low-level objects with IsStream=true.

Another related question:

An image retrieved from an object with IsStream=true seems to ALWAYS have IsValid=false:

    Dim image As New pdftron.PDF.Image(CurObj)

no matter whether it actually IS a valid image or not.

A:

In general, it is not possible to determine the type of a stream from the stream alone. You could use “Subtype”, “Type”, and other key/value pairs as hints about what the stream represents; however, you will encounter many streams that only make sense in the context of other SDF/Cos relationships (i.e. which objects point to the stream). For example, a page content stream usually has no Type entry at all; it is recognizable only because a page’s /Contents entry refers to it.

If you would like to implement something like an Obj.GetStreamDataType() method, it should not be difficult with the existing API. For example:
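To illustrate classifying a stream by what points at it, here is a minimal Python sketch. It does not use PDFTron: it assumes a hypothetical in-memory model where each object number maps to a dict of its entries, names are plain strings, and `Ref(n)` (a made-up class) stands for an indirect reference. The key list is a small, non-exhaustive selection from the PDF specification; in a real viewer you would gather the same information by scanning the document's objects yourself.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for an indirect reference such as '4 0 R'."""
    num: int


# Keys whose value is a stream with a well-known role (non-exhaustive).
ROLE_BY_KEY = {
    "FontFile": "Type 1 font program",
    "FontFile2": "TrueType font program",
    "FontFile3": "font program (format given by its Subtype)",
    "Metadata": "XML metadata",
    "Thumb": "thumbnail image",
}


def infer_roles(objects: Dict[int, Dict[str, Any]]) -> Dict[int, str]:
    """Map object numbers to roles, based on which keys refer to them."""
    roles: Dict[int, str] = {}
    for obj in objects.values():
        for key, role in ROLE_BY_KEY.items():
            target = obj.get(key)
            if isinstance(target, Ref):
                roles[target.num] = role
        # A page's Contents is one stream or an array of streams; in an
        # annotation, Contents is a text string, hence the Type check.
        if obj.get("Type") == "Page":
            contents = obj.get("Contents")
            refs = contents if isinstance(contents, list) else [contents]
            for ref in refs:
                if isinstance(ref, Ref):
                    roles[ref.num] = "page content stream"
    return roles


objects = {
    1: {"Type": "Catalog", "Pages": Ref(2), "Metadata": Ref(5)},
    2: {"Type": "Pages", "Kids": [Ref(3)], "Count": 1},
    3: {"Type": "Page", "Parent": Ref(2), "Contents": [Ref(4)]},
    4: {"Length": 44},  # a content stream: no Type entry at all
    5: {"Type": "Metadata", "Subtype": "XML", "Length": 900},
}
print(infer_roles(objects))  # {5: 'XML metadata', 4: 'page content stream'}
```

Object 4 carries nothing that identifies it; only the page's /Contents entry reveals that it is a content stream.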

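    using pdftron.SDF;

    enum HighLevelType { e_Unknown, e_Image, e_Form, e_Font }

    static class ObjClassifier {
        // Uses Type/Subtype entries as hints; returns e_Unknown when unsure.
        public static HighLevelType GetStreamDataType(Obj o) {
            if (o.IsStream()) {
                // Type is optional in image and form XObject dictionaries,
                // so accept a missing Type and decide on the required Subtype.
                Obj t = o.FindObj("Type");
                if (t == null || (t.IsName() && t.GetName() == "XObject")) {
                    Obj s = o.FindObj("Subtype");
                    if (s != null && s.IsName()) {
                        if (s.GetName() == "Image") return HighLevelType.e_Image;
                        // Check Form explicitly: Subtype may also be PS.
                        if (s.GetName() == "Form") return HighLevelType.e_Form;
                    }
                }
            }
            else if (o.IsDict()) {
                // Font dictionaries are ordinary dictionaries, not streams.
                Obj t = o.FindObj("Type");
                if (t != null && t.IsName() && t.GetName() == "Font") {
                    return HighLevelType.e_Font;
                }
            }
            return HighLevelType.e_Unknown;
        }
    }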
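The same idea scales to more stream kinds with a lookup table. Below is a rough Python sketch, assuming a plain dict stands in for the stream dictionary (hypothetical, not the PDFTron Obj API; with PDFTron you would read the same entries via FindObj and GetName). The table is a non-exhaustive selection of stream kinds from the PDF specification, and the fallback order is an assumption of this sketch.

```python
from typing import Any, Dict, Optional, Tuple

# (Type, Subtype) -> kind; None means "this entry is not used for matching".
# A non-exhaustive selection of stream kinds from the PDF specification.
STREAM_KINDS: Dict[Tuple[Optional[str], Optional[str]], str] = {
    ("XObject", "Image"): "image XObject",
    ("XObject", "Form"): "form XObject",
    ("XObject", "PS"): "PostScript XObject",
    ("Metadata", "XML"): "XML metadata",
    ("ObjStm", None): "object stream",
    ("XRef", None): "cross-reference stream",
    ("EmbeddedFile", None): "embedded file",
    (None, "Type1C"): "CFF font program",
    (None, "CIDFontType0C"): "CID-keyed CFF font program",
    (None, "OpenType"): "OpenType font program",
}


def classify_stream(d: Dict[str, Any]) -> Optional[str]:
    """Return a description of a stream dictionary, or None if unrecognized."""
    t, s = d.get("Type"), d.get("Subtype")
    # Exact pair first, then Type-only and Subtype-only rules.
    candidates = [(t, s), (t, None), (None, s)]
    if t is None:
        # Type is optional for XObjects, so also try the Subtype as an XObject.
        candidates.append(("XObject", s))
    for key in candidates:
        if key in STREAM_KINDS:
            return STREAM_KINDS[key]
    return None


print(classify_stream({"Subtype": "Image", "Width": 8, "Height": 8}))
print(classify_stream({"Type": "ObjStm", "N": 3, "First": 20}))
print(classify_stream({"Length": 44}))
```

A page content stream such as the last one is still reported as unrecognized, because nothing in its own dictionary identifies it.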

Keep in mind that the Type entry is optional for image and form XObjects, and that some PDF generators do not respect the spec, so objects may be missing Type or even Subtype entries. In that case you could use other required entries (e.g. Width/Height for images, BBox for forms) to infer the type.
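A sketch of that fallback, again in Python with a plain dict standing in for the stream dictionary (hypothetical, not PDFTron's API). Besides Width/Height, it relies on other entries the PDF specification lists as required: PatternType for tiling patterns, BBox for forms, Length1/Length2/Length3 for embedded Type 1 fonts, Length1 for TrueType, N and First for object streams, N for ICC profiles. The check order and the labels are assumptions of this sketch, and every result is only a guess.

```python
from typing import Any, Dict, Optional


def guess_from_required_keys(d: Dict[str, Any]) -> Optional[str]:
    """Guess a stream's kind when Type/Subtype are missing.

    Each check looks for entries the PDF spec requires for that kind. The
    order matters because some keys are shared (tiling patterns also have
    BBox, object streams also have N).
    """
    keys = set(d)
    if {"Width", "Height"} <= keys:
        return "image"
    if "PatternType" in keys:
        return "tiling pattern"
    if "BBox" in keys:
        return "form XObject"
    if {"Length1", "Length2", "Length3"} <= keys:
        return "Type 1 font program"
    if "Length1" in keys:
        return "TrueType font program"
    if {"N", "First"} <= keys:
        return "object stream"
    if "N" in keys:
        return "ICC profile"  # weakest guess: N is the number of components
    return None


print(guess_from_required_keys({"Width": 16, "Height": 16, "BitsPerComponent": 8}))
print(guess_from_required_keys({"PatternType": 1, "BBox": [0, 0, 10, 10]}))
```

Combining this with the Type/Subtype hints and the referring object usually narrows things down, but as noted above, the context in which a stream is used remains the only reliable source.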