f_f
1
Hello,
I'm parsing PDF magazines and need to separate the text from the images.
Everything works so far, but I have one problem.
I want to save the embedded font that comes within the PDF as a separate file.
There isn't much documentation about this. I managed to put something together, but it doesn't quite work:
if (font.IsEmbedded() && font.GetEmbeddedFontBufSize() > 0)
{
    Obj fObj = ereader.GetFont(font.GetName());
    if (fObj != NULL)
    {
        Filter filter = fObj.GetDecodedStream();
        FilterReader reader(filter);
        unsigned char fontStream[1024];
        int counter = 0;
        FILE* file = fopen("fontFile.ttf", "wb");
        while (reader.Read(fontStream, 1024) >= 1024)
        {
            counter++;
            fwrite(fontStream, 1, sizeof(fontStream), file);
        }
        fclose(file);
    }
}
I could use some help with this.
Thanks.
You can extract the embedded font (if any) using code along the
following lines:
// C# pseudocode (C++ and Java are essentially the same, apart from minor
// syntax differences).
pdftron.PDF.Font font = new pdftron.PDF.Font(fObj); // fObj: the font's dictionary object
Obj font_stm = font.GetEmbeddedFont();
if (font_stm != null) {
    // Decode the embedded font stream and copy all of it to a file.
    Filter flt = font_stm.GetDecodedStream();
    FilterReader reader = new FilterReader(flt);
    StdFile out_file = new StdFile("out.dat", StdFile.OpenMode.e_write_mode);
    FilterWriter writer = new FilterWriter(out_file);
    writer.WriteFilter(reader);
    writer.Flush();
    out_file.Close();
}
fraga
3
What extension should I use? .ttf? How can I know?
Hello Fraga,
If the font is TrueType, you should be able to use the TTF file extension.
There are third-party tools which may help you identify information about the font after you extract it:
http://unix.stackexchange.com/questions/26053/is-there-a-unix-command-line-tool-that-can-analyze-font-files
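If you would rather decide in code, the first few bytes of the extracted data usually identify the font format. The sketch below is only a rough heuristic: guess_font_extension is a hypothetical helper (not part of PDFNet), the extensions it returns are conventional labels, and the bare CFF check assumes a version 1.0 header.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not part of PDFNet): guesses a file extension from the
// first bytes of an extracted font program. Falls back to "dat" when unsure.
std::string guess_font_extension(const unsigned char* data, std::size_t size)
{
    static const unsigned char kTrueType[4] = {0x00, 0x01, 0x00, 0x00};

    if (size >= 4) {
        if (std::memcmp(data, kTrueType, 4) == 0) return "ttf"; // TrueType outlines
        if (std::memcmp(data, "true", 4) == 0)    return "ttf"; // Apple TrueType
        if (std::memcmp(data, "OTTO", 4) == 0)    return "otf"; // OpenType with CFF outlines
        if (std::memcmp(data, "ttcf", 4) == 0)    return "ttc"; // font collection
    }
    if (size >= 2) {
        if (data[0] == '%' && data[1] == '!')   return "pfa"; // Type 1, ASCII form
        if (data[0] == 0x80 && data[1] == 0x01) return "pfb"; // Type 1, binary form
        if (data[0] == 0x01 && data[1] == 0x00) return "cff"; // bare CFF (assumed v1.0 header)
    }
    return "dat"; // unknown: keep a neutral extension
}
```

You could read the first bytes of out.dat, pass them to this function, and rename the file accordingly. Anything it reports as "dat" is better checked with one of the tools from the link above.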