[thelist] PDF Summary Data...

Mike King mike.king at redroom.co.uk
Mon Feb 11 06:29:01 CST 2002


Has anyone ever needed to strip the summary data out of a PDF file?
Well, I do... and it seems to be a little trickier than I first thought  :)

In version 1.4 PDFs the summary info is stored as an XML structure (RDF)
within the file, for example:
  <rdf:Description about=''
   <pdf:Producer>Acrobat Web Capture 5.0</pdf:Producer>
   <pdf:Title>This is a test title</pdf:Title>
   <pdf:Subject>My Subject</pdf:Subject>
   <pdf:Author>Mike King</pdf:Author>

Now, you can see this information if you open the PDF in a text editor,
but when I try to ereg through it with PHP I can't get a match  :(
I've tried opening the file in both binary and text mode, and it makes no
difference. strstr can find the start of the tags, but I want to ereg out
all of the information!
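
One likely cause, offered as a guess rather than a diagnosis: PHP's POSIX
regex functions (ereg and friends) are not binary-safe, so a NUL byte early
in the PDF can end the search before the RDF block is reached, while strstr
is binary-safe and keeps going. The PCRE functions are binary-safe. Below is
a minimal sketch using preg_match_all; the function name
extract_pdf_summary and the sample string are made up for illustration, and
it assumes the RDF block is stored uncompressed as plain single-byte text
(which is what seeing it in a text editor suggests).

```php
<?php
// Hypothetical helper (not from the original post): collect
// <pdf:Name>value</pdf:Name> pairs from the raw bytes of a PDF file.
function extract_pdf_summary(string $bytes): array
{
    $summary = [];
    // PCRE is binary-safe, so NUL bytes elsewhere in the file do not stop
    // the scan. \1 makes the closing tag match the opening tag, and the
    // "s" modifier lets a value span several lines.
    $pattern = '/<pdf:(\w+)>(.*?)<\/pdf:\1>/s';
    if (preg_match_all($pattern, $bytes, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $summary[$m[1]] = trim($m[2]);
        }
    }
    return $summary;
}

// Made-up sample standing in for file contents: binary junk (including
// NUL bytes) before and after the RDF fragment.
$sample = "%PDF-1.4\n\x00\x01\x02binary\x00"
    . "<rdf:Description about=''>\n"
    . " <pdf:Producer>Acrobat Web Capture 5.0</pdf:Producer>\n"
    . " <pdf:Title>This is a test title</pdf:Title>\n"
    . " <pdf:Subject>My Subject</pdf:Subject>\n"
    . " <pdf:Author>Mike King</pdf:Author>\n"
    . "</rdf:Description>\x00\x00trailer";

print_r(extract_pdf_summary($sample));
```

For a real file, something like file_get_contents('example.pdf') (the file
name is a placeholder) would supply the bytes in place of $sample. If the
metadata is compressed or stored in a multi-byte encoding, this pattern
will not match and a proper PDF or XML parser would be needed instead.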

I'm not worried about pre-1.4 PDFs, 'cause hopefully all 1,035 of them
will be upgraded soon  :)

Anyone got any ideas?

