create blog


PDF Parsing

Just a little status update: yesterday, I finally managed to make the new PDF engine pass all of the PDF-related tests in the Create Framework. The work is in revision 100 of the pdf-parsing branch on Launchpad.

Yes, all of the tests have been updated, but each one was hand-checked first. Because the new engine writes objects out at different points than the old one did, the tests' expected output is different. It will change further, in fact: I just realized that the maximum size of a page tree node is still set to 3 (for debugging purposes), which is not at all what the final size should be, so the expected output for tests with more than three pages is, unfortunately, wrong. It is also possible that some areas of the Create Framework have PDF-related leaks, and fixing those could cause further problems.

This is not a stable branch, is not meant to be a stable branch, and is not guaranteed to even be in a compilable state. I’ve disclaimed, so if your computer blows up, it is not my fault.
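The post doesn't show how the engine lays out its page tree, so the following is only a rough Python sketch of why a node-size limit of 3 changes the output for documents with more than three pages. It assumes "maximum size" means the maximum number of /Kids per /Pages node and that nodes are grouped bottom-up; `PagesNode`, `build_page_tree`, `max_kids` and `depth` are hypothetical names, and plain integers stand in for /Page objects. The PDF specification itself does not mandate a particular fan-out.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class PagesNode:
    """Hypothetical stand-in for a /Pages node; ints stand in for /Page leaves."""
    kids: List[Union["PagesNode", int]] = field(default_factory=list)

    @property
    def count(self) -> int:
        # Like /Count in a PDF page tree: the number of leaf pages below this node.
        return sum(k.count if isinstance(k, PagesNode) else 1 for k in self.kids)


def build_page_tree(pages, max_kids: int) -> PagesNode:
    """Group pages bottom-up so that no node has more than max_kids children."""
    if max_kids < 2:
        raise ValueError("max_kids must be at least 2")
    level = list(pages)
    # Wrap each run of max_kids items in a new node until one root can hold them all.
    while len(level) > max_kids:
        level = [PagesNode(kids=level[i:i + max_kids])
                 for i in range(0, len(level), max_kids)]
    return PagesNode(kids=level)


def depth(node: PagesNode) -> int:
    """Number of /Pages levels from this node down to the leaf pages."""
    return 1 + max((depth(k) for k in node.kids if isinstance(k, PagesNode)),
                   default=0)


if __name__ == "__main__":
    for n_pages in (3, 4, 10):
        tree = build_page_tree(range(n_pages), max_kids=3)
        print(n_pages, "pages:", depth(tree), "level(s), /Count", tree.count)
```

With `max_kids=3`, three pages fit under a single root, but four pages already need a second level of intermediate /Pages nodes. Each of those is one more object in the file, which would presumably shift object numbers and byte offsets in any expected output for the longer test documents.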

Now I’ve finally started on PDF parsing, and so far the first XRef table is being read in. I’m excited. Hopefully, by the end of the week, I’ll have some form of PDF-in-PDF embedding working. For now, I’m aiming to support only PDF 1.4, which predates the cross-reference streams and object streams introduced in PDF 1.5. Handling those would require support for compression, which is on the eventual to-do list but is not currently a priority.
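For readers curious what "reading in the XRef table" involves for a PDF 1.4 file, here is a minimal Python sketch; it is not the engine's actual code. It finds the offset after the last `startxref` keyword, then parses one classic `xref` section: subsection headers of the form "first count", each followed by fixed 20-byte entries (10-digit offset, 5-digit generation, `n` or `f`, two-byte end of line). The function names are hypothetical, and the sketch deliberately ignores trailer `/Prev` chains from incremental updates and the PDF 1.5 cross-reference streams. It also follows the spec's line endings strictly; a real parser usually has to be more forgiving.

```python
import re

# Classic xref entry: 10-digit offset, 5-digit generation, 'n' or 'f',
# then a two-byte end of line (" \r", " \n" or "\r\n"): 20 bytes in total.
_ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])(?: \r| \n|\r\n)")
# Subsection header: first object number and entry count, e.g. b"0 6\n".
_SUBSECTION = re.compile(rb"(\d+) (\d+)[ \t]*\r?\n")
_XREF = re.compile(rb"xref[ \t]*\r?\n")


def find_startxref(data: bytes) -> int:
    """Return the byte offset written after the last 'startxref' keyword."""
    idx = data.rfind(b"startxref")
    if idx < 0:
        raise ValueError("no startxref keyword found")
    m = re.match(rb"startxref\s+(\d+)", data[idx:])
    if m is None:
        raise ValueError("malformed startxref")
    return int(m.group(1))


def parse_xref_table(data: bytes, offset: int) -> dict:
    """Parse one classic xref section starting at `offset`.

    Returns {object_number: (byte_offset, generation)} for in-use ('n')
    entries; free ('f') entries are skipped. Parsing stops at the first
    line that is not a subsection header, normally the 'trailer' keyword.
    """
    m = _XREF.match(data, offset)
    if m is None:
        raise ValueError("expected 'xref' keyword at offset %d" % offset)
    pos = m.end()
    table = {}
    while True:
        header = _SUBSECTION.match(data, pos)
        if header is None:
            break  # end of the section (e.g. 'trailer')
        first, count = int(header.group(1)), int(header.group(2))
        pos = header.end()
        for i in range(count):
            entry = _ENTRY.match(data, pos)
            if entry is None:
                raise ValueError("bad xref entry at offset %d" % pos)
            pos = entry.end()
            if entry.group(3) == b"n":
                table[first + i] = (int(entry.group(1)), int(entry.group(2)))
    return table


if __name__ == "__main__":
    # A tiny hand-made file: object 1 starts right after the 9-byte header.
    body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    xref = (b"xref\n0 2\n"
            b"0000000000 65535 f \n"
            b"0000000009 00000 n \n")
    tail = (b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
            b"startxref\n" + str(len(body)).encode() + b"\n%%EOF\n")
    data = body + xref + tail
    print(parse_xref_table(data, find_startxref(data)))  # {1: (9, 0)}
```

Starting from the end of the file is the key design point: `startxref` always sits just before `%%EOF`, so a reader can locate the table without scanning the whole body.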
