The framework’s PDF engine has largely been rewritten. The rewrite adds several new capabilities (and, along the way, we changed a couple of unrelated things).
Here’s a list of changes:
- Page Trees. We used to write one PDF array of pages. Now, we write trees of pages, allowing us to keep only a small number of page nodes in memory at a time.
- Compression. Content streams in PDF files are now compressed. Why did we implement this now? Well, that’s a funny story…
- Decompression. It turns out that, in our effort to embed one PDF inside another, we may have to deal with multiple content streams. It further turns out that they still represent one long stream (as opposed to several individual segments) and so cannot be used separately from each other. To put them in one reusable object in the PDF file, you have to decompress and concatenate them. And, since we’re decompressing, why not compress as well?
- PDF Parsing. After the near-rewrite of the PDF component, it was actually quite easy to add PDF parsing. It only took us about a week. Now, we are able to open a PDF file and rip contents out of it.
- PDF Embedding. It has, for a while, been possible to embed PDFs inside PPML. However, to embed PDFs inside other PDF files, you need to actually deconstruct the PDF to be embedded. That’s why we implemented parsing. Amazingly, it all appears to work, although it does not (currently) work for PDFs with compressed cross-reference streams or compressed object streams.
- Better PDF Embedding. Not only does PDF-in-PDF embedding work, but PDF-in-PPML embedding has been improved as well. The width and height of any embedded PDF are now read in, so PDFs are treated almost exactly like images. You can make them fill a frame, you can make them fit in a frame, you can make them centered in a frame… basically, anything you can do with an image, you can do with a PDF, in both PPML and PDF outputs.
- Text Processing Bug Fixed. Imagine the words “Hello World.” What if both cannot fit on the same line? They should, then, naturally be put on separate lines. But what if it was just the space between the words making the difference? Still, they need to be on separate lines. Unfortunately, the framework had a bug here — it was determining that it needed to break, but although it split at the right point, it did not actually go to the new line. This has been fixed!
- Line Height = 1.2. The framework now supports line heights for runs of text. This is, basically, the amount of spacing there should be between lines. Because we use InDesign quite often here, we made the framework default to line spacing of 1.2 — like InDesign’s — which changes how text flows in any application that has text boxes with more than one line of text.
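To make the Decompression item above concrete, here is a minimal sketch in Python of the decompress, concatenate, and recompress step. It is not the framework's code: the function name `merge_content_streams` is hypothetical, and it assumes every stream uses plain Flate (zlib) compression with no other filters. The streams are joined with a newline so that tokens on either side of a boundary cannot run together.

```python
import zlib


def merge_content_streams(compressed_streams):
    """Combine several Flate-compressed content streams into one.

    Hypothetical helper for illustration; assumes each input is a
    zlib-compressed byte string with no additional PDF filters.
    """
    # Inflate every stream so the raw page operators are available.
    parts = [zlib.decompress(stream) for stream in compressed_streams]
    # The streams form one logical stream, so join them in order.
    # A newline separator keeps adjacent tokens from fusing.
    merged = b"\n".join(parts)
    # Deflate once, producing a single reusable stream body.
    return zlib.compress(merged)


if __name__ == "__main__":
    first = zlib.compress(b"q 1 0 0 1 72 720 cm")
    second = zlib.compress(b"BT /F1 12 Tf (Hello) Tj ET Q")
    merged = merge_content_streams([first, second])
    print(zlib.decompress(merged).decode("ascii"))
```

Note that the first stream in the example opens a graphics state (`q`) and the second one closes it (`Q`), which is exactly why the pieces cannot be reused independently.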
Where Are We Going
So what’s left? Well, there could be some memory usage issues; the engine was using a suspiciously large amount of memory the other day when processing a set of a few thousand records. It may or may not have been a small (or large) leak. We will get around to fixing this problem (if it exists) when it starts to bother us.
We want to implement scripting support for PDF, and by that, I mean that we want to be able to open a PDF using Script, determine things such as the size of a page, and so on. It does not necessarily need to talk directly to PDF (it could talk to PDFLink), but we want a tool we can use to automate things such as imposition.
Of course, scripting of views could allow this, and this is an eventual goal. In the meantime, however, we may see some form of compiled dC, where dC is generated (compiled) live through some script, allowing forms of meta-programming.
A big part of all of this will be the conversion from the Engine to the Shell, which is a JavaScript-based platform. The Shell will be able to simply run scripts, or do Create Framework-related things. This will make it much easier to maintain the current Shell, which is used for running some current scripts here at TPSi.
Finally, dC namespaces need a rewrite, and that will happen eventually.
We have another project we will be working on for a little while. It is related to the Create Framework, but a bit too experimental to announce quite yet.