create blog

Archive for May, 2009

My Coding Style

Tuesday, May 19th, 2009

My coding style (for C++, at least) is probably slightly controversial. I use tabs, not spaces. Every brace, almost without exception, is on a new line (the exception is when the entire function definition, including braces, can be on one line).

Now for one that is a bit weird, but that I started doing for my own sanity a while back: before every few lines, if not every line, there is a comment. After the line or set of lines, there is a blank line. It doesn’t matter whether the comment says something extremely meaningful (though I prefer it to be meaningful), but it matters that it is there. The syntax coloring of the comment, combined with the line break, somehow makes the code much simpler for me to read.

It is preferable that the comment be written in the first person, but in plural form; that is, it should preferably start with “we,” as in “I and the program,” or “whoever you are reading this, and I, the original programmer.” I don’t know why, but that is my preference, and, being the one who works most on the code, what I say goes.

Here is an example from a still very much work-in-progress function (so don’t make too much fun!) involved in the parsing of PDF:

PDFObject PDFFile::parse()
{
	//there are a few main things we could see. We could see an object, or a
	//delimiter.
 
	//what we do is simple. Skip whitespace (we have a function for that), read
	//until either delimiter or whitespace.
 
	//we will either have a simple object (number, boolean, integer, null)
	//or a delimiter. The delimiter tells us what we may want to do next, for
	//instance, use parseString, parseName, etc.
 
	//NOTE:
	//peek is our friend. Since we read one character at a time, it is much more
	//practical than creating our own in-memory buffer to remember, for instance,
	//the last delimiter we saw.
 
	//consume whitespace
	this->consumeWhitespace();
 
	//token buffer. This holds the token if we can process it whole.
	std::string buffer;
 
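	//loop until we see whitespace, a delimiter, or the end of the input
	//(peeking the whole time)
	while (true)
	{
		//we peek as an int so that we can tell end-of-file from a real character.
		int next = this->input->peek();

		//if we have run out of input, we are done reading.
		if (next == std::char_traits<char>::eof())
			break;

		char c = static_cast<char>(next);

		//if it is a delimiter or is whitespace, we are done reading.
		if (isWhitespace(c) || isDelimiter(c))
			break;

		this->input->get(c);
		buffer += c;
	}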
 
	//now parse our buffer. If it is empty, we stopped at a delimiter right away,
	//so we must be looking at a string, dict, etc.
	if (buffer.length() == 0)
	{
		//we must have a delimiter.
		//see what it means.
		char delimiter;
		this->input->get(delimiter);
 
		//see what it is...
		if (delimiter == '(')
		{
			return processString(); //will consume up to the ending )	
		}
		else if (delimiter == '<')
		{
			//peek the next character. if it is also a <, then this is a dict.
			if (this->input->get( //note: this is why I said work-in-progress.
			// it isn't finished.
		}
	}
 
 
}

Is the code a lot longer than it could be due to comments? Yes. But for one reason or another, it makes it much easier for me to understand and debug later.

Update: Something that, unfortunately, I am not consistent enough about is the use of the this-> prefix for variable and function names. I prefer to use it, but sometimes I don’t, as seen in “processString” above.

PDF Parsing

Tuesday, May 19th, 2009

Just a little status update: yesterday, I finally managed to make the new PDF engine pass all PDF-related tests in the Create Framework. It is revision 100 in the pdf-parsing branch on Launchpad.

Yes, all of the tests have been updated, but they were hand-checked first. As the new engine handles the timing of the writing of objects differently, the tests will work differently. They will change further, in fact, because I just realized that the maximum size for a page tree node is still set to 3 (for debugging purposes) — which is not at all what the final size should be — so the tests that have more than three pages are, unfortunately, wrong. Further, it is possible that some areas of the Create Framework may even have some leaks regarding PDF, so fixes to those could cause problems. This is not a stable branch, is not meant to be a stable branch, and is not guaranteed to even be in a compilable state. I’ve disclaimed, so if your computer blows up, it is not my fault.
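To show why a maximum of 3 matters for the tests, here is a rough sketch of how the shape of a page tree depends on that limit. It assumes the simplest possible scheme, where nodes are grouped bottom-up into batches of at most the maximum size. How the new engine really groups pages is not described here, and pageTreeLevels is a made-up name.

```cpp
#include <cstddef>
#include <vector>

//hypothetical helper: we return the number of page tree nodes on each level,
//from the level directly above the pages (first entry) up to the root (last
//entry). This assumes plain bottom-up grouping, which is only a guess.
std::vector<std::size_t> pageTreeLevels(std::size_t pageCount, std::size_t maxKids)
{
	std::vector<std::size_t> levels;

	//we cannot build a tree if a node may hold fewer than two kids.
	if (maxKids < 2)
		return levels;

	//we still need a root node, even for a document without pages.
	if (pageCount == 0)
	{
		levels.push_back(1);
		return levels;
	}

	//we start with the pages themselves as the things to group.
	std::size_t remaining = pageCount;

	//we keep grouping until a single root node is left.
	do
	{
		//we round up, since a partly filled node is still a node.
		remaining = (remaining + maxKids - 1) / maxKids;
		levels.push_back(remaining);
	}
	while (remaining > 1);

	return levels;
}
```

Under this assumption, with a maximum of 3, three pages fit under a single root node, while four pages already need two intermediate nodes plus a root. So the output for any document with more than three pages depends on the limit.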

Now, I’ve finally started PDF parsing. So far, I’ve got the first XRef table being read in. I’m excited. Hopefully, by the end of the week, I’ll have some form of PDF-in-PDF embedding working. For now, I’m aiming to support only PDF 1.4, that is, PDF as it was before all of that cross-reference stream and object stream business came about. Supporting those would require the ability to handle compression (which, while on the eventual to-do list, is not currently a priority).
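As a rough illustration of what reading a classic (PDF 1.4-style) XRef table involves, here is a sketch that parses a single entry. Each entry is a fixed-width line: a 10-digit number, a space, a 5-digit generation number, a space, and then n (in use) or f (free), followed by a two-character line ending. XRefEntry, readDigits and parseXRefEntry are hypothetical names for this example, not Create Framework code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

//hypothetical record for one cross-reference entry.
struct XRefEntry
{
	//for in-use entries this is the byte offset of the object; for free
	//entries it is the object number of the next free object.
	unsigned long long value;
	unsigned int generation;
	bool inUse;
};

//we read exactly count digits starting at start, and fail on anything else.
//The caller makes sure the line is long enough.
bool readDigits(const std::string &line, std::size_t start, std::size_t count,
	unsigned long long &result)
{
	result = 0;

	for (std::size_t i = start; i < start + count; ++i)
	{
		if (!std::isdigit(static_cast<unsigned char>(line[i])))
			return false;

		result = result * 10 + static_cast<unsigned long long>(line[i] - '0');
	}

	return true;
}

//we parse one entry such as "0000000017 00000 n" (line ending optional here).
bool parseXRefEntry(const std::string &line, XRefEntry &entry)
{
	//we need at least the 18 characters up to and including the type.
	if (line.length() < 18)
		return false;

	//we expect single spaces after the first number and after the generation.
	if (line[10] != ' ' || line[16] != ' ')
		return false;

	//we expect the type to be n (in use) or f (free).
	if (line[17] != 'n' && line[17] != 'f')
		return false;

	unsigned long long value = 0;
	unsigned long long generation = 0;

	//we read both numeric fields, and give up if either is malformed.
	if (!readDigits(line, 0, 10, value) || !readDigits(line, 11, 5, generation))
		return false;

	entry.value = value;
	entry.generation = static_cast<unsigned int>(generation);
	entry.inUse = (line[17] == 'n');
	return true;
}
```

A real reader would also have to handle the “xref” keyword and the subsection headers (first object number and entry count) that come before the entries; those are left out to keep the sketch short.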

DOM-Dependence in JavaScript Libraries

Wednesday, May 6th, 2009

The Create Framework supports JavaScript to a growing extent. When using JavaScript in the Create Framework, I’ve often wanted functions that I’ve used or seen in libraries like Prototype. It is often easy enough to copy and paste the functions I want from the library into my own code. However, I cannot simply import the entire library, or include it in the Framework, because several of its functions depend directly on the DOM (the HTML Document Object Model, which describes web pages). This is a bit annoying, especially as JavaScript is becoming useful outside of the web. Obviously, these libraries were meant for the web, and they fulfill their purpose wonderfully. It would be nice, however, to have libraries for non-web scenarios (perhaps some already exist, and I just haven’t stumbled across them). The Create Framework will, undoubtedly, eventually get such a library of its own.

At some point, I hope to port Protovis to the framework. It seems to be made in such a way that it should be relatively easy to port: almost all of the work is done in Canvas, so if I just implement a canvas act-alike in the Create Framework, everything should work fine.