A Few More Thoughts on Literate Programming

A while back, I became intrigued by Donald Knuth’s idea of Literate Programming, and decided to give it a shot. That first attempt was basically just me writing down what I knew as quickly as I learned it, and trying to pass it off as a knowledgeable tutorial. More recently, I tried a second project, a web-app that solves Wordle, and thought I’d write it in the Literate style as well.

The first time around, I learned the mechanics. The second time, I was able to learn one or two things about the coding itself.

(For those who don’t remember, in literate programming, you write code intertwined with prose that explains the code, and a post-processor turns the result into a pretty document for humans to read, and ugly code for computers to process.

1) The thing I liked the most, the part where literate programming really shines, is having the code be grouped not by function or by class, but by topic. I could introduce a <div class="message-box"></div> in the main HTML file, and in the next paragraph introduce the CSS that styles it, and the JavaScript code that manipulates it.

2) In the same vein, several times I rearranged the source to make the explanations flow better, not discuss variables or functions until I had explained why they’re there and what they do, without it altering the underlying HTML or JavaScript source. In fact, this led to a stylistic quandary:

3) I defined a few customization variables. You know, the kind that normally go at the top for easy customization:

var MIN_FOO = 30;
var MAX_FOO = 1500;
var LOG_FILE = "/var/log/mylogfile.log";

Of course, the natural tendency was to put them next to the code that they affect, somewhere in the middle of the source file. Should I have put them at the top of my source instead?

4) Even smaller: how do you pass command-line option definitions to getopt()? If you have options -a, -b, and -c, each will normally be defined in its own section. So in principle, the literate thing to do would be to write

getopt("{{option-a}}{{option-b}}{{option-c}}");

and have a section that defines option-a as “a“. As you can see, though, defining single-letter strings isn’t terribly readable, and literate programming is all about readability.

5) Speaking of readability, one thing that can come in really handy is the ability to generate a pretty document for human consumption. Knuth’s original tools generated TeX, of course, and it doesn’t get prettier than that.

I used org-mode, which accepts TeX style math notation, but also allows you to embed images and graphviz graphs. In my case, I needed to calculate the entropy of a variable, so being able to use proper equations, with nicely-formatted sigmas and italicized variables, was very nice. I’ve worked in the past on a number of projects where it would have been useful to embed a diagram with circles and arrows, rather than using words or ASCII art.

6) I was surprised to find that I had practically no comments in the base code (in the JavaScript, HTML, and CSS that were generated from my org-mode source file). I normally comment a lot. It’s not that I was less verbose. In fact, I was more verbose than usual. It’s just that I was putting all of the explanations about what I was trying to do, and why things were the way they are, in the human-docs part of the source, not the parts destined for computer consumption. Which, I guess, was the point.

7) Related to this, I think I had fewer bugs than I would normally have gotten in a project of this size. I don’t know why, but I suspect that it was due to some combination of thinking “out loud” (or at least in prose) before pounding out a chunk of code, and of having related bits of code next to each other, and not scattered across multiple files.

8) I don’t know whether I could tackle a large project in this way. You might say, “Why not? Donald Knuth wrote both TeX and Metafont as literate code, and even published the source in two fat books!” Well, yeah, but he’s Donald Knuth. Also, he was writing before IDEs, or even color-coded code, were available.

I found org-mode to be the most comfortable tool for me to use for this project. But of course that effectively prevents people who don’t use Emacs (even though they obviously should) from contributing.

One drawback of org-mode as a literate programming development environment is that you’re pretty much limited to one source file, which obviously doesn’t scale. There are other tools out there, like noweb, but I found those harder to set up, or they forced me to use (La)TeX when I didn’t want to, or the like.

9) One serious drawback of org-mode is that it makes it nearly impossible to add cross-reference links. If you have a section like

function myFunc() {
var thing;
{{calculate thing}}
return thing;
}

it would be very useful to have {{calculate thing}} be a link that you can click on to go to the definition of that chunk. But this is much harder to do in org-mode than it should be. So is labeling chunks, so that people can chase cross-references even without convenient links. It has a lot of work to be done in that regard.

WFHing with Emacs: Work Mode and Command-Line Options

Like the rest of the world, I’m working from home these days. One of the changes I’ve made has been to set up Emacs to work from home.

I use Emacs extensively both at home and at work. So far, my method for keeping personal stuff and work stuff separate has been to, well, keep separate copies of ~/.emacs.d/ on work and home machines. But now that my home machine is my work machine, I figured I’d combine their configs.

To do this, I just added a -work command-line option, so that emacs -work runs in work mode. The command-switch-alist variable is useful here: it allows you to define a command-line option, and a function to call when it is encountered:

(defun my-work-setup (arg)
   ;; Do setup for work-mode
  )
(add-to-list 'command-switch-alist
  '("work" . my-work-setup))

Of course, I’ve never liked defining functions to be called only once. That’s what lambda expressions are for:

(add-to-list 'command-switch-alist
  '("work" .
    (lambda (arg)
      ;; Do setup for work-mode
      (setq my-mode 'work)
      )))

One thing to bear in mind about command-switch-alist is that it gets called as soon as the command-line option is seen. So let’s say you have a -work argument and a -logging option. And the -logging-related code needs to know whether work mode is turned on. That means you would always have to remember to put the -work option before the -logging option, which isn’t very elegant.

A better approach is to use the command-switch-alist entries to just record that a certain option has been set. The sample code above simply sets my-mode to 'work when the -work option is set. Then do the real startup stuff after the command line has been parsed and you know all of the options that have been passed in.

Unsurprisingly, Emacs has a place to put that: emacs-startup-hook:

(defvar my-mode 'home
  "Current mode. Either home or work.")
(add-to-list 'command-switch-alist
  '("work" . (lambda (arg)
                (setq my-mode 'work))))

(defvar logging-p nil
  "True if and only if we want extra logging.")
(add-to-list 'command-switch-alist
  '("logging" . (lambda (arg)
                  (setq logging-p t))))

(add-hook 'emacs-startup-hook
  (lambda nil
    (if (eq my-mode 'work)
      (message "Work mode is turned on."))
    (if logging-p
      (message "Extra logging is turned on."))
    (if (and (eq my-mode 'work)
             logging-p)
      (message "Work mode and logging are both turned on."))))

Check the *Messages* buffer to see the output.