Literate Lists

I’ve written before about literate programming, and how one of its most attractive features is that you can write code with the primary goal of conveying information to a person, and only secondarily of telling a computer what to do. With that in mind, there’s a bit in my .bashrc that adds directories to $PATH, and it isn’t as reader-friendly as I’d like:

for dir in \
    /usr/sbin \
    /opt/sbin \
    /usr/local/sbin \
    /some/very/specific/directory \
    ; do
    PATH="$dir:$PATH"
done
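
One detail of this loop is worth spelling out: because each directory is prepended, the last one listed ends up first in the search order. A minimal sketch, using a scratch variable (`demo_path`, a name made up for this example) rather than the real $PATH:

```shell
# demo_path is a stand-in for $PATH, so that running this changes nothing.
demo_path="/usr/bin"
for dir in \
    /usr/sbin \
    /opt/sbin \
    /usr/local/sbin \
    ; do
    # Prepend, exactly as the loop above does.
    demo_path="$dir:$demo_path"
done
echo "$demo_path"
```

The directories come out in the reverse of the order in which they were listed, which matters once the comments start saying things like "this one should go before that one".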

I’d like to be able to add a comment to each directory entry, explaining why I want it in $PATH, but sh syntax won’t let me: there’s just no way to interleave strings and comments this way. So far, I’ve documented these directories in a comment above the for loop, but that’s not exactly what I’d like to do. In fact, I’d like to do something like:

$PATH components

  • /usr/sbin
  • /usr/local/bin

for dir in \
    {{path-components}} \
    ; do
    PATH="$dir:$PATH"
done

Or even:

$PATH components

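| Directory      | Comments                                                                         |
|----------------+----------------------------------------------------------------------------------|
| /usr/sbin      | sbin directories contain sysadminny stuff, and should go before bin directories. |
| /usr/local/bin | Locally-installed utilities take precedence over vendor-installed ones.          |

for dir in \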
    {{path-components}} \
    ; do
    PATH="$dir:$PATH"
done

Spoiler alert: both are possible with org-mode.

Lists

The key is to use Library of Babel code blocks: these allow you to execute org-mode code blocks and use the results elsewhere. Let’s start by writing the code that we want to be able to write:

#+name: path-list
- /usr/bin
- /opt/bin
- /usr/local/bin
- /sbin
- /opt/sbin
- /usr/local/sbin

#+begin_src bash :noweb no-export :tangle list.sh
  for l in \
      <<org-list-to-sh(l=path-list)>> \
      ; do
      PATH="$l:$PATH"
  done
#+end_src

Note the :noweb argument to the bash code block, and the <<org-list-to-sh()>> call in noweb brackets. This is a function we need to write. It’ll (somehow) take an org list as input and convert it into a string that can be inserted in this fragment of bash code.

This function is a Babel code block that we will evaluate, and which will return a string. We can write it in any supported language we like, such as R or Python, but for the sake of simplicity and portability, let’s stick with Emacs Lisp.

Next, we’ll want a test rig to actually write the org-list-to-sh function. Let’s start with:

#+name: org-list-to-sh
#+begin_src emacs-lisp :var l='nil
  l
#+end_src

#+name: test-list
- First
- Second
- Third

#+CALL: org-list-to-sh(l=test-list) :results value raw

The begin_src block at the top defines our function. For now, it simply takes one parameter, l, which defaults to nil, and returns l. Then there’s a list, to provide test data, and finally a #+CALL: line, which contains a call to org-list-to-sh and some header arguments, which we’ll get to in a moment.

If you press C-c C-c on the #+CALL line, Emacs will evaluate the call and write the result to a #+RESULTS block underneath. Go ahead and experiment with the Lisp code and any parameters you might be curious about.

The possible values for the :results header are listed under “Results of Evaluation” in the Org-Mode manual. There are a lot of them, but the one we care the most about is value: we’re going to execute code and take its return value, not its printed output. But this is the default, so it can be omitted.

If you tangle this file with C-c C-v C-t, you’ll see the following in list.sh:

for l in \
    ((/usr/bin) (/opt/bin) (/usr/local/bin) (/sbin) (/opt/sbin) (/usr/local/sbin)) \
    ; do
    PATH="$l:$PATH"
done

    It looks as though our org-mode list got turned into a Lisp list. As it turns out, yes, but not really. Let’s change the source of the org-list-to-sh() function to illustrate what’s going on:

    #+name: org-list-to-sh
    #+begin_src emacs-lisp :var l='nil :results raw
      (format "aaa %s zzz" l)
    #+end_src

    Now, when we tangle list.sh, it contains

        aaa ((/usr/bin) (/opt/bin) (/usr/local/bin) (/sbin) (/opt/sbin) (/usr/local/sbin)) zzz \

    So the return value from org-list-to-sh was turned into a string, and that string was inserted into the tangled file. This is because we chose :results raw in the definition of org-list-to-sh. If you play around with other values, you’ll see why they don’t work: vector wraps the result in extraneous parentheses, scalar adds extraneous quotation marks, and so on.

    Really, what we want is a plain string, generated from Lisp code and inserted in our sh code as-is. So we’ll need to change the org-list-to-sh code to return a string, and use :results raw to insert that string unchanged in the tangled file.

    We saw above that org-list-to-sh sees its parameter as a list of lists of strings, so let’s concatenate those strings, with a space between them:

    #+name: org-list-to-sh
    #+begin_src emacs-lisp :var l='nil :results raw
      ;; Take the first item of each sublist, then join them with spaces.
      (mapconcat #'identity
                 (mapcar (lambda (elt) (car elt)) l)
                 " ")
    #+end_src

    This yields, in list.sh:

    for l in \
        /usr/bin /opt/bin /usr/local/bin /sbin /opt/sbin /usr/local/sbin \
        ; do
        PATH="$l:$PATH"
    done

    which looks pretty good. It would be nicer still to break that list of strings across multiple lines, and also to quote them (in case any directory names contain spaces), but I’ll leave that as an exercise for the reader.
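
For reference, the tangled result of that exercise might look something like the sketch below. It is written by hand, not generated by org-mode, and "/opt/my tools/bin" is a made-up directory included only to show why the quoting matters. The counter is there just to confirm that the entry with a space stays a single word.

```shell
# Hypothetical tangled output: one quoted directory per line.
count=0
for l in \
    "/usr/bin" \
    "/opt/my tools/bin" \
    "/usr/local/sbin" \
    ; do
    count=$((count + 1))
    # Brackets make the word boundaries visible.
    printf '[%s]\n' "$l"
done
echo "$count entries"
```

Without the quotes, the shell would split the made-up directory into two words and the loop would run four times instead of three.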

    Tables

    That takes care of converting an org-mode list to a sh string. But earlier I said it would be even better to define the $PATH components in an org-mode table, with directories in the first column and comments in the second. This is easy, given what we’ve already done with lists. Let’s add a test table to our org-mode code, and some code to just return its input:

    #+name: echo-input
    #+begin_src emacs-lisp :var l='nil :results raw
      l
    #+end_src
    
    #+name: test-table
    | *Name*   | *Comment*        |
    |----------+------------------|
    | /bin     | First directory  |
    | /sbin    | Second directory |
    | /opt/bin | Third directory  |
    
    #+CALL: echo-input(l=test-table) :results value code
    
    #+RESULTS:

    Press C-c C-c on the #+CALL line to evaluate it, and you’ll see the results:

    #+RESULTS:
    #+begin_src emacs-lisp
    (("/bin" "First directory")
     ("/sbin" "Second directory")
     ("/opt/bin" "Third directory"))
    #+end_src

    First of all, note that, just as with lists, the table is converted to a list of lists of strings, where the first string in each list is the name of the directory. So we can just reuse our existing org-list-to-sh code. Secondly, org has helpfully stripped the header line and the horizontal rule underneath it, giving us a clean set of data to work with (this seems a bit fragile, however, so in your own code, be sure to sanitize your inputs). Just convert the list of directories to a table of directories, and you’re done.

    Conclusion

    We’ve seen how to convert org-mode lists and tables to code that can be inserted into a sh (or other language) source file when it’s tangled. This means that when our code includes data best represented by a list or table, we can, in the spirit of literate programming, use org-mode formatting to present that data to the user as a good-looking list or table, rather than just list it as code.

    One final homework assignment: in the list or table that describes the path elements, it would be nice to use org-mode formatting for the directory name itself: =/bin= rather than /bin. Update org-list-to-sh to strip the formatting before converting to sh code.

    A Couple of Shell Quickies

    Since I got asked several sh-related questions, I might as well get a post out of them.

    One person asks:

    I’m writing a portable shell script to download a file from the web, and then compare it to a locally-stored version of the same file, using diff.

    My first version of the script used mktemp to download the web-file to temporary file, run the diff command, and then delete the temporary file afterwards. e.g.

    TEMPFILE=$(mktemp)
    wget -q $ONLINEFILEURL -O $TEMPFILE
    diff $TEMPFILE $LOCALFILE
    rm $TEMPFILE

    However I later discovered that the BSD version of mktemp has an incompatible syntax to the GNU version of mktemp. So then I got rid of the usage of temporary files completely, by using input-redirection, e.g.

    diff <(wget -q $ONLINEFILEURL -O -) $LOCALFILE

    However while this works fine under bash and ksh, it fails under ash and sh with

    Syntax error: "(" unexpected

    to which I replied:

    The first obvious problem here is that “(wget -q $ONLINEFILEURL -O -)” isn’t a filename, it’s a subprocess. So the shell sees “<” and expects a filename, but finds “(” instead.

    It looks as though the way to get diff to read from stdin is the standard way: specify “-” as the filename, and give it input on stdin. Since you’re feeding it the output from a process, you want to use a pipe:

    wget -q "$ONLINEFILEURL" -O - | diff - "$LOCALFILE"
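
To try the pipe-into-diff idiom without a network connection, here is a small sketch in which printf stands in for the wget command. The function name `compare` and the scratch file name are made up for this example, and the scratch file uses the quick-and-dirty $$ naming, so treat it as a demonstration only.

```shell
# compare TEXT FILE: feed TEXT to diff on stdin and compare it with FILE.
# printf plays the role of: wget -q "$ONLINEFILEURL" -O -
compare() {
    if printf '%s\n' "$1" | diff - "$2" >/dev/null; then
        echo "unchanged"
    else
        echo "differs"
    fi
}

# Scratch file standing in for $LOCALFILE.
localfile="${TMPDIR:-/tmp}/diff-stdin-demo.$$"
printf 'hello\n' > "$localfile"

first=$(compare "hello" "$localfile")
second=$(compare "goodbye" "$localfile")
rm -f "$localfile"

echo "same input: $first; changed input: $second"
```

This relies only on diff treating "-" as standard input and on the exit status of a pipeline being that of its last command, so it should behave the same under bash, ksh, ash and plain sh.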

    I also suggested that he could try to figure out which version of mktemp he was using:

    # Wrapper function for GNU mktemp
    gnu_mktemp() {
    	mktemp /tmp/tmpfile.XXXXXX "$@"
    }
    
    # Wrapper function for BSD mktemp
    # (-t takes a filename prefix, not a full template path)
    bsd_mktemp() {
    	mktemp -t tmpfile "$@"
    }
    
    # Try to figure out which wrapper to use.
    # Silence mktemp's own stderr too, in case it doesn't understand -V.
    if mktemp -V 2>/dev/null | grep version >/dev/null 2>&1; then
    	MKTEMP=gnu_mktemp
    else
    	MKTEMP=bsd_mktemp
    fi
    
    mytmpfile=$($MKTEMP)

    And, of course, if race conditions and security aren’t a big concern, there’s always

    mytmpfile=/tmp/myprogramname.$$
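
A middle ground, sketched under one assumption: as far as I know, both GNU and BSD mktemp accept an explicit template ending in XXXXXX as their only argument, but check your platform's man page before relying on that. The trap is an addition of mine that removes the file when the script exits normally, in place of a trailing rm.

```shell
# Assumption: "mktemp TEMPLATE" with no options works on both GNU and BSD.
mytmpfile=$(mktemp "${TMPDIR:-/tmp}/myprogramname.XXXXXX") || exit 1

# Clean up on exit instead of remembering to rm at the end.
trap 'rm -f "$mytmpfile"' EXIT

# Use the file like any other.
echo "scratch data" > "$mytmpfile"
cat "$mytmpfile"
```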

    Another person wanted to write a bash script that would do one thing when run by him or root, and another thing if run by anyone else (basically, die with an error message about insufficient privileges and/or grooviness).

    He asked whether the following two expressions were equivalent:

    • if [[ ( `whoami` != "root" ) || ( `whoami` != "coolguy" ) ]]
    • if [[ ! ( `whoami` = "root" ) || ( `whoami` = "coolguy" ) ]]

    They’re not, but maybe not for obvious reasons, because propositional logic is a harsh mistress.

    In the first expression,

    if [[ ( `whoami` != "root" ) || ( `whoami` != "coolguy" ) ]]

    let’s say that the user is joeblow. In this case, “`whoami` != "root"” is true, and so the shell can short-circuit the rest of the “||”, because the entire expression is true.

    If the user is root, then the first part, “( `whoami` != "root" )” is false. However, the second part, “( `whoami` != "coolguy" )” is true (because root ≠ coolguy), and so the entire expression is “false || true”, which is true.

    The second expression,

    if [[ ! ( `whoami` = "root" ) || ( `whoami` = "coolguy" ) ]]

    is closer to what he wanted, but doesn’t work because of operator precedence: “!” binds more tightly than “||”, so the expression is equivalent to “(whoami ≠ root) || (whoami = coolguy)”.

    In this case, if the user is joeblow, the first clause, “whoami ≠ root”, is true, and so the entire expression is true.

    Worse yet, if the user is root, then neither the first nor the second clause is true, so the entire expression is false.

    What he really wanted was something like:

    if [[ ( `whoami` = "root" ) || ( `whoami` = "coolguy" ) ]]; then
    	# Do nothing
    	:
    else
    	# Do something
    	echo "Go away"
    	exit 1
    fi

    Except, of course, that since the if-clause is empty, it can be cut out entirely. Then all we need to do is to negate the condition and only keep the code in the else-clause:

    if [[ ! ( ( `whoami` = "root" ) || ( `whoami` = "coolguy" ) ) ]]

    Note the extra pair of parentheses, to make sure that the “!” applies to the whole thing.
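
To check all of this without switching accounts, the three conditions can be run against a list of user names. The sketch below makes two substitutions: a function argument stands in for the output of `whoami`, and the tests use POSIX [ ] with ! and || instead of bash's [[ ]], so that it also runs under plain sh. The precedence is the same there, since ! binds more tightly than ||. The function names expr1, expr2, fixed and truth are made up for this example.

```shell
# Each function takes a user name in $1, standing in for `whoami`.
expr1() { [ "$1" != "root" ] || [ "$1" != "coolguy" ]; }
expr2() { ! [ "$1" = "root" ] || [ "$1" = "coolguy" ]; }
fixed() { ! { [ "$1" = "root" ] || [ "$1" = "coolguy" ]; }; }

# Print "true" or "false" depending on whether test $1 succeeds for user $2.
truth() {
    if "$1" "$2"; then echo true; else echo false; fi
}

for user in joeblow root coolguy; do
    echo "$user: first=$(truth expr1 "$user") second=$(truth expr2 "$user") fixed=$(truth fixed "$user")"
done
```

The first expression is true for everybody, the second is true for both joeblow and coolguy, and only the fully parenthesized version is true for joeblow alone.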

    (Update, May 18: Fixed HTML entities.)