If You Use Unix, Use Version Control

If you’ve used Unix (or Linux, or Mac OS X, or probably even various flavors of Windows), you’ve no doubt found yourself editing configuration files with a text editor. This is especially true if you’ve been administering a machine, either professionally or because you got roped into doing it.

And if you’ve been doing it for more than a day or two, you’ve made a mistake, and wished you could undo it, or at least see what things looked like before you started messing with them.

This is something that programmers have been dealing with for a long time, so they’ve developed an impressive array of tools to allow you to keep track of how a file has changed over time. Most of them have too much overhead for someone who doesn’t do this stuff full-time. But I’m going to talk about RCS, which is simple enough that you can just start using it.

Most programmers will tell you that RCS has severe limitations (it only works on one file at a time, rather than on a collection of files) that make it unsuitable for all but a few special circumstances. Thankfully, Unix system administration happens to be one of those circumstances!

What’s version control?

Basically, it allows you to track changes to a file over time. You check a file in, meaning that you want to keep track of its changes. Periodically, you check it in again, which is a bit like bookmarking a particular version so you can come back to it later. And you can check a file out, that is, retrieve it from the history archive. Or you can compare the file as it is now to how it looked one, five, or a hundred versions ago.

Note that RCS doesn’t record any changes unless you tell it to. That means that you should get into the habit of checking in your changes when you’re done messing with a file.
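In day-to-day use, that habit boils down to a short cycle of commands, each of which we’ll walk through below:

# ci -u myfile
# co -l myfile
(edit the file)
# ci -u myfile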

Starting out

Let’s create a file:

# echo "first" > myfile

Now let’s check it in, to tell RCS that we want to track it:

# ci -u myfile
myfile,v <-- myfile
enter description, terminated with single '.' or end of file:
NOTE: This is NOT the log message!
>> This is a test file
>> .
initial revision: 1.1
done

ci stands for “check in”, and is RCS’s tool for checking files in. The -u option says to unlock it after checking in.

Locking is a feature of RCS that helps prevent two people from stepping on each other’s toes by editing a file at the same time. We’ll talk more about this later.

Note that I typed in “This is a test file”. I could have given a description on multiple lines if I wanted to, but usually you want to keep this short: “DNS config file” or “Login message”, or something similar.

End the description with a single dot on a line by itself.

You’ll notice that you now have a file called myfile,v. That’s the history file for myfile.

Since you probably don’t want ,v files lying around cluttering the place up, know that if there’s a directory called RCS, the RCS utilities will look for ,v history files in that directory. So before we get in too deep, let’s create an RCS directory:

# mkdir RCS

Now delete both myfile and myfile,v, and run through the steps above again from scratch.

Done? Good. By the way, you could also have cheated and just moved the ,v file into the RCS directory. Now you know for next time.
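In other words:

# mv myfile,v RCS/

and the RCS utilities will find the history file in its new location.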

Making a change

All right, so now you want to make a change to your file. This happens in three steps:

  1. Check out the file and lock it.
  2. Make the change(s)
  3. Check in your changes and unlock the file.

Check out the file:

# co -l myfile
RCS/myfile,v --> myfile
revision 1.1 (locked)
done

co is RCS’s check-out utility. In this case, it pulls the latest version out of the history archive, if the working copy isn’t there already.

The -l (lower-case ell) flag says to lock the file. This helps to prevent other people from working on the file at the same time as you. It’s still possible for other people to step on your toes, especially if they’re working as root and can overwrite anything, but it makes it a little harder. Just remember that co is almost always followed by -l.
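Incidentally, if you ever find a file that a colleague locked and then wandered off from, you can break the lock and take it yourself. On the RCS versions I’ve used, rcs -u unlocks it (it will ask you for a message to pass along to whoever held the lock), and then you can lock it as usual:

# rcs -u myfile
# co -l myfile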

Now let’s change the file. Edit it with your favorite editor and replace the word “first” with the word “second”.

If you want to see what has changed between the last version in history and the current version of the file, use rcsdiff:

# rcsdiff -u myfile
===================================================================
RCS file: RCS/myfile,v
retrieving revision 1.1
diff -u -r1.1 myfile
--- myfile 2016/06/07 20:18:12 1.1
+++ myfile 2016/06/07 20:32:38
@@ -1 +1 @@
-first
+second

The -u option makes it print the difference in “unified-diff” format, which I find more readable than the other possibilities. Read the man page for more options.

In unified-diff format, lines that were deleted are preceded with a minus sign, and lines that were added are preceded by a plus sign. So the above is saying that the line “first” was removed, and “second” was added.

Finally, let’s check in your change:

# ci -u myfile
RCS/myfile,v <-- myfile
new revision: 1.2; previous revision: 1.1
enter log message, terminated with single '.' or end of file:
>> Made a change.
>> .
done

Again, we were prompted to list the changes we made to the file (with a dot on a line by itself to mark the end of our text). You’ll want to be concise yet descriptive in this text, because these are notes you’re making for your future self when you want to go back and find out when and why a change was made.
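Incidentally, if you’d rather skip the prompt, ci will also take the log message on the command line with the -m option:

# ci -u -m"Made a change." myfile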

Viewing a file’s history

Use the rlog command to see a file’s history:

# rlog myfile

RCS file: RCS/myfile,v
Working file: myfile
head: 1.2
branch:
locks: strict
access list:
symbolic names:
keyword substitution: kv
total revisions: 2; selected revisions: 2
description:
This is a test file
----------------------------
revision 1.2
date: 2016/06/07 20:36:52; author: arensb; state: Exp; lines: +1 -1
Made a change.
----------------------------
revision 1.1
date: 2016/06/07 20:18:12; author: arensb; state: Exp;
Initial revision
=============================================================================

In this case, there are two revisions: 1.1, with the log message “Initial revision”, and 1.2, with the log “Made a change.”.

Undoing a change

You’ve already seen rlog, which shows you a file’s history. And you’ve seen one way to use rcsdiff.

You can also use either one or two -rrevision-number arguments, to see the difference between specific revisions:

# rcsdiff -u -r1.1 myfile

will show you the difference between revision 1.1 and what’s in the file right now, and

# rcsdiff -u -r1.1 -r1.2 myfile

will show you the difference between revisions 1.1 and 1.2.

(Yes, RCS will just increment the second number in the revision number, so after a while you’ll be editing revision 1.2486 of the file. Getting to revision 2.0 is an advanced topic that we won’t cover here.)

With the tools you already have, the simplest way to revert an unintended change to a file is simply to see what the file used to look like, and copy-paste that into a new revision.
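Or, if you’d rather skip the copying and pasting, co -p will print an old revision to standard output, and you can use that to make the old contents the latest revision again. A sketch (substitute whichever revision you actually want):

# co -l myfile
# co -p -r1.1 myfile > myfile
# ci -u -m"Back out the last change." myfile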

Once you’re comfortable with that, you can read the manual and read up on things like deleting revisions with rcs -o1.2 myfile.

Checking in someone else’s changes

You will inevitably run into cases where someone changes your file without going through RCS. Either it’ll be a coworker managing the same system who didn’t notice the ,v file lying around, or else you’ll simply forget to check in your own changes after editing the file.

Here’s a simple way to see whether someone (possibly you) has made changes without your knowledge:

# co -l myfile
RCS/myfile,v --> myfile
revision 1.2 (locked)
writable myfile exists; remove it? [ny](n):

In this case, either you forgot to check in your changes, or else someone made the file writable with chmod, then (possibly) edited it.

In the former case, see what you did with rcsdiff, check in your changes, then check the file out again to do what you were going to do.
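That is, something along these lines (if ci complains that you don’t hold a lock, grab one first with rcs -l myfile):

# rcsdiff -u myfile
# ci -u myfile
# co -l myfile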

The latter case requires a bit more work, because you don’t want to lose your coworker’s changes, even though they bypassed version control.

  1. Move the coworker’s version of the file aside.
  2. Check out the latest version of the file.
  3. Overwrite that file with your coworker’s version.
  4. Check those changes in.
  5. Check the file out and make your changes.
  6. Have a talk with your coworker about the benefits of using version control.

You already know, from the above, how to do all of this. But just to recap:

Move the file aside:

# mv myfile myfile.new

Check out the latest version:

# co -l myfile

Overwrite it with your coworker’s changes:

# mv myfile.new myfile

Check in those changes:

# ci -u myfile
RCS/myfile,v <-- myfile
new revision: 1.3; previous revision: 1.2
enter log message, terminated with single '.' or end of file:
>> Checking in Bob's changes:
>> Route around Internet damage.
>> .
done

That should be enough to get you started. Play around with this, and I’m sure you’ll find that this is a huge improvement over what you’ve probably been using so far: a not-system of making innumerable copies of files when you remember to, with names like “file.bob”, “file.new”, “file.new2”, “file.newer”, and other names that mean nothing to you a week later.

Disk Hack

One of the things I enjoy about Unix system administration is the MacGyver aspect of it: when something goes pear-shaped, and your preferred tools aren’t available because they’re on the disk that just died, or on the other side of the pile of smoking ashes that used to be a router, you have to figure out how to recover with what you’ve got left. It’s a bit like that scene in Apollo 13 when they realize that the space capsule has a round hole for the air filter, but only square filters, and the engineer dumps all the equipment the astronauts have available onto the table, and says “We’ve got to find a way to make this fit into the hole for this, using nothing but that”.

So anyway, my mom’s Mac recently died. And, naturally, there are no available backups. But I said I’d do what I could, and took the machine home.

I’m glad we wrote off the old machine as a total loss, since it was (the tense should give you an idea of what’s coming) an iMac, one of those compact everything-in-the-monitor models that tries oh-so-hard to fit everything into as small a space as possible. Of course, this compactness means that there’s no room to do anything: the disk is behind the LCD display, and wedged in a tight slot between the graphics card and the DVD drive. And the whole thing is wrapped in — I kid you not — foil, most likely to help control airflow. So basically I wound up ripping things out with little or no grace or elegance. If it wasn’t totaled then, it certainly is now.

At any rate, that left me with a disk that, thankfully, turned out to be unharmed. The next question was how to hook it up. I was pretty sure it had an HFS or HFS+ filesystem, which meant that the obvious thing to do would be to put it in a Mac to read. But I don’t have a Mac that I could put a second internal drive in. I toyed briefly with the idea of finding an enclosure and whatever conversion hardware would be necessary to turn an internal SATA drive into an external USB drive, but figured that was too hard for a one-shot. Then I found that my Linux box has hfs and hfsplus filesystem kernel modules, so hey.

(Of course, I don’t know how stable the Linux HFS driver is, so I figured it’d be best to write-protect the disk. At which point I discovered that this model only has one hardware jumper slot, and it doesn’t write-protect the disk. Fuck you very much, Maxtor/Apple.)

The Linux HFS driver turned out to be good enough for reading, and I could mount the disk and read files, so yay. The next question was how to get the files from there to a laptop that I could bring over to my parents’. This is complicated by the fact that on HFS, a file isn’t just a stream of bytes, the way it is in Unix; it has two “forks”: the data fork contains the actual data of the file (e.g., a JPEG image), and the resource fork contains metadata about the file, such as its icon, the application that should open the file by default, and so on. I didn’t want to lose that if I could help it.

The way the Linux HFS driver deals with resource forks is to create virtual or transient or whatever you want to call them files: if you open myfile, you’ll get the data fork, which looks just like any file. But if it has a resource fork, you can also open myfile/rsrc and read the contents of that. This meant that in the worst case, I could 1) copy over the data forks to a directory on the Mac, then 2) find which files have resource forks (something like

find /mnt -type f |
sh -c 'while read -r filename; do
    if [ -s "$filename/rsrc" ]; then
        echo "$filename";
    fi;
done'

) and 3) somehow re-graft the resource forks onto the files on the Mac end. But that seemed like a lot of work.

Apple software is often distributed on .dmg (disk image) files, which are mounted as virtual disks. I figured that’d be the obvious way to package up the contents of the disk. So I dded the raw disk device (/dev/sdb rather than /dev/sdb2 which was the mounted partition, in order to get the entire disk, including partition map and such), but when I tried to mount that, it didn’t work, so presumably there’s more to a .dmg file than just the raw disk data. I tried a couple of variations on that theme, but without success.
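The command itself was nothing fancy, something along these lines (imac.img is just a stand-in name for the output file):

dd if=/dev/sdb of=imac.img bs=2M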

(In passing, I also noticed that rsync supports “extended attributes” on both my Mac and my Linux box, so I tried using that to copy files (with resource forks) over, only to find that the two implementations use different options to say “turn on extended attributes”, so the client couldn’t start the remote server correctly.)

Eventually, I realized that dd could be used not just to read a disk image, but to write one. Yes, I said above that reading from the disk and writing to a file didn’t produce a usable disk image. But I also said that .dmg files are mounted like disks, and that implies that there has to be a device to mount.

So on the Mac, I created a disk image file with hdiutil create, then opened it with the Disk Utility. Forcing “Verify disk” made the Disk Utility mount the image on /dev/disk2s9, just before telling me that there was no usable filesystem on the disk image. That was fine; all I wanted was for it to create a /dev/disk* device that I could write to. Then I was able to

ssh linuxbox 'dd if=/dev/sdb bs=2M' | dd bs=2048k of=/dev/disk2

to transfer the raw contents of the disk to the “disk data” portion of the disk image.

To my slight surprise, this actually worked. Yes, I had to repair the disk image, but from the log messages, that appears to be because some of the superblock copies were missing (the disk is 120 GB, but I only created a 32 GB image).

The final problem should be that of getting the disk image from my locked-down(-ish) laptop to my mom’s new vanilla Mac, but I don’t think I’ll bother. It’ll be a lot easier, and better in the long run, to put the old disk image onto an external drive that can then double as a backup disk, so I don’t have to do this again.

Because while it can be fun to solve a puzzle and figure out how to fix something with suboptimal tools, there’s also wisdom in avoiding getting into such situations in the first place.

Bourne Shell Introspection

So I was thinking about how to refactor our custom Linux and Solaris init scripts at work. The way FreeBSD does it is to have the scripts in /etc/rc.d define variables with the commands to execute, e.g.,

start_cmd='/usr/sbin/foobard'
stop_cmd='kill `cat /var/run/foobar.pid`'

run_rc_command "$1"

where $1 is “start”, “stop”, or whatever, and run_rc_command is a function loaded from an external file. It can check whether $stop_cmd is defined, and if not, take some default action.
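In other words, something roughly like this happens inside run_rc_command (a sketch of the idea, not the actual rc.subr code):

case "$1" in
stop)
	if [ -n "$stop_cmd" ]; then
		eval "$stop_cmd"
	else
		echo "no stop_cmd defined; falling back to the default action"
	fi
	;;
esac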

This is great and all, but I was wondering whether it would be possible to check whether a given shell function exists. That way, a common file could implement a generic structure for starting and stopping daemons, and the daemon-specific file could just set the specifics by defining do_start and do_stop functions.

The way to do this in Perl is to iterate over the symbol table of the package you’re looking for, and see whether each entry is a function. The symbol table for Foo::Bar is %Foo::Bar::; for the main package, it’s %::. Thus:

while (my ($k, $v) = each %::)
{
	if (defined(&$v))	# is there a function behind this symbol table entry?
	{
		print "$k is a function\n";
	}
}

sub test_x() {}
sub test_y() {}
sub test_z() {}

But I didn’t know how to do it in the Bourne shell.

Enter type, which tells you exactly that:

#!/bin/sh

# List of all known commands
STD_CMDS="start stop restart status verify"
MORE_CMDS="graceful something_incredibly_daemon_specific"

do_start="This is a string, not a function"

do_restart() {
	echo "I ought to restart something"
}

do_graceful() {
	echo "I am so fucking graceful"
}

for cmd in ${STD_CMDS} ${MORE_CMDS}; do
	if type "do_$cmd" >/dev/null 2>&1; then
		echo "* do_$cmd is defined"
	else
		echo "- do_$cmd is not defined"
	fi
done

And yes, this works not just in bash, but in the traditional, bourne-just-once shell, on every platform that I care about.
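Which means a shared file can provide a generic dispatcher, and each daemon-specific script only has to define the do_* functions it actually needs. Here’s a sketch of what I have in mind (the names are my own invention, not anything from FreeBSD’s rc.subr):

run_cmd() {
	# type also succeeds for external commands, but nothing in PATH is
	# likely to be named do_start, do_stop, and so on.
	if type "do_$1" >/dev/null 2>&1; then
		"do_$1"
	else
		echo "Don't know how to $1" >&2
		return 1
	fi
}

do_start() {
	echo "Starting the daemon"
}

run_cmd "$1"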

So yay, it turns out that the Bourne shell has more introspection than
I thought.