Legal Markup Language

Today at work, I had to sign some legal papers. They were pretty standard “I have read the attached policy and agree to be bound by it” stuff that all of us Full-Time Employment Units have to sign once a year.

I’m the sort of person who believes that if I sign an agreement, I ought to at least know what it says. But I also have better things to do than read legalese all day.

The same problem applies to a lot of commercial applications: every time you upgrade, you have to agree to a EULA for the new version. For a variety of reasons, most people just click through and get on with their lives. But it would be nice to be able to know that the vendor hasn’t just asked you to sign over your firstborn.

One possible solution would be a markup and revision system for legal documents. For starters, if your job requires you to sign an Acceptable Behavior Policy once a year, you could read it carefully once when you sign on, and save a copy. Then, the next year, you can compare the version you’re given with the one you signed a year ago. If there are no changes, you can just sign it without reading, on the grounds that if you didn’t have a problem with it last year, you don’t have a problem with it now.
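The year-over-year comparison can be sketched in a few lines of Python. This is only an illustration: the two policy texts are made up, and `changed_lines` is a hypothetical helper built on the standard `difflib` module, not part of any existing legal tool.

```python
import difflib


def changed_lines(last_year: str, this_year: str) -> list[str]:
    """Return unified-diff lines between two versions (empty if identical)."""
    return list(difflib.unified_diff(
        last_year.splitlines(),
        this_year.splitlines(),
        fromfile="signed_last_year",
        tofile="presented_this_year",
        lineterm="",
    ))


# Hypothetical policy texts, invented for this example.
old = "1. Be polite.\n2. Keep your phone number on file.\n"
new = "1. Be polite.\n2. Keep your phone number and home address on file.\n"

diff = changed_lines(old, new)
if not diff:
    print("No changes; nothing new to read.")
else:
    print("\n".join(diff))
```

An empty result means you can sign on the same grounds as last year; anything else tells you exactly which lines to read.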

Of course, a lot of documents include other documents by reference. These need to be archived as well.

It would also be nice to add comments: for instance, if the policy requires you to keep your cell phone number on file with your manager so that you can be contacted outside of business hours, you could add “Has my cell phone changed since last year?” in the comment area.

Since big chunks of legal documents are just boilerplate text, and since many legal documents (such as software EULAs, credit card applications, car rental agreements, etc.) apply to many people, it would be nice to look them up on the net. That is, the tool on your desktop could take the MD5 hash of a clause, send that off to the legal opinion servers of your choice, and see what they have to say. For instance, the EFF could have a repository that says that certain clauses aren’t as scary as they sound; the FSF could point out which clauses will forfeit your Free Software-loving soul.
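Here is a rough sketch of the hashing step, assuming the desktop tool normalizes whitespace and case before hashing so that trivially reformatted boilerplate still matches. The `OPINIONS` table is a hypothetical local stand-in for a remote opinion server, and the clause and the opinion in it are invented for illustration.

```python
import hashlib


def clause_fingerprint(clause: str) -> str:
    """MD5 hex digest of a clause, ignoring case and whitespace differences."""
    normalized = " ".join(clause.split()).lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Hypothetical stand-in for an opinion server: fingerprint -> commentary.
OPINIONS = {
    clause_fingerprint("Licensee may not reverse engineer the Software."):
        "Invented example opinion: common boilerplate.",
}


def look_up(clause: str):
    """Return the stored opinion for a clause, or None if nobody has one."""
    return OPINIONS.get(clause_fingerprint(clause))


print(look_up("Licensee may not   reverse engineer\nthe software."))
print(look_up("Licensee hereby surrenders their firstborn."))
```

MD5 is used only because that is what the paragraph above mentions; any other hash from `hashlib` would work the same way, as long as client and server agree on the hash and on the normalization.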

This could be a commercial service: you could pay a legal firm for online legal advice. Yes, a lawyer would have to read and research the various documents, and that’s expensive; but if they can spread the cost around several hundred or thousand clients, it could become affordable.

You should be able to specify certain details about your situation. For example, a clause that affects US Government employees might be either important or irrelevant, depending on whether you’re a fed or not. You should be able to check or uncheck “I work for the US government” in the preferences menu, so that the software will look up the appropriate response. Ditto if you don’t work at a nuclear reactor, don’t deal in foreign trade, and so forth.

One interesting aspect of this is the coding theory involved: there’s a level of distrust that has to be dealt with. If you sign the yearly policy without reading it because it hasn’t changed since last year, then you probably don’t want to leave your copy of last year’s document with your employer, in case they try to change it. And if you leave a copy on your employer’s computer, they might not be above rooting around in your files to change your backup to make it match this year’s version. So you’ll want to be able to cryptographically sign each document. And of course any sensitive information that goes out on the net needs to be encrypted.
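A minimal sketch of the tamper-detection part follows. Python’s standard library doesn’t include public-key signatures, so this swaps in an HMAC keyed with a secret you keep off your employer’s machine. That is enough to tell whether your archived copy has been altered, but unlike a real signature, nobody else can verify it. The function names, the key and the document are all invented for the example.

```python
import hashlib
import hmac


def seal(document: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the document with a private key."""
    return hmac.new(key, document, hashlib.sha256).hexdigest()


def is_untouched(document: bytes, key: bytes, tag: str) -> bool:
    """True if the document still matches the tag computed when it was archived."""
    return hmac.compare_digest(seal(document, key), tag)


key = b"kept somewhere my employer can't reach"   # hypothetical secret
archived = b"Acceptable Behavior Policy, as signed last year."
tag = seal(archived, key)

print(is_untouched(archived, key, tag))
print(is_untouched(archived + b" (quietly amended)", key, tag))
```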

Then there’s the question of giving away information by the sorts of questions you ask. For instance, you may not want the people running Joe Random Legal Server to know that you work for the military or at a nuclear power plant, but there are common clauses that affect people who do. So while the program on your desktop needs to know this in order to give you good advice, that’s not necessarily something you need to send out on the net. So when it sends out a query about a particular clause, the protocol should allow you to specify as much or as little detail as you want: if you say that you don’t deal in trade with foreign nationals, the remote server can save itself the trouble of looking up what a given clause means for those who do; but if you don’t say whether you’re in the military, it’ll send both responses back and let your desktop software decide which version to show you.

Of course, you probably don’t mind letting your attorneys know whether you’re in the military, so the software should be smart enough to send this information only to some servers and not others.
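To make the selective-disclosure idea concrete, here is a toy sketch. Everything in it is hypothetical: the server names, the fact names, the clause ID and the advice strings are invented, and a real protocol would of course run over an encrypted network connection rather than a function call.

```python
# Full profile, known only to the desktop software.
PROFILE = {"military": True, "foreign_trade": False}

# Which facts each server is allowed to see (hypothetical server names).
DISCLOSE_TO = {
    "my-attorneys.example": {"military", "foreign_trade"},
    "joe-random.example": {"foreign_trade"},
}

# Server-side table: advice entries and the facts each one applies to.
ADVICE = {
    "clause-17": [
        {"applies_if": {"military": True}, "advice": "Applies to you; read it."},
        {"applies_if": {"military": False}, "advice": "Irrelevant to civilians."},
        {"applies_if": {"foreign_trade": True}, "advice": "Check export rules."},
    ],
}


def build_query(clause_id, server):
    """Include only the facts this particular server is allowed to see."""
    allowed = DISCLOSE_TO[server]
    return {"clause": clause_id,
            "facts": {k: v for k, v in PROFILE.items() if k in allowed}}


def server_answer(query):
    """Return every entry not ruled out by the facts the client disclosed."""
    facts = query["facts"]
    return [entry for entry in ADVICE[query["clause"]]
            if all(facts.get(k, v) == v for k, v in entry["applies_if"].items())]


def pick_local(answers):
    """On the desktop, keep only the advice that matches the full profile."""
    return [a["advice"] for a in answers
            if all(PROFILE.get(k) == v for k, v in a["applies_if"].items())]


query = build_query("clause-17", "joe-random.example")
answers = server_answer(query)
print(query["facts"])
print(len(answers))
print(pick_local(answers))
```

Joe Random’s server never learns the military flag, so it returns both the military and the civilian reading; the attorneys’ server would get the full profile and could return just the one that applies.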

I imagine that some of this already exists: contracts are already negotiated between parties that don’t trust each other. Presumably the law firms on each side already have software that’ll tell them that section 3, paragraph 10 hasn’t changed since the last round of negotiations, so they don’t need to check it again.

And of course laws go through many iterations from original inception to bill to committee to floor vote, and are often amended by people of other parties, who’d love to make life miserable for you. The staffs of legislators must have some system for keeping track of it all. Hopefully some of it is automated.

For software, there are already software-installation tools that include presenting a EULA to the user as a standard step. It shouldn’t be too hard to put in a hook that calls the user’s preferred legal document management system.

The Subversion version control system has a “blame” subcommand that shows you when certain lines in a file were last changed, and by whom. Legalese is so structured and formal that it seems that a similar approach should be able to help there as well.

Don’t Put Information in Two Places

While playing around with a Perl script to look up stock quotes, I
kept getting warning messages about uninitialized values, as well as
missing data in the results.

I eventually tracked it down to a bug in an old version of the
Finance::Quote
Perl module, specifically to these lines:

# Yahoo uses encodes the desired fields as 1-2 character strings
# in the URL.  These are recorded below, along with their corresponding
# field names.

@FIELDS = qw/symbol name last time date net p_change volume bid ask
             close open day_range year_range eps pe div_date div div_yield
	     cap ex_div avg_vol currency/;

@FIELD_ENCODING = qw/s n l1 d1 t1 c1 p2 v b a p o m w e r r1 d y j1 q a2 c4/;

Basically, to look up a stock price at
Yahoo! Finance,
you fetch a URL with a parameter that specifies the data you want to
retrieve: s for the ticker symbol (e.g., AMZN), n
for the company name (“Amazon.com, Inc.”), and so forth.

The @FIELDS array lists convenient programmer-readable names
for the values that can be retrieved, and @FIELD_ENCODING
lists the short strings that have to be sent as part of the URL.

At this point, you should be able to make an educated guess as to what
the problem is. Take a few moments to see if you can find it.

The problem is that @FIELDS and @FIELD_ENCODING
don’t list the data in the same order: “time” is the 4th
element of @FIELDS ($FIELDS[3]), but t1,
which is used to get the time of the last quote, is the 5th element of
@FIELD_ENCODING ($FIELD_ENCODING[4]). Likewise,
date ($FIELDS[4]) sits at the same position as t1, rather than at the
position of d1 ($FIELD_ENCODING[3]).

More generally, this code has information in two different places,
which requires the programmer to remember to update it in both places
whenever a change is made. The code says “Here’s a list of names for
data. Here’s a list of strings to send to Yahoo!”, with the unstated
and unenforced assumption that “Oh, and these two lists are in
one-to-one correspondence with each other”.

Whenever you have this sort of relationship, it’s a good idea to
enforce it in the code. The obvious choice here would be a hash:

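# The values must be quoted: a bare s would be parsed as the start of
# a substitution (s/.../.../), and barewords fail under "use strict".
our %FIELD_MAP = (
	symbol	=> 's',
	name	=> 'n',
	last	=> 'l1',
	# …
);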

Of course, it may turn out that there are perfectly good reasons for
using an array (e.g., perhaps the server expects the data fields to be
listed in a specific order). And in my case, I don’t particularly feel
like taking the time to rewrite the entire module to use a hash
instead of two arrays. But that’s okay; we can use an array that lists
the symbols and their names:

# Each entry is a [ name, URL string ] pair, kept in URL order.
our @FIELD_MAP = (
	[ symbol	=> 's' ],
	[ name	=> 'n' ],
	[ last	=> 'l1' ],
	# …
);

We can then generate the @FIELDS and @FIELD_ENCODING
arrays from @FIELD_MAP, which allows us to use all of the old
code, while preserving both the order of the fields, and the
relationship between the URL string and the programmer-readable name:

our @FIELDS;
our @FIELD_ENCODING;

for my $datum (@FIELD_MAP)
{
	push @FIELDS,         $datum->[0];
	push @FIELD_ENCODING, $datum->[1];
}

With only two pieces of data, it’s okay to use arrays inside
@FIELD_MAP. If we needed more than that, we should probably
use an array of hashes:

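# One hash per field; the string values are quoted because "last" and
# "s" are Perl keywords/operators and would not parse as barewords.
our @FIELD_MAP = (
	{ sym_name	=> 'symbol',
	  url_string	=> 's',
	  case_sensitive	=> 0,
	},
	{ sym_name	=> 'name',
	  url_string	=> 'n',
	  case_sensitive	=> 1,
	},
	{ sym_name	=> 'last',
	  url_string	=> 'l1',
	  case_sensitive	=> 0,
	},
	# …
);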

Over time, the amount of data stored this way may rise, and the cost
of generating useful data structures may grow too large to be done at
run-time. That’s okay: since programs can write other programs, all we
need is a utility that reads the programmer-friendly table, generates
the data structures that’ll be needed at run-time, and writes those to
a separate module/header/include file. This utility can then be run at
build time, before installing the package. Or, if the data changes
over time, the utility can be run once a week (or whatever) to update
an existing installation.
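Such a generator doesn’t have to be written in the same language as the module it feeds. As a rough sketch, here it is in Python rather than Perl, emitting a Python module for simplicity. `render_module` is a hypothetical helper, and the table holds only the three pairs shown above; the rest of the fields are left out.

```python
# Programmer-friendly table: one row per field, (name, URL string).
# Only the first three pairs from the post; the remaining fields are omitted.
FIELD_MAP = [
    ("symbol", "s"),
    ("name", "n"),
    ("last", "l1"),
]


def render_module(field_map):
    """Generate source text that defines both run-time lists from one table."""
    fields = [name for name, _ in field_map]
    encoding = [code for _, code in field_map]
    return (
        "# Generated file; edit the field table instead.\n"
        f"FIELDS = {fields!r}\n"
        f"FIELD_ENCODING = {encoding!r}\n"
    )


source = render_module(FIELD_MAP)
print(source)
# At build time, this text would be written to a file that ships with
# the package, e.g. with pathlib.Path(...).write_text(source).
```

Because both lists come from the same rows, they can’t drift out of order the way the hand-maintained arrays did.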

The real moral of the story is that when you have a bunch of related
bits of information (the data field name and its URL string, above),
and you want to make a small change, it’s a pain to have to remember
to make the change in several places. It’s just begging for someone to
make a mistake.

Machines are good at anal-retentive manipulation of data. Let them do
the tedious, repetitive work for you.

Unicode and the Pope

I keep thinking that
Unicode
has everything, but it turns out that it doesn’t. In particular, there’s a
collection of emoji
that’s been proposed, but hasn’t been approved by the powers that be.

The reason I bring this up is that recently, the pope made some
remarkably boneheaded comments; naturally, people pointed and laughed,
because that’s what you do when someone says something embarrassingly
stupid.

In response, the Catholic News Service published
a story
chiding people for that:

ROME (CNS) — Mockery is not acceptable in public discussions, especially when the subject is the pope, said the president of the Italian Catholic bishops’ conference.
[…]

“We will not accept that the pope, in the media or anywhere else, is mocked or offended,” said Cardinal Angelo Bagnasco of Genoa, opening the spring meeting of the permanent council of the Italian bishops’ conference.

I hope that emoji proposal passes; that way, the next time something
like this happens, I’ll be able to write

<span style="font-size: 1">💚</span>

to represent the world’s smallest violin, playing for the poor little
WATBs
and their hurt feelings. To quote Bender, “Oh, wait. You’re
serious. Let me laugh even harder.”

Meanwhile, maybe someone can explain to the Catholic church that if
they don’t like being ridiculed, they shouldn’t say such ridiculous
things.

Seriously, is this the best they have left? People shouldn’t make fun
of religious people because it hurts their feelings?

Oh, and the article also says:

The pope has often urged the world to become “more God-fearing while building a society based on humanitarian values and moral principles of life,” they said.

Maybe the problem is that he’s trying to pull in opposite directions:
it’s hard to build a society “based on humanitarian values and moral
principles” while at the same time telling them to be afraid of a
magic man in the sky. Drop the fear and the superstition, and then
we’ll talk.

Another Reason Not to Write csh Scripts

In case you haven’t read Tom Christiansen’s
Csh Programming Considered Harmful,
here’s another reason not to write csh/tcsh scripts
if you can avoid it.

Unlike the Bourne shell and Perl, which expand an undefined variable
to the empty string, the C shell exits with “Undefined variable”
if you reference one.

But there’s the $?VAR construct, which allows you to
tell whether a variable is set.

I needed to check this in a script that, for reasons I won’t go into
here, needed to be written in Csh. So I had

if ( !$?FOO ) then
    # Stuff to do if $FOO isn't set

and got an error. It turns out that csh started by evaluating
$?FOO. In this case, $FOO wasn’t set, so $?FOO
expanded to 0. Csh then tried to evaluate !0 and parsed it as
the history reference “event number zero”, which failed. Grrr.

Putting a space between the bang and the dollar sign fixed that.

Mark Your Calendars

informs me that time_t (the number of seconds elapsed since Jan. 1, 1970, the standard measure of time under Unix) will be 1234567890 on Feb. 13 2009, at 18:31:30 EST, or 23:31:30 UTC.

Back on Sep. 8, 2001, when time_t rolled over to 10 digits, we were braced for a mini-Y2K. I don’t expect anything to happen this time, except for a bunch of Unix geeks hoisting beers.
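If you want to double-check the arithmetic, a few lines of Python will do it. `from_time_t` is just a small helper for this example, and the EST zone here is a fixed UTC-5 offset with no daylight-saving handling.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset standing in for US Eastern Standard Time.
EST = timezone(timedelta(hours=-5), "EST")


def from_time_t(seconds, tz=timezone.utc):
    """Convert a Unix time_t value to an aware datetime in the given zone."""
    return datetime.fromtimestamp(seconds, tz=tz)


print(from_time_t(1234567890))        # 2009-02-13 23:31:30+00:00
print(from_time_t(1234567890, EST))   # 2009-02-13 18:31:30-05:00
print(from_time_t(10**9))             # first 10-digit value
```

The last line gives the moment of the 10-digit rollover in UTC, where it falls early on Sep. 9, 2001; in US time zones it was still the evening of Sep. 8.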

Answering XKCD


The answer the character is looking for is

osascript -e "set volume output volume 100"

Of course, much like the character in the original strip, I first tried ssh-ing in to the laptop where I have this defined as an alias (nope; it’s asleep), ssh-ing in to the other Mac to see if I’d copied the alias to all Macs (that one’s asleep too), looking through backups (nope; laptop backs up to a directly-attached disk (see “it’s asleep”, above), and the other Mac’s backups came up empty), grepping through my home wiki and other notes to see if I had written this down anywhere (bupkis).

I had to wait until I came home and had physical access to the laptop to wake it up and read my .cshrc.

LOLcale

If you’re writing
Engrish,
you should probably set $LANG = en_CN.UTF-8, right?

Cleaning a Nokia N810 Keyboard

Posted in hopes that it’ll help someone somewhere:

A while ago, I spilled lemonade on my N810’s keyboard, then accidentally closed it before I could wipe it clean. I did what I could, but over the following days, the keyboard increasingly started making squeaking and crunching sounds. So I thought I’d take it apart to see if I could clean it.

It turns out that the keyboard just comes off. If you raise the stand bar (the one you need to raise to insert an SD card or open the battery cover), you’ll see three holes in the front (four, actually: the smaller one is the microphone). The three holes hold hooks that hold the keyboard in place.

Slide out the keyboard, and use a small screwdriver to push the hooks in. Lift the keyboard up a bit, and then pull it out. You’ll be left with the metallic keycaps, looking down at the actual sensors.

I washed the keycap assembly with water and a sponge, let everything dry, snapped the keyboard back into place, and presto! Good as new. The keyboard doesn’t feel crunchy anymore. Happiness ensued.

Oh, and you probably want to remove the battery before doing anything else.

Audacity Tip: Cleaning up Scratches


Just something I discovered recently while using Audacity to clean up some old vinyl recordings:

The Click Removal tool does a darn good job of cleaning up most scratches, but not all. IME it’s still necessary to go back after it to fix what it missed (I find that the Repair tool works well for small scratches).

Unfortunately, a lot of scratches are hard to see with the default waveform view: a scratch can have a small amplitude (smaller than the clean waveform around it); it’s annoying because it shows up as a short burst of white noise in the middle of a tune.

The black bars mark the location of an audible but invisible click.

However, white noise shows up as a vertical bar in spectrum view. So what I did was:

  1. Duplicate the track I’m cleaning up.
  2. Mute the copy.
  3. Switch the copy to spectrum view; leave the original in waveform view.

That way, you can zoom out and easily find scratches on the spectrum view. By the time you zoom in and the spectrum becomes too smeared out to be useful, you can see the scratch in the waveform view, so you can fix it.

In spectrum view, the scratch is clearly visible as a bar that goes all the way to the top.

The downside of this technique is that Audacity has to do lots of FFTs to show the spectrum. So you may want to use a fast machine for this.

As we zoom in, the spectrum becomes too smeared out to be useful, but in waveform view, the scratch becomes obvious.

The other downside, of course, is that since you can see a lot more flaws, it takes ten times longer to fix a track.

Minimal Electoral Map

During a discussion on whether the electoral college is still a good idea, someone brought up the point that it’s possible to win the electoral vote but lose the popular vote, and pretty badly at that.

So I wrote a Perl script that used evolutionary computation to try to produce the most skewed electoral map possible. Here’s what it came up with:

Electoral Vote

