Just A Little Bit of Planning

One thing I’ve noticed about my code is that an awful lot of the
comments are of the form

call_some_function();
	// XXX - Error-checking

(where
XXX
is an easily-grepped
marker for something that needs to be fixed.)

The proximate reason for this accumulation of “do something smart if
something goes wrong to-do items is that a lot of the time, the
function in which this appears doesn’t have a mechanism for reporting
errors, so I don’t know what to do if I detected a run-time error.

This leads to the other big problem, the one where I’m calling another
function of mine, which doesn’t report errors, so I can’t even tell if
something went wrong.

So if a() calls b() which calls
c(), all three are likely to have
XXX - Error-checking
comments; but c() doesn’t know what to do in case of en
error, and b() and a() don’t even know how
to detect errors. And so the XXX comments accumulate.

For me, this is often caused by the fact that experimental programming
and production programming are quite different: when I’m learning a
new system, such as a new graphics or math library, I want to figure
out which functions I’m supposed to call to get the results I want,
what the various data structures do, which one of multiple approaches
to the problem works best, and so forth.

If I set up a test environment (e.g., a database server so I can play
around with database-manipulation code), I’ll be keeping an eye on it
to make sure that everything is sane, so my experimental code needn’t
worry about checking whether the server is up. And if it can’t make a
connection, it’ll likely dump core; but since I don’t have any
precious data or users who’ll yell at me if things go wrong, it
doesn’t matter. It’s best to just set up a working environment and
hammer at the code until it works. I tend to accumulate a large number
of ad hoc modules, functions, data files with names like
foo, bar, foo2, and so forth.

In production, of course, this isn’t good enough: all sorts of things
can go wrong: servers go down, network connections get broken, runaway
processes suck up all available memory, users try to open nonexistent
files, viruses try to overflow buffers, and so forth. Code needs not
only to detect errors, but deal with them as gracefully as possible.

But if I’m working on a larger project, and working on adding new
functionality that I’m not familar with, the temptation is strong to
take the “learning” code that’s been hammered into some kind of shape,
and plop it in the middle of existing production code.

Neither approach is inherently wrong. Each is appropriate in certain
contexts. You want to keep things loose and fluid and unstructured
while learning, because by definition you don’t know what’s going and
what’s best. And you want to have things organized, structured, and
regimented in production, to make it easier to avoid and find bugs, to
ensure code quality and stability.

But this difference does mean that it can be hard to integrate new
code into production code.

Often, test code is so messy that there’s really no choice but to do a
complete rewrite. Neater test code is worse in this regard, since it
may trick you into thinking that with just a little bit of cleaning
up, you can use it in production.

So it’s important, when moving from test to production code, to ask
oneself how errors should be reported. This takes a bit more planning
ahead of time, but like security, it’s easier to build it in from the
start than to retrofit it onto existing code.

The thing that works for me is:

  1. Think of a function to add.
  2. Write a comment describing the function: what it does, which
    arguments it takes, what value(s) it returns, and what it does in case
    of error.
  3. Write the function.

In that order.

This may be more structured than you want in the playing-around phase,
in which case you should definitely consider using it during the
hammering-into-shape phase, after you have the basic functionality
working, and before you’ve started moving test code into production.
Over time, though, this kind of forethought may become automatic
enough that it doesn’t get much in the way of experimentation.

Don’t Put Information in Two Places

While playing around with a Perl script to look up stock quotes, I
kept getting warning messages about uninitialized values, as well as
mising data in the results.

I eventually tracked it down to a bug in an old version of the
Finance::Quote
Perl module, specifically to these lines:

# Yahoo uses encodes the desired fields as 1-2 character strings
# in the URL.  These are recorded below, along with their corresponding
# field names.

@FIELDS = qw/symbol name last time date net p_change volume bid ask
             close open day_range year_range eps pe div_date div div_yield
	     cap ex_div avg_vol currency/;

@FIELD_ENCODING = qw/s n l1 d1 t1 c1 p2 v b a p o m w e r r1 d y j1 q a2 c4/;

Basically, to look up a stock price at
Yahoo! Finance,
you fetch a URL with a parameter that specifies the data you want to
retrieve: s for the ticker symbol (e.g., AMZN), n
for the company name (“Amazon.com, Inc.”), and so forth.

The @FIELDS array lists convenient programmer-readable names
for the values that can be retrieved, and @FIELD_ENCODING
lists the short strings that have to be sent as part of the URL.

At this point, you should be able to make an educated guess as to what
the problem is. Take a few moments to see if you can find it.

The problem is that @FIELDS and @FIELD_ENCODING
don’t list the data in the same order: “time” is the 4th
element of @FIELDS ($FIELDS[3]), but t1,
which is used to get the time of the last quote, is the 5th element of
@FIELD_ENCODING ($FIELD_ENCODING[4]). Likewise,
date is at the same position as t1.

More generally, this code has information in two different places,
which requires the programmer to remember to update it in both places
whenever a change is made. The code says “Here’s a list of names for
data. Here’s a list of strings to send to Yahoo!”, with the unstated
and unenforced assumption that “Oh, and these two lists are in
one-to-one correspondence with each other”.

Whenever you have this sort of relationship, it’s a good idea to
enforce it in the code. The obvious choice here would be a hash:

our %FIELD_MAP = (
	symbol	=> s,
	name	=> n,
	last	=> l1,
	…
)

Of course, it may turn out that there are perfectly good reasons for
using an array (e.g., perhaps the server expects the data fields to be
listed in a specific order). And in my case, I don’t particularly feel
like taking the time to rewrite the entire module to use a hash
instead of two arrays. But that’s okay; we can use an array that lists
the symbols and their names:

our @FIELD_MAP = (
	[ symbol	=> s ],
	[ name	=> n ],
	[ last	=> l1 ],
	…
)

We can then generate the @FIELDS and @FIELD_ENCODING
arrays from @FIELD_MAP, which allows us to use all of the old
code, while preserving both the order of the fields, and the
relationship between the URL string and the programmer-readable name:

our @FIELDS;
our @FIELD_ENCODING;

for my $datum (@FIELD_MAP)
{
	push @FIELDS,         $datum->[0];
	push @FIELD_ENCODING, $datum->[1];
}

With only two pieces of data, it’s okay to use arrays inside
@FIELD_MAP. If we needed more than that, we should probably
use an array of hashes:

our @FIELD_MAP = (
	{ sym_name	=> symbol,
	  url_string	=> s,
	  case_sensitive	=> 0,
	},
	{ sym_name	=> name,
	  url_string	=> n,
	  case_sensitive	=> 1,
	},
	{ sym_name	=> last,
	  url_string	=> l1,
	  case_sensitive	=> 0,
	},
	…
)

Over time, the amount of data stored this way may rise, and the cost
of generating useful data structures may grow too large to be done at
run-time. That’s okay: since programs can write other programs, all we
need is a utility that reads the programmer-friendly table, generates
the data structures that’ll be needed at run-time, and write those to
a separate module/header/include file. This utility can then be run at
build time, before installing the package. Or, if the data changes
over time, the utility can be run once a week (or whatever) to update
an existing installation.

The real moral of the story is that when you have a bunch of related
bits of information (the data field name and its URL string, above),
and you want to make a small change, it’s a pain to have to remember
to make the change in several places. It’s just begging for someone to
make a mistake.

Machines are good at anal-retentive manipulation of data. Let them do
the tedious, repetitive work for you.

Another Reason Not to Write csh Scripts

In case you haven’t read Tom Christiansen’s
Csh Programming Considered Harmful,
here’s another reason not to write csh/tcsh scripts
if you can avoid it.

Unlike the Bourne Shell, the C shell exits with “Undefined variable”
if you reference an undefined variable, instead of expanding that
variable to the empty string, the way the Bourne shell and Perl do.

But there’s the $?VAR construct, which allows you to
tell whether a variable is set.

I needed to check this in a script that, for reasons I won’t go into
here, needed to be written in Csh. So I had

if ( !$?FOO ) then
    # Stuff to do if $FOO isn't set

and got an error. It turns out that csh started by evaluating
$?FOO. In this case, $FOO was set, so $?FOO
was set to 0. Csh then tried to evaluate !0 and parsed it as
“event number zero”, which failed. Grrr.

Putting a space between the bang and the dollar sign fixed that.

Wanted: Calendar Feature

PDAs have solved or simplified a lot of the problems I used to have
before I started carrying around a backup brain. But there’s one type
of reminder that they still can’t deal with: “do X under when Y
happens”. E.g., “Return Paul’s book next time I see him” or “Look up
Janice if I’m ever in London.”

Read More

L10n 2.0

If you write a software package, and want it to be usable by as many people as possible, it’s important to translate it into other languages. But like documentation, localization (l10n) is one of those chores that programmers don’t want to do. But if it’s a web app, why not ask the users to contribute translations?

Read More

Unicode Input in Emacs

One question that had been bugging me for a while is, how does one
input a character in Emacs, given its Unicode hex code?

Answer: use the ucs input method, then use
uHHHH to input, where HHHH is the character’s
hex code.

Unfortunately, it doesn’t look as though there’s a way to input a
character by its decimal code.

Also, C- toggles an input method on and off, rather than
cycling through a list. So if you’re writing HTML (and therefore want
the default input method) with French text (for which you want the
latin-1-postfix input method), but need to insert box-drawing
characters (for which you need ucs), you’ll wind up using
M-x set-input-method a lot.

A Better Way to Toggle

(Warning: what follows may be obvious or trivial to many.)

One of the cool things about AJAX is switching parts on and off: you
can make an element visible simply by

myElement.style.display = "block";

or hide it with

myElement.style.display = "none";

But the problem with this is that it requires the JavaScript script to
know a lot about the document. The example above doesn’t look too bad,
but what if you have something like a pulldown menu that appears when
you click a button?

Let’s say that originally, the button is gray and has a “+” icon next
to the text. When you click on it, the menu becomes visible, but the
button also changes to red, and the “+” icon changes to “-“, to show
that the menu is active.

Now you have all sorts of CSS resources that you have to keep track
of. It would be nice to put them in the .css file, with the
rest of the style stuff.

Read More

Different Stylesheets for Browsers With and Without JavaScript

As hacks go, this one is pretty obvious, but I thought I’d throw it
out there anyway.

Let’s say there are three stylesheets you want to use on your web
page: one for all browsers (style.css), one for browsers with
JavaScript enabled (style-js.css), one for browsers without
JavaScript (style-nojs.css). This can be useful for things
like “display the fancy drop-down menu only if the browser supports
JavaScript; display the plain-HTML menu only if the browser doesn’t
support JavaScript”.

The common stylesheet is pretty standard:

<link rel="stylesheet" type="text/css" href="style.css"/>

The one for browsers that don’t support JavaScript is also pretty
easy: that’s what <noscript> is for:

<noscript>
  <link rel="stylesheet" type="text/css" href="style-nojs.css"/>
</noscript>

Finally, what’s the best way to have different behavior in browsers
that support JavaScript? Why, run a script, of course:


  document.write('n');

I Get Email

Apparently, having my name in CPAN is a sign that I know everything about Perl, SOAP, XML, and security.

Unless someone can come up with a legitimate reason to send 5000 authentication requests to a web server (including an explanation of why that’s not a brain-damaged way to solve the problem at hand), I’m going to assume that this guy is a wannabe script kiddie.

This isn’t the first time someone’s asked me to , but this time around, I don’t feel like toying with him. Script kiddies are people too.

Then again, so’s Soylent Green (as put it).

Read More

Further Thoughts on Musical Evo-Devo

I keep thinking off and on about writing a program to apply evo-devo to musical composition (part 2 ). There are tons of problems to solve, of course, but I may have made some progress.

Read More