Geek, Epsilon Clue, Page 7

Geek Religion

iReligion 2.0

The Telegraph reports that

Cardinal Sean Brady, the leader of Ireland’s Roman Catholics, has urged social network users to start sending daily prayers by text, Twitter or e-mail.

This, of course, could be the start of something huge: if tweeted
prayers are as good as spoken ones (contest for the comments section:
condense the Lord’s Prayer into 140 characters), then the sky’s the
limit.

Imagine: you add a dinner date on your PDA. When it gets added to your
calendar server, it sends a request to the Catholic church’s server,
with the XML equivalent of “forgive me, father, for I have committed
gluttony”. The church’s expert system analyzes this request behind the
scenes, and responds with something like “say five Hail Marys”
(properly encapsulated in XML, of course). Your home computer then
schedules a time to tweet five Hail Marys while you sleep.

At MyVatican, you can view your confession history, schedule
preemptive penance, friend saints and other intercessors, buy relics
at the online shop, and follow your favorite priests as they get
shuffled from one parish to the next.

What would be really cool would be if they wikified the Catholic
Encyclopedia. Though I assume that [citation needed] would be
replaced by [must be taken on faith].

Andrew Arensburger

Apr, Tue, 2009

Hacking

Just A Little Bit of Planning

One thing I’ve noticed about my code is that an awful lot of the
comments are of the form

call_some_function();
	// XXX - Error-checking

(where XXX is an easily-grepped marker for something that needs to be
fixed.)

The proximate reason for this accumulation of “do something smart if
something goes wrong” to-do items is that a lot of the time, the
function in which this appears doesn’t have a mechanism for reporting
errors, so I don’t know what to do if I detect a run-time error.

This leads to the other big problem, the one where I’m calling another
function of mine, which doesn’t report errors, so I can’t even tell if
something went wrong.

So if a() calls b() which calls c(), all three are likely to have
XXX - Error-checking comments; but c() doesn’t know what to do in case
of an error, and b() and a() don’t even know how to detect errors. And
so the XXX comments accumulate.
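
Here is a minimal sketch of the way out, in Python rather than C-style code. The functions a(), b() and c() stand in for the ones above, and StepError is a name invented for this example. Once c() has a way to report failure, b() can pass it along and a() finally has something to check:

```python
class StepError(Exception):
    """Hypothetical error type shared by all three layers."""


def c(path):
    """Lowest layer: read a file, reporting failure instead of ignoring it."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError as err:
        # Translate the low-level error into the one callers are told to expect.
        raise StepError(f"cannot read {path}: {err}") from err


def b(path):
    """Middle layer: has nothing to add on failure, so StepError propagates."""
    return c(path).upper()


def a(path):
    """Top layer: the one place that decides what to do about a failure."""
    try:
        return b(path)
    except StepError as err:
        print(f"warning: {err}")
        return None
```

The mechanism itself (exceptions, return codes, an error object) matters less than the fact that all three functions agree on one.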

For me, this is often caused by the fact that experimental programming
and production programming are quite different: when I’m learning a
new system, such as a new graphics or math library, I want to figure
out which functions I’m supposed to call to get the results I want,
what the various data structures do, which one of multiple approaches
to the problem works best, and so forth.

If I set up a test environment (e.g., a database server so I can play
around with database-manipulation code), I’ll be keeping an eye on it
to make sure that everything is sane, so my experimental code needn’t
worry about checking whether the server is up. And if it can’t make a
connection, it’ll likely dump core; but since I don’t have any
precious data or users who’ll yell at me if things go wrong, it
doesn’t matter. It’s best to just set up a working environment and
hammer at the code until it works. I tend to accumulate a large number
of ad hoc modules, functions, and data files with names like
foo, bar, foo2, and so forth.

In production, of course, this isn’t good enough. All sorts of things
can go wrong: servers go down, network connections get broken, runaway
processes suck up all available memory, users try to open nonexistent
files, viruses try to overflow buffers, and so forth. Code needs not
only to detect errors, but also to deal with them as gracefully as
possible.

But if I’m working on a larger project and adding new
functionality that I’m not familiar with, the temptation is strong to
take the “learning” code that’s been hammered into some kind of shape,
and plop it in the middle of existing production code.

Neither approach is inherently wrong. Each is appropriate in certain
contexts. You want to keep things loose and fluid and unstructured
while learning, because by definition you don’t know what’s going on and
what’s best. And you want to have things organized, structured, and
regimented in production, to make it easier to avoid and find bugs, to
ensure code quality and stability.

But this difference does mean that it can be hard to integrate new
code into production code.

Often, test code is so messy that there’s really no choice but to do a
complete rewrite. Neater test code is worse in this regard, since it
may trick you into thinking that with just a little bit of cleaning
up, you can use it in production.

So it’s important, when moving from test to production code, to ask
oneself how errors should be reported. This takes a bit more planning
ahead of time, but like security, it’s easier to build it in from the
start than to retrofit it onto existing code.

The thing that works for me is:

1. Think of a function to add.
2. Write a comment describing the function: what it does, which
   arguments it takes, what value(s) it returns, and what it does in case
   of error.
3. Write the function.

In that order.
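
As an illustration, here is what the comment-first habit might look like for a hypothetical parse_port() helper. The name and behavior are made up for this example, and it is in Python only for brevity. The comment was written before the body, so the error behavior was settled before any code existed:

```python
def parse_port(text):
    """Convert a string such as "8080" into a TCP port number.

    Arguments:
        text: the string to convert; surrounding whitespace is ignored.

    Returns:
        The port as an int between 1 and 65535.

    On error:
        Raises ValueError if text is not an integer or is out of range.
        Never returns a sentinel value such as -1.
    """
    port = int(text.strip())  # int() itself raises ValueError on garbage
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port
```

Anyone calling parse_port() now knows what to catch, which is exactly what the XXX comments were missing.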

This may be more structured than you want in the playing-around phase,
in which case you should definitely consider using it during the
hammering-into-shape phase, after you have the basic functionality
working, and before you’ve started moving test code into production.
Over time, though, this kind of forethought may become automatic
enough that it doesn’t get much in the way of experimentation.

Andrew Arensburger

Apr, Sat, 2009

Geek Miscellaneous

Legal Markup Language

Today at work, I had to sign some legal papers. They were pretty standard “I have read the attached policy and agree to be bound by it” stuff that all of us Full-Time Employment Units have to sign once a year.

But I’m the sort of person who believes that if I sign an agreement, I ought to at least know what it says. But I have better things to do than to read legalese all day.

The same problem applies to a lot of commercial applications: every time you upgrade, you have to agree to a EULA for the new version. For a variety of reasons, most people just click through and get on with their lives. But it would be nice to be able to know that the vendor hasn’t just asked you to sign over your firstborn.

One possible solution would be a markup and revision system for legal documents. For starters, if your job requires you to sign an Acceptable Behavior Policy once a year, you could read it carefully once when you sign on, and save a copy. Then, the next year, you can compare the version you’re given with the one you signed a year ago. If there are no changes, you can just sign it without reading, on the grounds that if you didn’t have a problem with it last year, you don’t have a problem with it now.
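
The comparison step needs nothing fancier than a diff. A rough sketch using Python's standard difflib module follows; the policy text and the file labels are invented for the example:

```python
import difflib


def policy_changes(last_year, this_year):
    """Return unified-diff lines between two versions of a policy text."""
    return list(
        difflib.unified_diff(
            last_year.splitlines(),
            this_year.splitlines(),
            fromfile="policy-2008",
            tofile="policy-2009",
            lineterm="",
        )
    )


old = "1. Be polite.\n2. Keep your phone number on file.\n"
new = old + "3. Surrender your firstborn.\n"

changes = policy_changes(old, new)
if changes:
    print("\n".join(changes))   # something changed: read before signing
else:
    print("No changes since last year.")
```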

Of course, a lot of documents include other documents by reference. These need to be archived as well.

It would also be nice to add comments: for instance, if the policy requires you to keep your cell phone number on file with your manager so that you can be contacted outside of business hours, you could add “Has my cell phone number changed since last year?” in the comment area.

Since big chunks of legal documents are just boilerplate text, and since many legal documents (such as software EULAs, credit card applications, car rental agreements, etc.) apply to many people, it would be nice to look them up on the net. That is, the tool on your desktop could take the MD5 hash of a clause, send that off to the legal opinion servers of your choice, and see what they have to say. For instance, the EFF could have a repository that says that certain clauses aren’t as scary as they sound; the FSF could point out which clauses will forfeit your Free Software-loving soul.
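
A sketch of the lookup, assuming a local dictionary as a stand-in for the remote opinion server. The clause, the opinion text, and the whitespace normalization are all my own assumptions; real documents would need more careful canonicalization before hashing:

```python
import hashlib


def clause_key(clause):
    """Hash a clause after collapsing whitespace, so reflowed text still matches."""
    normalized = " ".join(clause.split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Stand-in for a remote opinion server: maps clause hashes to commentary.
OPINIONS = {
    clause_key("Licensee may not reverse engineer the Software."):
        "(placeholder opinion) Common boilerplate.",
}


def look_up(clause):
    """Return the stored opinion for a clause, or None if nobody has reviewed it."""
    return OPINIONS.get(clause_key(clause))


print(look_up("Licensee may not reverse engineer\n    the Software."))
print(look_up("Licensee hereby surrenders their firstborn."))
```

Sending only the hash also means the server never sees the text of a clause it doesn't already know.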

This could be a commercial service: you could pay a legal firm for online legal advice. Yes, a lawyer would have to read and research the various documents, and that’s expensive; but if they can spread the cost around several hundred or thousand clients, it could become affordable.

You should be able to specify certain details about your situation. For example, a clause that affects US Government employees might be either important or irrelevant, depending on whether you’re a fed or not. You should be able to check or uncheck “I work for the US government” in the preferences menu, so that the software will look up the appropriate response. Ditto if you don’t work at a nuclear reactor, don’t deal in foreign trade, and so forth.

One interesting aspect of this is the coding theory aspect of it: there’s a level of distrust that has to be dealt with. If you sign the yearly policy without reading it because it hasn’t changed since last year, then you probably don’t want to leave your copy of last year’s document with your employer, in case they try to change it. And if you leave a copy on your employer’s computer, they might not be above rooting around in your files to change your backup to make it match this year’s version. So you’ll want to be able to cryptographically sign each document. And of course any sensitive information that goes out on the net needs to be encrypted.

Then there’s the question of giving away information by the sorts of questions you ask. For instance, you may not want the people running Joe Random Legal Server to know that you work for the military or at a nuclear power plant, but there are common clauses that affect people who do. So while the program on your desktop needs to know this in order to give you good advice, that’s not necessarily something you need to send out on the net. So when it sends out a query about a particular clause, the protocol should allow you to specify as much or as little detail as you want: if you say that you don’t deal in trade with foreign nationals, the remote server can save itself the trouble of looking up what a given clause means for those who do; but if you don’t say whether you’re in the military, it’ll send both responses back and let your desktop software decide which version to show you.
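
To make that concrete, here is a toy version of the exchange. The attribute names, the advice strings, and both functions are hypothetical. The server filters only on what the client chose to disclose, and the desktop software does the rest with the profile it keeps to itself:

```python
# Hypothetical opinions for one clause, keyed by the reader's circumstances.
CLAUSE_OPINIONS = [
    {"military": True,  "foreign_trade": False, "advice": "Applies to you; read closely."},
    {"military": False, "foreign_trade": False, "advice": "Does not apply to you."},
    {"military": True,  "foreign_trade": True,  "advice": "Applies, with export rules."},
    {"military": False, "foreign_trade": True,  "advice": "Only the export rules apply."},
]


def server_answer(disclosed):
    """Server side: return every opinion consistent with what was disclosed."""
    return [
        op for op in CLAUSE_OPINIONS
        if all(op[key] == value for key, value in disclosed.items())
    ]


def client_pick(candidates, private_profile):
    """Client side: use the full local profile to choose among the candidates."""
    for op in candidates:
        if all(op[key] == value for key, value in private_profile.items()):
            return op["advice"]
    return None


profile = {"military": True, "foreign_trade": False}   # never leaves the desktop
disclosed = {"foreign_trade": False}                   # the only detail sent out

candidates = server_answer(disclosed)   # military and non-military answers alike
print(client_pick(candidates, profile))
```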

Of course, you probably don’t mind letting your attorneys know whether you’re in the military, so the software should be smart enough to send this information only to some servers and not others.

I imagine that some of this already exists: contracts are already negotiated between parties that don’t trust each other. Presumably the law firms on each side already have software that’ll tell them that section 3, paragraph 10 hasn’t changed since the last round of negotiations, so they don’t need to check it again.

And of course laws go through many iterations from original inception to bill to committee to floor vote, and are often amended by people of other parties, who’d love to make life miserable for you. The staffs of legislators must have some system for keeping track of it all. Hopefully some of it is automated.

For software, there are already software-installation tools that include presenting a EULA to the user as a standard step. It shouldn’t be too hard to put in a hook that calls the user’s preferred legal document management system.

The Subversion version control system has a “blame” subcommand that shows you when certain lines in a file were last changed, and by whom. Legalese is so structured and formal that it seems that a similar approach should be able to help there as well.

Andrew Arensburger

Apr, Thu, 2009

Hacking Perl

Don’t Put Information in Two Places

While playing around with a Perl script to look up stock quotes, I
kept getting warning messages about uninitialized values, as well as
missing data in the results.

I eventually tracked it down to a bug in an old version of the
Finance::Quote Perl module, specifically to these lines:

# Yahoo uses encodes the desired fields as 1-2 character strings
# in the URL.  These are recorded below, along with their corresponding
# field names.

@FIELDS = qw/symbol name last time date net p_change volume bid ask
             close open day_range year_range eps pe div_date div div_yield
	     cap ex_div avg_vol currency/;

@FIELD_ENCODING = qw/s n l1 d1 t1 c1 p2 v b a p o m w e r r1 d y j1 q a2 c4/;

Basically, to look up a stock price at Yahoo! Finance, you fetch a URL
with a parameter that specifies the data you want to retrieve: s for
the ticker symbol (e.g., AMZN), n for the company name
(“Amazon.com, Inc.”), and so forth.

The @FIELDS array lists convenient programmer-readable names
for the values that can be retrieved, and @FIELD_ENCODING
lists the short strings that have to be sent as part of the URL.

At this point, you should be able to make an educated guess as to what
the problem is. Take a few moments to see if you can find it.

…

The problem is that @FIELDS and @FIELD_ENCODING
don’t list the data in the same order: “time” is the 4th
element of @FIELDS ($FIELDS[3]), but t1,
which is used to get the time of the last quote, is the 5th element of
@FIELD_ENCODING ($FIELD_ENCODING[4]). Likewise,
date is at the same position as t1.
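
If you'd rather let the machine find it, pairing the two lists up position by position makes the mismatch visible. This is a quick Python transcription of the two Perl arrays above, not code from the module:

```python
FIELDS = """symbol name last time date net p_change volume bid ask
            close open day_range year_range eps pe div_date div div_yield
            cap ex_div avg_vol currency""".split()

FIELD_ENCODING = "s n l1 d1 t1 c1 p2 v b a p o m w e r r1 d y j1 q a2 c4".split()

# Pair the lists position by position, the way the module implicitly does.
pairs = list(zip(FIELDS, FIELD_ENCODING))

print(len(FIELDS), len(FIELD_ENCODING))   # the lengths agree, so nothing looks wrong
print(pairs[3], pairs[4])                 # ...yet time and date get each other's codes
```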

More generally, this code has information in two different places,
which requires the programmer to remember to update it in both places
whenever a change is made. The code says “Here’s a list of names for
data. Here’s a list of strings to send to Yahoo!”, with the unstated
and unenforced assumption that “Oh, and these two lists are in
one-to-one correspondence with each other”.

Whenever you have this sort of relationship, it’s a good idea to
enforce it in the code. The obvious choice here would be a hash:

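our %FIELD_MAP = (
	# The URL codes must be quoted: a bare s after => would be parsed
	# as the start of a substitution operator.
	symbol	=> 's',
	name	=> 'n',
	last	=> 'l1',
	…
);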

Of course, it may turn out that there are perfectly good reasons for
using an array (e.g., perhaps the server expects the data fields to be
listed in a specific order). And in my case, I don’t particularly feel
like taking the time to rewrite the entire module to use a hash
instead of two arrays. But that’s okay; we can use an array that lists
the symbols and their names:

our @FIELD_MAP = (
	[ symbol	=> 's' ],
	[ name	=> 'n' ],
	[ last	=> 'l1' ],
	…
);

We can then generate the @FIELDS and @FIELD_ENCODING
arrays from @FIELD_MAP, which allows us to use all of the old
code, while preserving both the order of the fields, and the
relationship between the URL string and the programmer-readable name:

our @FIELDS;
our @FIELD_ENCODING;

for my $datum (@FIELD_MAP)
{
	push @FIELDS,         $datum->[0];
	push @FIELD_ENCODING, $datum->[1];
}

With only two pieces of data, it’s okay to use arrays inside
@FIELD_MAP. If we needed more than that, we should probably
use an array of hashes:

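our @FIELD_MAP = (
	# Values are quoted: => only auto-quotes the word on its left,
	# and last is a Perl keyword.
	{ sym_name	=> 'symbol',
	  url_string	=> 's',
	  case_sensitive	=> 0,
	},
	{ sym_name	=> 'name',
	  url_string	=> 'n',
	  case_sensitive	=> 1,
	},
	{ sym_name	=> 'last',
	  url_string	=> 'l1',
	  case_sensitive	=> 0,
	},
	…
);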

Over time, the amount of data stored this way may rise, and the cost
of generating useful data structures may grow too large to be done at
run-time. That’s okay: since programs can write other programs, all we
need is a utility that reads the programmer-friendly table, generates
the data structures that’ll be needed at run-time, and writes those to
a separate module/header/include file. This utility can then be run at
build time, before installing the package. Or, if the data changes
over time, the utility can be run once a week (or whatever) to update
an existing installation.
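
Such a utility can be tiny. Here is a sketch in Python (not Perl, and not part of Finance::Quote) that turns a field table into the source of a generated module. Only the three fields spelled out above are included, and generate_module() is a name I made up:

```python
# Programmer-friendly table: one row per field, everything in one place.
FIELD_MAP = [
    ("symbol", "s"),
    ("name", "n"),
    ("last", "l1"),
]


def generate_module(field_map):
    """Return Python source defining FIELDS and FIELD_ENCODING from the table."""
    names = [name for name, _code in field_map]
    codes = [code for _name, code in field_map]
    return (
        "# Generated file: edit the field table, not this module.\n"
        f"FIELDS = {names!r}\n"
        f"FIELD_ENCODING = {codes!r}\n"
    )


source = generate_module(FIELD_MAP)
print(source)   # a build step would write this to a file instead
```

The generated file carries a warning at the top so that nobody is tempted to edit it by hand and bring back the two-places problem.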

The real moral of the story is that when you have a bunch of related
bits of information (the data field name and its URL string, above),
and you want to make a small change, it’s a pain to have to remember
to make the change in several places. It’s just begging for someone to
make a mistake.

Machines are good at anal-retentive manipulation of data. Let them do
the tedious, repetitive work for you.

Andrew Arensburger

Apr, Sun, 2009

FFS Geek Religion

Unicode and the Pope

I keep thinking that Unicode has everything, but it turns out that it
doesn’t. In particular, there’s a collection of emoji that’s been
proposed, but hasn’t been approved by the powers that be.

The reason I bring this up is that recently, the pope made some
remarkably boneheaded comments; naturally, people pointed and laughed,
because that’s what you do when someone says something embarrassingly
stupid.

In response, the Catholic News Service published a story chiding
people for that:

ROME (CNS) — Mockery is not acceptable in public discussions, especially when the subject is the pope, said the president of the Italian Catholic bishops’ conference.
[…]

“We will not accept that the pope, in the media or anywhere else, is mocked or offended,” said Cardinal Angelo Bagnasco of Genoa, opening the spring meeting of the permanent council of the Italian bishops’ conference.

I hope that emoji proposal passes; that way, the next time something
like this happens, I’ll be able to write

<span style="font-size: 1">💚</span>

to represent the world’s smallest violin, playing for the poor little
WATBs and their hurt feelings. To quote Bender, “Oh, wait. You’re
serious. Let me laugh even harder.”

Meanwhile, maybe someone can explain to the Catholic church that if
they don’t like being ridiculed, they shouldn’t say such ridiculous
things.

Seriously, is this the best they have left? People shouldn’t make fun
of religious people because it hurts their feelings?

Oh, and the article also says:

The pope has often urged the world to become “more God-fearing while building a society based on humanitarian values and moral principles of life,” they said.

Maybe the problem is that he’s trying to pull in opposite directions:
it’s hard to build a society “based on humanitarian values and moral
principles” while at the same time telling them to be afraid of a
magic man in the sky. Drop the fear and the superstition, and then
we’ll talk.

Andrew Arensburger

Mar, Tue, 2009

Hacking

Another Reason Not to Write csh Scripts

In case you haven’t read Tom Christiansen’s Csh Programming Considered
Harmful, here’s another reason not to write csh/tcsh scripts if you
can avoid it.

The C shell exits with “Undefined variable” if you reference an
undefined variable, instead of expanding that variable to the empty
string the way the Bourne shell and Perl do.

But there’s the $?VAR construct, which allows you to
tell whether a variable is set.

I needed to check this in a script that, for reasons I won’t go into
here, needed to be written in Csh. So I had

if ( !$?FOO ) then
    # Stuff to do if $FOO isn't set

and got an error. It turns out that csh started by evaluating
$?FOO. In this case, $FOO wasn’t set, so $?FOO evaluated to 0.
Csh then tried to evaluate !0 and parsed it as
“event number zero”, which failed. Grrr.

Putting a space between the bang and the dollar sign fixed that.

Andrew Arensburger

Feb, Wed, 2009

Geek

Mark Your Calendars

informs me that time_t (the number of seconds elapsed since Jan. 1, 1970, the standard measure of time under Unix) will be 1234567890 on Feb. 13 2009, at 18:31:30 EST, or 23:31:30 UTC.
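
If you want to check the arithmetic yourself, here is a quick sketch in Python. The fixed UTC-5 offset is my shortcut for EST, which is safe here because daylight saving time isn't in effect in February:

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5), "EST")   # fixed offset; no DST in February

moment_utc = datetime.fromtimestamp(1234567890, tz=timezone.utc)
moment_est = moment_utc.astimezone(EST)

print(moment_utc.isoformat())   # 2009-02-13T23:31:30+00:00
print(moment_est.isoformat())   # 2009-02-13T18:31:30-05:00
```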

Back on Sep. 8, 2001, when time_t rolled over to 10 digits, we were braced for a mini-Y2K. I don’t expect anything to happen this time, except for a bunch of Unix geeks hoisting beers.

Andrew Arensburger

Jan, Thu, 2009

Geek

Answering XKCD

The answer the character is looking for is

osascript -e "set volume output volume 100"

Of course, much like the character in the original strip, I first tried ssh-ing in to the laptop where I have this defined as an alias (nope; it’s asleep), ssh-ing in to the other Mac to see if I’d copied the alias to all Macs (that one’s asleep too), looking through backups (nope; laptop backs up to a directly-attached disk (see “it’s asleep”, above), and the other Mac’s backups came up empty), grepping through my home wiki and other notes to see if I had written this down anywhere (bupkis).

I had to wait until I came home and had physical access to the laptop to wake it up and read my .cshrc.

Andrew Arensburger

Jan, Wed, 2009

Geek

LOLcale

If you’re writing Engrish, you should probably set $LANG = en_CN.UTF-8, right?

Andrew Arensburger

Jan, Fri, 2009

Geek Things I've Learned

Cleaning a Nokia N810 Keyboard

Posted in hopes that it’ll help someone somewhere:

A while ago, I spilled lemonade on my N810’s keyboard, then accidentally closed it before I could wipe it clean. I did what I could, but over the following days, the keyboard increasingly started making squeaking and crunching sounds. So I thought I’d take it apart to see if I could clean it.

It turns out that the keyboard just comes off. If you raise the stand bar (the one you need to raise to insert an SD card or open the battery cover), you’ll see three holes in the front (four, actually: the smaller one is the microphone). The three holes hold hooks that hold the keyboard in place.

Slide out the keyboard, and use a small screwdriver to push the hooks in. Lift the keyboard up a bit, and then pull it out. You’ll be left with the metallic keycaps, looking down at the actual sensors.

I washed the keycap assembly with water and a sponge, let everything dry, snapped the keyboard back into place, and presto! Good as new. The keyboard doesn’t feel crunchy anymore. Happiness ensued.

Oh, and you probably want to remove the battery before doing anything else.

Andrew Arensburger

Jan, Tue, 2009
