Evil Hack of the Day

MacOS plist XML files are evil; even more so than regular XML. For instance, my iTunes library file consists mostly of entries like:

	<key>Track ID</key><integer>5436</integer>
	<key>Name</key><string>Getting Better</string>
	<key>Artist</key><string>The Beatles</string>
	<key>Composer</key><string>Paul McCartney/John Lennon</string>
	<key>Album</key><string>Sgt. Pepper's Lonely Hearts Club Band</string>

You’ll notice that there’s no connection between a key and its value, other than proximity. There’s no real indication that these are fields in a data record, and unlike most XML files, you have to consider the position of each element compared to its neighbors. It’s almost as if someone took a file of the form

Track_ID = 5436
Name = "Getting Better"
Artist = "The Beatles"
Composer = "Paul McCartney/John Lennon"

and, when told to convert it to XML in the name of buzzword-compliance, did a simple and quarter-assed search and replace.

But of course, what was fucked up by (lossless) text substitution can be unfucked by text substitution. And what’s the buzzword-compliant tool for doing text substitution on XML, even crappy XML? XSLT, of course. The template language that combines the power of sed with the terseness of COBOL.

So I hacked up an XSLT template to convert my iTunes library into a file that can be required in a Perl script. Feel free to use it in good or ill health. If you spring it on unsuspecting developers, please send me a photo of their reaction.
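
For the XSLT-averse, here's a rough Perl sketch of the same pairing-by-proximity idea. To be clear, this is not the XSLT template mentioned above, just an illustration: it assumes every key and its value sit on one line, handles only simple scalar values, and doesn't decode XML entities. The variable names are made up.

```perl
use strict;
use warnings;

# A few lines in the same shape as the iTunes library excerpt above.
my $plist_fragment = <<'END';
	<key>Track ID</key><integer>5436</integer>
	<key>Name</key><string>Getting Better</string>
	<key>Artist</key><string>The Beatles</string>
END

# Pair each <key> with the element that immediately follows it.
# The element type (integer, string, ...) is matched but thrown away.
my %track;
while ($plist_fragment =~ m{<key>([^<]*)</key>\s*<(\w+)>([^<]*)</\2>}g) {
    $track{$1} = $3;
}

print "$_ = $track{$_}\n" for sort keys %track;
```

Which gets us more or less back to the key = value file that the plist should have been in the first place.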

Quick and Dirty Perl Hack: Is Foo::Bar Installed?

Every so often, I need to find out whether I have a certain Perl module installed. Usually it’s either because of a security alert, or because I’m wondering how much of a pain it would be to install some package that has Some::Obscure::Module as a prerequisite.

I don’t know how y’all do it, what with the plethora of package-management utilities out there, but one way that works for sure is simply:

perl -MSome::Module -e ''

If this command succeeds, that means Perl successfully loaded Some::Module, then executed the (empty) script, printing nothing. If Some::Module is missing, it’ll print an error message and fail.

This is short enough that it should be aliased, but I haven’t gotten around to that yet.
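
If you'd rather have the same check inside a script, something like the following should do. It's a minimal sketch: module_installed is a name I made up, and like the one-liner it actually loads the module, so any code the module runs at load time gets run.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the named module can be loaded.
sub module_installed {
    my ($module) = @_;
    # The name ends up in a string eval, so refuse anything that
    # doesn't look like Foo::Bar.
    return 0 unless $module =~ /\A\w+(?:::\w+)*\z/;
    local $@;
    return eval("require $module; 1") ? 1 : 0;
}

# Check every module named on the command line.
for my $module (@ARGV) {
    printf "%-30s %s\n", $module,
        module_installed($module) ? "installed" : "missing";
}
```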

Don’t Put Information in Two Places

While playing around with a Perl script to look up stock quotes, I
kept getting warning messages about uninitialized values, as well as
missing data in the results.

I eventually tracked it down to a bug in an old version of the
Perl module, specifically to these lines:

# Yahoo uses encodes the desired fields as 1-2 character strings
# in the URL.  These are recorded below, along with their corresponding
# field names.

@FIELDS = qw/symbol name last time date net p_change volume bid ask
             close open day_range year_range eps pe div_date div div_yield
	     cap ex_div avg_vol currency/;

@FIELD_ENCODING = qw/s n l1 d1 t1 c1 p2 v b a p o m w e r r1 d y j1 q a2 c4/;

Basically, to look up a stock price at
Yahoo! Finance,
you fetch a URL with a parameter that specifies the data you want to
retrieve: s for the ticker symbol (e.g., AMZN), n
for the company name (“Amazon.com, Inc.”), and so forth.

The @FIELDS array lists convenient programmer-readable names
for the values that can be retrieved, and @FIELD_ENCODING
lists the short strings that have to be sent as part of the URL.

At this point, you should be able to make an educated guess as to what
the problem is. Take a few moments to see if you can find it.

The problem is that @FIELDS and @FIELD_ENCODING
don’t list the data in the same order: “time” is the 4th
element of @FIELDS ($FIELDS[3]), but t1,
which is used to get the time of the last quote, is the 5th element of
@FIELD_ENCODING ($FIELD_ENCODING[4]). In @FIELDS,
date is at the same position as t1.
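
Printing the two arrays side by side makes the mismatch easy to see. This is just a throwaway check using the arrays as quoted above (declared with my here so that the snippet stands alone):

```perl
use strict;
use warnings;

# The two arrays, copied from the module excerpt above.
my @FIELDS = qw/symbol name last time date net p_change volume bid ask
                close open day_range year_range eps pe div_date div div_yield
                cap ex_div avg_vol currency/;
my @FIELD_ENCODING = qw/s n l1 d1 t1 c1 p2 v b a p o m w e r r1 d y j1 q a2 c4/;

# Print each name next to the URL code that shares its index.
for my $i (0 .. $#FIELDS) {
    printf "%2d  %-10s %s\n", $i, $FIELDS[$i], $FIELD_ENCODING[$i];
}
```

Rows 3 and 4 come out as time/d1 and date/t1, which is exactly backward.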

More generally, this code has information in two different places,
which requires the programmer to remember to update it in both places
whenever a change is made. The code says “Here’s a list of names for
data. Here’s a list of strings to send to Yahoo!”, with the unstated
and unenforced assumption that “Oh, and these two lists are in
one-to-one correspondence with each other”.

Whenever you have this sort of relationship, it’s a good idea to
enforce it in the code. The obvious choice here would be a hash:

our %FIELD_MAP = (
	symbol	=> 's',
	name	=> 'n',
	last	=> 'l1',
	# ...and so on for the remaining fields
);

Of course, it may turn out that there are perfectly good reasons for
using an array (e.g., perhaps the server expects the data fields to be
listed in a specific order). And in my case, I don’t particularly feel
like taking the time to rewrite the entire module to use a hash
instead of two arrays. But that’s okay; we can use an array that lists
the symbols and their names:

our @FIELD_MAP = (
	[ symbol	=> 's' ],
	[ name	=> 'n' ],
	[ last	=> 'l1' ],
	# ...and so on for the remaining fields
);

We can then generate the @FIELDS and @FIELD_ENCODING
arrays from @FIELD_MAP, which allows us to use all of the old
code, while preserving both the order of the fields, and the
relationship between the URL string and the programmer-readable name:

our @FIELDS;
our @FIELD_ENCODING;

# Split each [name, URL string] pair back into the two parallel arrays.
for my $datum (@FIELD_MAP) {
	push @FIELDS,         $datum->[0];
	push @FIELD_ENCODING, $datum->[1];
}

With only two pieces of data, it’s okay to use arrays inside
@FIELD_MAP. If we needed more than that, we should probably
use an array of hashes:

our @FIELD_MAP = (
	{ sym_name	=> 'symbol',
	  url_string	=> 's',
	  case_sensitive	=> 0,
	},
	{ sym_name	=> 'name',
	  url_string	=> 'n',
	  case_sensitive	=> 1,
	},
	{ sym_name	=> 'last',
	  url_string	=> 'l1',
	  case_sensitive	=> 0,
	},
	# ...and so on for the remaining fields
);

Over time, the amount of data stored this way may rise, and the cost
of generating useful data structures may grow too large to be done at
run-time. That’s okay: since programs can write other programs, all we
need is a utility that reads the programmer-friendly table, generates
the data structures that’ll be needed at run-time, and writes those to
a separate module/header/include file. This utility can then be run at
build time, before installing the package. Or, if the data changes
over time, the utility can be run once a week (or whatever) to update
an existing installation.
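
As a rough sketch of what such a utility might look like, here's a generator that turns the table into the source of a module. The package name Quote::Fields and the function generate_module are made up for illustration, and the table holds only the three rows shown earlier.

```perl
use strict;
use warnings;
use Data::Dumper;

# Programmer-friendly table (first three rows from the example above).
my @FIELD_MAP = (
    [ symbol => 's'  ],
    [ name   => 'n'  ],
    [ last   => 'l1' ],
);

# Return the source text of a module holding the derived arrays.
# "Quote::Fields" is a made-up package name.
sub generate_module {
    my (@map) = @_;
    local $Data::Dumper::Indent = 1;
    # A name starting with "*" makes Dump emit "@NAME = (...);" for an array ref.
    my $fields    = Data::Dumper->Dump([ [ map { $_->[0] } @map ] ], ['*FIELDS']);
    my $encodings = Data::Dumper->Dump([ [ map { $_->[1] } @map ] ], ['*FIELD_ENCODING']);
    return "package Quote::Fields;\n"
         . "# Generated file. Edit the table, not this.\n"
         . "our $fields"
         . "our $encodings"
         . "1;\n";
}

print generate_module(@FIELD_MAP);
```

Redirect the output to a .pm file at build time (or from cron), and the run-time code never has to know the table existed.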

The real moral of the story is that when you have a bunch of related
bits of information (the data field name and its URL string, above),
and you want to make a small change, it’s a pain to have to remember
to make the change in several places. It’s just begging for someone to
make a mistake.

Machines are good at anal-retentive manipulation of data. Let them do
the tedious, repetitive work for you.

I Get Email

Apparently, having my name in CPAN is a sign that I know everything about Perl, SOAP, XML, and security.

Unless someone can come up with a legitimate reason to send 5000 authentication requests to a web server (including an explanation of why that’s not a brain-damaged way to solve the problem at hand), I’m going to assume that this guy is a wannabe script kiddie.

This isn’t the first time someone’s asked me to , but this time around, I don’t feel like toying with him. Script kiddies are people too.

Then again, so’s Soylent Green (as put it).


Pattern Substitution as Funky Iterator

I have a project in which I have a row of cells, and a number of segments of given lengths, and I need to try out all of the ways in which the segments can fit into the row. If you like, think of it as: how many ways can “eye”, “zygote”, and “is” be placed, in that order, on a row of a Scrabble board?

I’m doing this in Perl, so naturally I’d like to play to Perl’s strengths (pattern matching and substitution) rather than its weaknesses (arithmetic). And I’ve discovered a nifty little hack.


Removing Accents in Strings

I’ve been ripping and encoding a bunch of music. Since I’m a hacker, naturally I have scripts that take a file with artist, album title, and track titles, find the corresponding .wav or .aiff source files, encode them as MP3, and tag them.

A lot of the music I have is in French or German (and some Spanish and Russian), so there are accented letters in names and titles. My input files are in UTF-8 format, so that’s cool. But one problem is that of generating a filename for the MP3 files: if I want to play the song “Diogène série 87” by H.F. Thiéfaine on his album “Météo für nada”, I don’t want to have to figure out how to type those accents in the file and directory names. I want the script to pick filenames that use only ASCII characters.
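
One common way to get there, and not necessarily the one this post ends up using, is to decompose each accented letter into a base letter plus combining marks, then throw the marks away. A minimal sketch, with ascii_filename as a made-up name and the test strings written with \x{...} escapes so that the snippet doesn't depend on the source file's encoding:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: turn a title into an ASCII-only filename fragment.
sub ascii_filename {
    my ($title) = @_;
    my $name = NFD($title);              # "e-acute" becomes "e" + combining acute
    $name =~ s/\p{Mn}//g;                # drop the combining marks
    $name =~ s/[^A-Za-z0-9._-]+/_/g;     # anything else unfriendly becomes "_"
    return $name;
}

print ascii_filename("Diog\x{e8}ne s\x{e9}rie 87"), "\n";   # prints Diogene_serie_87
print ascii_filename("M\x{e9}t\x{e9}o f\x{fc}r nada"), "\n"; # prints Meteo_fur_nada
```

This does nothing useful for Russian titles: Cyrillic letters have no ASCII base letter, so they'd all turn into underscores and would need real transliteration.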


How Do You Spell the Names of the Months?

One construct that I’ve seen (and used) a lot in Perl is a hash that maps month abbreviations to numeric values that can be fed to POSIX::strftime:

our %months = (
    "jan" => 0, "feb" => 1, "mar" => 2, ...
);

This is useful for parsing log files and such. It works, it’s quick and easy, and it doesn’t require a whole tree of dependent modules (which are always on the wrong side of the Internet) to be installed.

But what’s always bugged me is that this is the sort of thing that the machine ought to know already. And besides, it’s US-centric: what if the person running the script is in a non-English-speaking country?

Fortunately, I18N::Langinfo knows the names of the months.
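
Here's a minimal sketch of building that hash from the locale instead of typing it in. The helper name month_table is made up. It assumes that lowercasing the abbreviations is what you want, and how non-ASCII month names come back (bytes or characters) can vary with the Perl version and locale, so treat it as a starting point.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_TIME);
use I18N::Langinfo qw(langinfo
    ABMON_1 ABMON_2 ABMON_3 ABMON_4  ABMON_5  ABMON_6
    ABMON_7 ABMON_8 ABMON_9 ABMON_10 ABMON_11 ABMON_12);

# Hypothetical helper: map lowercased month abbreviations to 0-based month
# numbers for the given locale ("" means "whatever the environment says").
sub month_table {
    my ($locale) = @_;
    my $previous = setlocale(LC_TIME);   # remember the current setting
    setlocale(LC_TIME, $locale);
    my @items = (ABMON_1, ABMON_2, ABMON_3, ABMON_4,  ABMON_5,  ABMON_6,
                 ABMON_7, ABMON_8, ABMON_9, ABMON_10, ABMON_11, ABMON_12);
    my %months;
    $months{ lc langinfo($items[$_]) } = $_ for 0 .. $#items;
    setlocale(LC_TIME, $previous);       # put things back the way we found them
    return \%months;
}

my $months = month_table("");
printf "%d month abbreviations loaded\n", scalar keys %$months;
```

In the C locale this produces the familiar jan => 0 through dec => 11.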

Time-Related Things I Never Want to See In A Perl Script Again

I got stuck debugging someone else’s Perl code today, and it was chock-full of the sorts of things that annoy the piss out of those of us who know better.


Little Languages and Tables

Recently, a coworker whipped up a Perl script that’ll build all of the
Perl modules we support. This is useful for when we add a new
supported OS or OS version. This script takes a config file, moduledefs, which lists the modules to build, as well as various quirks that affect how and whether the modules should be built. moduledefs is itself a require’d Perl script:
