There Are Days When I Hate XML

…and days when I really hate XML.

In this case, I have an XML document like

<?xml version="1.0" ?>
<foo xmlns="http://some.org/foo">
  <thing id="first"/>
  <thing id="second"/>
  <thing id="third"/>
</foo>

and I want to get things out of it with XPath.

But I couldn’t manage to select things, such as the <foo> element at the top. You’d think “/foo” would do it, but it didn’t.

Eventually I found out that the problem is the xmlns="..." attribute. It looks perfectly normal: it says that an element without a prefix (plain <thing>, as opposed to something like <ns:thing>) is in the “http://some.org/foo” namespace.

However, in XPath, if you specify “ns:thing”, it means “a thing element in whichever namespace the ns prefix corresponds to”. BUT “thing” means “a thing element that’s not in a namespace”.

So how do you specify an element that’s in the default, unprefixed namespace, as above? The obvious way would be to select “:thing”, but that doesn’t work. Too simple, I suppose. Maybe that gets confused with CSS pseudo-selectors or something.

No, apparently the thing you need to do is to invent a prefix for the standard elements of the file you’re parsing. That is, add “ns” as another prefix that maps onto “http://some.org/foo”, and then select “ns:thing”. There are different ways of doing this, depending on which library you’re using to make XPath queries, but still, it seems like a giant pain in the ass.
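
For instance, here’s what the invented-prefix trick looks like in Python’s standard xml.etree.ElementTree, which supports a subset of XPath (the “ns” prefix and the variable names are my own choices, not anything the library requires):

import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" ?>
<foo xmlns="http://some.org/foo">
  <thing id="first"/>
  <thing id="second"/>
  <thing id="third"/>
</foo>"""

root = ET.fromstring(doc)

# Invent a prefix ("ns") and map it onto the document's default namespace.
namespaces = {"ns": "http://some.org/foo"}

# Plain "thing" would match nothing, because it means "a thing element
# in no namespace". "ns:thing" finds all three.
for thing in root.findall("ns:thing", namespaces):
    print(thing.get("id"))  # first, second, third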

Would You Get Off the Plane?

There’s an old joke about an instructor who asks, “if you were on a plane, and found out that the inflight systems were controlled by a beta version of software written by your team, would you get off the plane?” Most of the students pale and say yes. One person says, “No. If it were written by my team, it wouldn’t make it to the runway.”

The implication is that most software is far crappier than people realize, and that programmers are keenly aware of this. But I’d like to present another reason for getting off the plane:

Even if my team writes good code, the fact that it’s still in beta means that it hasn’t been fully tested. That means that there must have been some kind of snafu at the airline and/or the software company for untested software to be put in control of a flight with ordinary passengers. And if management permitted this error, what other problems might there be? Maybe the fuel tanks are only half-full. Maybe the landing gear hasn’t been inspected.

DD-WRT Config Backups

So the other day I managed to hose my DD-WRT configuration at home, badly enough that I figured I really ought to back up my config so I don’t wind up trying to reconstruct it from memory.

(If you just want to see the script, you can jump ahead.)

I have nightly backups of my desktop (and so do you, right? Right?!) so I figured the simplest thing is just to copy the router’s config to my desktop, and let it be swept up with the nightly backups. So then it’s just a question of getting the configuration and such to my desktop.

In DD-WRT, under “Administration → Backup”, you can do manual backups and restores. This is what we want, except that we want this to happen automatically.

The trivial way to get the config is

curl -O -u admin:$PASSWORD http://$ROUTER/nvrambak.bin

or

wget --http-user=admin --http-password=$PASSWORD http://$ROUTER/nvrambak.bin

where $ROUTER is your router’s name or IP address, and $PASSWORD is your admin password. But this sends your password over the network in the clear, so it’s not at all secure.

Instead, let’s invest some time into setting up ssh on the router; specifically, the public key method that’ll allow passwordless login.

Go follow the instructions there. When you come back, you should have working ssh, and a key file that we’ll use for backups.

Back? Good. Now you can log in to the router by typing

ssh root@$ROUTER

and entering your password. Or you can log in without a password with

ssh -i /path/to/key-file root@$ROUTER

Once on the router, you can save a copy of the configuration with nvram backup /tmp/nvrambak.bin. So let’s combine those last two parts with:

ssh -i /path/to/key-file root@$ROUTER "nvram backup /tmp/nvrambak.bin"

And finally, let’s copy that file from the router to the desktop:

scp -i /path/to/key-file root@$ROUTER:/tmp/nvrambak.bin /path/to/local/nvrambak.bin

So the simple script becomes:

#!/bin/sh
SSH=/usr/bin/ssh
SCP=/usr/bin/scp
RSA_ID=/path/to/keyfile_rsa
ROUTER="router name or IP address"
LOCALFILE=/path/to/local/nvrambak.bin

# Dump the NVRAM settings to a temporary file on the router, then
# copy that file to the desktop.
$SSH -4 -q -i ${RSA_ID} root@${ROUTER} "nvram backup /tmp/nvrambak.bin"
$SCP -4 -q -i ${RSA_ID} root@${ROUTER}:/tmp/nvrambak.bin ${LOCALFILE}

I’ve put this in the “run this before starting a backup” section of my backup utility. But you can also make it a daily cron job. Either way, you should wind up with backups of your router’s config.
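
If you go the cron route, the crontab entry might look something like this (the script name here is made up; point it at wherever you saved the script):

# Back up the DD-WRT config every night at 3:00 AM.
0 3 * * * /path/to/dd-wrt-backup.sh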

I Don’t Want Flying Cars; I Just Want Working Bluetooth

I love Bluetooth. I love that it’s supported on all my various electronic gadgets, and lets them talk to each other and exchange information, be it streaming audio data, or a text note, or what have you.

Or at least I love the idea of Bluetooth. The unfortunate reality is that the implementations that I’ve seen never quite live up to the ideal.

For instance, it often takes several attempts to pair two devices, even when they’re two feet from each other. Sometimes devices disconnect for no obvious reason, or seem to become unpaired without me doing anything.

And then there’s the stuttering, which might be related. I have yet to find a Bluetooth headset, speaker, or other audio receiver that doesn’t stutter for five minutes until it finds its groove. In fairness, after the initial five minutes, things tend to stay pretty stable (at least until I, say, move my phone five feet further from the speaker, at which point they need to resync). But if it’s a matter of the two devices negotiating, I don’t know, frequencies and data throttling rates and protocols, why don’t they do it at the beginning? Or is it a TCP thing, where the two start out using little bandwidth and ramp up over time?

Lastly, there are the tunnel-vision implementations. From what I’ve seen, the Bluetooth standard defines roles that each device can play, e.g., “I can play audio”, “I can dial a phone number”, “I can display an address card”, “I can store files”, and so forth. But in practice, that doesn’t always work: my cell phone sees my desk phone as an earpiece, and earpieces can’t handle address cards, don’t be silly, so I can’t copy my cell phone’s contact list to my desk phone.

In the age of the Internet of Things, my desk phone can store contacts, my TV can run a browser, and pretty soon my toaster will be able to share its 5G hotspot with the neighborhood. There’s no reason to be limited by a noun on the box it came in.

I understand that most of the above is likely caused by bad implementation of a fundamentally decent protocol. But Bluetooth has been around for, what, a decade or more? And I still regularly run into these problems. That points to something systemic in the software community.

Artificial Lightning

A while back, I ran across this page by Daniel Kennett, explaining how to make an Arduino control a set of Ikea Dioder LED lights. Very cool stuff, and I was able to use his work to build something similar.

So with Halloween fast approaching, I thought it would be neat to use this to create lightning: tape a set of LED lights in the window at the front of the house, and make them produce lightning flashes.

Overfitting

One of the things I learned in math is that a polynomial of degree N can pass through N+1 arbitrary points. A straight line goes through any two points, a parabola goes through any three points, and so forth. The practical upshot of this is that if your equation is complex enough, you can fit it to any data set.
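
Here’s a quick sketch in Python (the points here are arbitrary, and numpy’s polyfit is doing the work): a polynomial of degree 4 will hit any five points dead-on.

import numpy as np

# Five arbitrary points; a degree-4 polynomial can pass through all of them.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, -2.0, 0.0, 5.0, 3.0])

coeffs = np.polyfit(x, y, deg=len(x) - 1)     # degree 4 = 5 points - 1
print(np.allclose(np.polyval(coeffs, x), y))  # True: the fit is exact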

That’s basically what happened to the geocentric model: it started out simple, with planets going around the Earth in circles. Except that some of the planets wobbled a bit. So they added more terms to the equations to account for the wobbles. Then there turned out to be more wobbles on top of the first wobbles, and more terms had to be added to the equations to take those into account, and so on until the theory collapsed under its own weight. There wasn’t any physical mechanism or cause behind the epicycles (as these wobbles were called). They were just mathematical artifacts. And so, one could argue that the theory was simpler when it had fewer epicycles and didn’t explain all of the data, but also was less wrong.

Take another example (adapted from Russell Glasser, who got it from his CS instructor): let’s say you and I order a pizza, and it comes with olives. I hate olives and you love them, so we want to cut it up in such a way that we both get slices of the same size, but your slice has as many of the olives as possible, and mine have as few as possible. (And don’t tell me we could just order a half-olive pizza; I’m using this as another example.)

We could take a photo of the pizza, feed it into an algorithm that’ll find the position of each olive and come up with the best way to slice the pizza fairly, but with a maximum of olives on your slices.

The problem is, this tells us nothing about how to slice the next such pizza that we order. Unless there’s some reason to think that the olives on the next pizza will be laid out in a similar way, we can’t tell the pizza parlor how to slice it up when we place our next order.

In contrast, imagine if we’d looked at the pizza and said, “Hm. Looks like the cook is sloppy, and just tossed a handful of olives on the left side, without bothering to spread them around.” Then we could ask the parlor to slice it into wedges, and we’d have good odds of winding up with three slices with extra olives and three with minimal olives. Or suppose we’d found that the cook puts the olives in the middle and doesn’t spread them around; then we could ask the parlor to slice the pizza into a grid: you take the middle pieces, and I’ll take the outside ones.

But our original super-optimal algorithm doesn’t allow us to do that: by trying to perfectly account for every single olive in that one pizza, it doesn’t help us at all in trying to predict the next pizza.

In The Signal and the Noise, Nate Silver calls this overfitting. It’s often tempting to overfit, because then you can say, “See! My theory of Economic Epicycles explains 29 of the last 30 recessions, as well as 85% of the changes in the Dow Jones Industrial Average!” But is this exciting new theory right? That is, does it help us figure out what the future holds; whether we’re looking at a slight economic dip, a recession, or a full-fledged depression?

We’ve probably all heard the one about how the Dow goes up and down along with skirt hems. Or that the performance of the Washington Redskins predicts the outcome of US presidential elections. Of course, there’s no reason to think that fashion designers control Wall Street, or that football players have special insight into politics. Rather, these examples go to show that if you dig long enough, you can find some data set that matches the one you’re looking at. And in this interconnected, online, googlable world, it’s easier than ever to find some data set that matches what you want to see.

These two examples are easy to see through, because there’s obviously no causal relationship between hemlines and stocks, or between football and politics. But we humans are good at telling convincing stories. What if I told you that pizza sales (with or without olives) can help predict recessions? After all, when people have less spending money, they eat out less, and pizza sales suffer.

I just made this up, both the pizza example and the explanation. So it’s bogus, unless by some million-to-one chance I stumbled on something right. But it’s a lot more plausible than the skirt or football examples, and thus we need to be more careful before believing it.

Update: John Armstrong pointed out that the first paragraph should say “N+1”, not “N”.

Update 2: As if on cue, Wonkette helps demonstrate the problems with trying to explain too much in this post about Glenn Beck somehow managing to tie together John Kerry’s presence or absence on a boat, his wife’s seizure, and Hillary Clinton’s answering or not answering questions about Benghazi. Probably NSFW because hey, Wonkette. But also full of Glenn Beck-ey crazy.

Removing Missing Podcast Episodes from iTunes

So I figured something out today.

I add and remove podcasts in iTunes all the time. And every so often, iTunes loses track of, er, tracks. This usually shows up in a smart playlist, where a podcast episode exists in iTunes’s database, but there’s no corresponding MP3 file on disk. You can’t manually delete entries from smart playlists; and if I’ve deleted the podcast, then I can’t even go back to the podcast list to delete the bogus episode. Nor does it show up in the “Music” list, since it’s a podcast episode.

What I finally figured out is that if you change the media kind from “Podcast” to “Music”, the episode moves to the Music list, where you can delete it.

You can change the media type with “Get info” (⌘-I) → Options → “Media Kind”.

Unfortunately, if the file is missing, iTunes won’t let you edit the media kind. So first you need to associate the podcast episode with an MP3 file. The easiest way is to copy an existing MP3 file to /tmp/foo.mp3 or some such. Then, when you press ⌘-I and iTunes complains that it can’t find the file and asks “Do you want to locate it?”, say yes, and point it at /tmp/foo.mp3. Then you can edit the media kind, and delete the entry from the Music list.

Death of the Desktop

I’m a geek. If you didn’t know this, it’s because you’ve never met me or talked to me for more than five minutes.

I keep reading that the desktop PC is dying, even as tablet and smartphone sales are rising. One popular theory, then, is that people are doing on their tablets and phones what they used to do on their desktop PCs.

I hope this isn’t the case, because frankly, tablets and phones are crap when it comes to doing real work.

Don’t get me wrong: I’ve been using PDAs, and now a smartphone, for well over a decade. I also have an iPad that I use regularly. I also have a Swiss army knife, but while it’s a wonderful tool in a pinch, I’d rather have a real set of tools if they’re available.

The same goes for laptops, tablets, and phones: they’re portable, and that certainly counts for a lot. But size matters, too, and size is intrinsically non-portable.

I’m not terribly picky about keyboards: as long as it’s full-sized (i.e., the keys are roughly the size of my fingers), has arrow keys, function keys, a keypad, and reasonable action (in the piano sense of the word), I’m happy. I know people who swear by the industrial-style IBM keyboards, and while I don’t share their enthusiasm, I get it: not only are they nigh-indestructible, they also have decent springs and make a satisfying “click” noise when you type. When you’ve typed a key, you know it. It’s a small thing, but it makes a difference.

At home, I have a 20-inch monitor, and wouldn’t consider anything smaller. In fact, I wouldn’t mind adding a second one, the way I have at work, to be able to have more windows in front of me.

I see people who resize their browser or spreadsheet or whatever to the full size of the display, and I don’t get it. Half the screen seems ample, and would allow them to see what else is open at the same time. Even worse are people who have a full-screen browser with multiple tabs open. How can they see what’s going on in those other tabs, with the current one blocking their view?

I’m not terribly picky when it comes to mice, though I do prefer a mouse to a trackball or laptop-style trackpad (though I find myself tempted by Apple’s super-sized trackpad). It’s more a matter of dexterity and fine control than anything else. I’m not as good zeroing in on a small button with a trackpad that lies between my thumbs as I am with a mouse that has its own area to the side.

All of these things are relatively minor: they don’t stop me from doing work, they just make it a little easier, a little more pleasant. But then, what makes a workspace pleasant isn’t so much the things it does, as the things it doesn’t do: the annoyances that aren’t there so they don’t get in the way. Not having to look down at my fingers to make sure they’re on the home row. Not clicking on the wrong button by mistake.

But the other thing, the thing that keeps getting me awed reactions about how fast I work, is keybindings. I’ve taken the time to either learn or customize my environment—both the windowing environment and common applications—to be able to do common operations from the keyboard, without having to move my hand all the way to the mouse. Again, I see people who raise their hand, move it over to the mouse, click on the window they want to switch to, then put their hand back on the keyboard. It’s like watching someone with a 1970s-era TV set get up off the couch, turn the station knob on the set, and come back to the couch. You’d want to say “Why don’t you just use the remote?” just as I want to yell “Why don’t you just alt-tab through to the window you want?”

(Also, in both Firefox and Chrome, did you know that you can set up a URL with a keyword, that’ll fill in whatever you type after the keyword? If I want to look up Star Wars at IMDb, I don’t type in eye em dee bee dot com into the browser’s URL bar, then click on the search box and type in the name. I just type “imdb Star Wars” into the URL bar, and the browser replaces that with “http://www.imdb.com/find?s=all&q=Star%20Wars”. Try it with images.google.com, Wikipedia, or Bible Gateway and see how convenient it is.)
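
In Firefox, this is just a bookmark with a keyword filled in, where %s marks the spot where your search terms get substituted; Chrome does the same thing under custom search engines. Roughly:

Keyword:  imdb
Location: http://www.imdb.com/find?s=all&q=%s

Type “imdb Star Wars” in the URL bar, and the browser URL-encodes “Star Wars” and drops it into the %s.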

Yes, these things only take a few seconds each. But a few seconds here, a few seconds there, and it all eventually adds up to significant time.

So when I hear it suggested that people are abandoning desktop machines for portable ones, what I hear is that people are switching from dedicated workspaces where you can get stuff done comfortably, to something fundamentally inferior.

In principle, there’s no reason why a portable device running, say, Android, couldn’t be as flexible and configurable as a Linux/FreeBSD/Solaris box running X11/KDE/GNOME/what have you. But in practice, they’re not. Whether it’s a matter of limiting the configurability to simplify development, or the fact that Android apps are sandboxed and can’t talk to each other, or something else, I don’t know. But the fact is that right now, I couldn’t bind “shake the phone up and down” to mean “create a new appointment in the calendar” if I wanted to.

And then along comes something like Ubuntu’s Unity, which aims to be a common UI for both desktop and portable devices. Which is to say, it aims to strip down the desktop to allow only those things that are convenient on tablets.

That’s taking away configurability; it’s simplification that makes it harder to get work done, and that annoys me.

UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever things.
—Doug Gwyn

In Defense of Gussie Fink-Nottle

For those who may have forgotten, Gussie Fink-Nottle is a character in the Jeeves stories by P.G. Wodehouse. He is the series’s stereotypical nerd: socially inept, a teetotaler, and physically unimpressive. His most memorable trait, however, is his fascination with newts.

Clearly Wodehouse tried to find the least interesting subject he could think of, to allow his character to easily bore all the other characters to tears by going on at length about his pathetic pet subject.

I don’t remember his early life ever being discussed in any detail, but I imagine that, as a weakling, he was never any good at sports and thus never developed an interest in them. Unable to hold his liquor, he never got into the habit of meeting with the chaps over drinks and experiencing the sorts of things that only seem to happen during alcohol-fueled debaucheries. His social ineptitude meant that he never became a lothario. Eventually, he was forced to become interested in that most uninteresting of subjects, newts.

But I would look at it from another angle: there is an infinite number of subjects in the world. What is it about newts that’s so interesting that Gussie would choose to devote his life to them?

Stephen Jay Gould, as I recall, did his graduate research on snails. Carl Sagan was interested in points of light in the sky. Bertrand Russell worked on breaking down existing mathematical proofs into longer chains of simpler steps. In each case, they found something interesting in what might appear to be an uninteresting subject.

Likewise, I sometimes wonder what makes people want to go into professions like accounting or proctology. It can’t just be the money, can it? Presumably there’s something there that I don’t see, some hidden bit of beauty that I haven’t seen or had explained to me.

I don’t want to think, “Wow, what a loser, for being interested in something as boring as newts.” Rather, I want to ask, “What is it about newts that’s so interesting?”

XKCD: Beauty

Both Parties Lie, Right?

So I made some comment about the Republican convention being based on a lie or something, and my interlocutor made a comment about how, well, both parties lie. Well, sure. But the Republicans are worse than the Democrats. And she said no, they both lie about the same.

And thus, me being the type of person I am (and that type is “anal retentive”. Or “obsessive-compulsive”. Or something along those lines. Supply your own wild-ass psychoanalysis in the comments), I went looking for data.

FactCheck.org is good, but they have an annoying tendency to provide nuance and context, rather than just boiling a statement down to a single icon.

WaPo’s Fact Checker is better, with its Pinocchio-based truth scale, but when I checked, there wasn’t a lot of easily-accessible data.

Which brings us to PolitiFact. They have not only a cutesy-icon-based measurement, but also a lot of data. Although they allegedly have an API, I wasn’t able to find details on how to use it, so I just scraped a bunch of their web pages and grepped out the information I wanted.

And since you’ve been patiently waiting for, like, four or five paragraphs for a chart or something, here it is:

Comparison of Politifact rulings for major US parties. Each bar represents the percentage of statements by that party that fall into a given category.

The data I used is here. There are separate sheets for Democrats and Republicans, with a count of how many statements each person or organization has made in each truth bucket (BTW, in case the phrase “truth bucket” becomes useful during this or any other campaign season, remember that you read it here first).

The first thing that jumps out is that, well, Republicans have fewer “True” and “Mostly True” statements than Democrats, and more “Mostly False”, “False”, and “Pants on Fire”. Which is kind of what I figured anyway, but it’s nice to see my opinion confirmed in chart form.

Anyway, often a person’s or organization’s page has a field that gives their political affiliation, e.g., Barack Obama is listed as “Democrat from Illinois”, while Concerned Taxpayers of America is listed as “Republican from Oregon”. I took the people and organizations listed as “Democrat from” or “Republican from” wherever, and discarded the rest.

Then it was just a matter of spreadsheetizing the data, and totting up the total number of statements by each party, counting up how many statements fall into each category, and, of course, endless fiddling about with fonts and column layouts.

The result is as objective as I could make it. You could argue that PolitiFact is biased for or against the party of your choice, but if there’s bias in the above, it doesn’t come from me.