There Are Days When I Hate XML

…and days when I really hate XML.

In this case, I have an XML document like

<?xml version="1.0" ?>
<foo xmlns="http://some.org/foo">
  <thing id="first"/>
  <thing id="second"/>
  <thing id="third"/>
</foo>

and I want to get things out of it with XPath.

But I couldn’t manage to select things exactly, such as the <foo> element at the top. You’d think “/foo” would do it, but it didn’t.

Eventually I found out that the problem is the xmlns=”...” attribute. It looks perfectly normal, saying that if you have <thing> without a prefix (like “<ns:thing>”), then it’s in the “http://some.org/foo” namespace.

However, in XPath, if you specify “ns:thing”, it means “a thing element in whichever namespace the ns prefix corresponds to”. BUT “thing” means “a thing element that’s not in a namespace”.

So how do you specify an element that’s the empty-string namespace, as above? The obvious way would be to select “:thing”, but that doesn’t work. Too simple, I suppose. Maybe that gets confused with CSS pseudo-selectors or something.

No, apparently the thing you need to do is to invent a prefix for the standard elements of the file you’re parsing. That is, add “ns” as another prefix that maps onto “http://some.org/foo” and then select “ns:thing”. There are different ways of doing this, depending which library you’re using to make XPath queries, but still, it seems like a giant pain in the ass.

Evil Hack of the Day

MacOS plist XML files are evil; even more so than regular XML. For instance, my iTunes library file consists mostly of entries like:

<key>5436</key>
<dict>
	<key>Track ID</key><integer>5436</integer>
	<key>Name</key><string>Getting Better</string>
	<key>Artist</key><string>The Beatles</string>
	<key>Composer</key><string>Paul McCartney/John Lennon</string>
	<key>Album</key><string>Sgt. Pepper's Lonely Hearts Club Band</string>
	…
</dict>

You’ll notice that there’s no connection between a key and its value, other than proximity. There’s no real indication that these are fields in a data record, and unlike most XML files, you have to consider the position of each element compared to its neighbors. It’s almost as if someone took a file of the form

Track_ID = 5436
Name = "Getting Better"
Artist = "The Beatles"
Coposer = "Paul McCartney/John Lennon"

and, when told to convert it to XML in the name of buzzword-compliance, did a simple and quarter-assed search and replace.

But of course, what was fucked up by (lossless) text substitution can be unfucked by text substitution. And what’s the buzzword-compliant tool for doing text substitution on XML, even crappy XML? XSLT, of course. The template language that combines the power of sed with the terseness of COBOL.

So I hacked up an XSLT template to convert my iTunes library into a file that can be required in a Perl script. Feel free to use it in good or ill health. If you spring it on unsuspecting developers, please send me a photo of their reaction.