Geek, Epsilon Clue, Page 2

Geek Miscellaneous

Staying on Top of the Wave

I used to work for a computer researcher who didn’t actually use computers. Academics can sometimes be quirky, you know? He just had ~/.forward set up to print every message (on paper) as it came in. One problem with this setup was that people would send him résumés as PostScript (and later PDF) attachments. The printing system would print these out as plain text and use up all of the paper in the printer if someone didn’t catch it.

This raised a question for someone:

Why would PDFs print as plain text and how could you set your printer up to do that?
I’m so confused by that last sentence and tried readingbitba different way to make sense of it. I still don’t get what you mean.

— ?? jean farouche????? (@JFarouche19) March 10, 2021

and I realized that the nightmare that printing under Unix is rather less nightmarish these days. The ubiquity of PDF means that I don’t really have to worry about converting files to the proper device-dependent format. Line printers are gone, so I don’t have to worry about the difference between printing text and printing graphics. Unicode means I don’t have to worry about whether a document is in English, French, or Russian.

Of course, all of this convenience comes at a cost. For instance, Unicode is a complex soup of characters, codepoints, encodings, and normal forms. When I was learning C, way back when, in between mammoth hunts and rubbing sticks together to make fire, everything was US English, in ASCII, and a letter was a character was a byte was an octet.

This may have been limited, but at least it made things easy to learn. Which made me wonder about people wading into the field these days: how do they learn, when even a simple “Hello world” program has all this complexity behind it?

There are two answers to that: in some cases, say Unity or Electron, tutorials will typically give you a bunch of standard scaffolding — headers, includes, starter classes — and say “Don’t worry about all that for now; we’ll cover it later.” In other cases, say, Python, a lot of the complexity is optional enough that it doesn’t need to be shown to the beginner.

New people are coming in at the top of this graph. I came in a bit further down, but my entry point was already on top of a bunch of existing work by other people. After all, I got to use (complicated) operating systems instead of running code on bare metal, text editors rather than toggling my code in on the front panel, and much besides. People starting out today are wading in in the middle, just like I did, just at a different point.

And this is what we humans do. When Isaac Newton talked about standing on the shoulders of giants, this is what he was talking about. We build tools using existing tools, and then use those new tools to build newer tools, changing the world as we go.

Of course, new technology can also disrupt and undermine the old way of doing things, and there’s a persistent fear of AI taking away jobs that used to be done by humans, just as automation has. Just as dockworkers now use forklifts to load cargo rather than carry it themselves, companies now use chatbots to handle the front line of customer support. Machines have largely supplanted travel agents, and have even started writing press releases. What’s a poor meat puppet to do?

I keep thinking of how longshoremen became forklift operators: when new technology comes along, we still need to control it, steer it, decide what it ought to do. I used to install Unix on individual computers from CD, one at a time. These days, I use tools like Terraform and Puppet to orchestrate dozens or hundreds of machines. And I think that’s where the challenge is going to be in the future: staying on top of new technology and deciding what to do with it.

Andrew Arensburger

Mar, Thu, 2021

Geek

What the Hell Is Up With MacOS Periodic Jobs?

Yesterday, Feb. 25 at 13:19, my Mac ran periodic monthly. I thought it odd that this would run on the 25th rather than the 1st of the month, and in the afternoon rather than late at night, so I dug a little deeper.

It looks as though Apple now deprecates cron in favor of launchd agents. Okay, fine. Yeah, I see that there are three periodic jobs:

> launchctl list | grep periodic-
-	-9	com.apple.periodic-monthly
13804	0	com.apple.periodic-weekly
-	-9	com.apple.periodic-daily

launchctl print system/com.apple.periodic-monthly shows details about the monthly job:

event triggers = {
        com.apple.periodic-monthly => {
                keepalive = 0
                service = com.apple.periodic-monthly
                ...
                descriptor = {
                        "Repeating" => true
                        "GracePeriod" => 14400
                        "Interval" => 2629746
                        ...

Nowhere does it say to run this job at a particular time on a particular day. Rather, as far as I can tell, it wants to run every 2629746 seconds. What’s that?

launchctl print system/com.apple.periodic-daily shows

"Interval" => 86400

which makes perfect sense, since 86400 is 60 × 60 × 24, the number of seconds in a day. And com.apple.periodic-weekly runs every 60 × 60 × 24 × 7 = 604800 seconds.

But 2629746, the monthly interval, is 43829.1 minutes = 730.485 hours = 30.4368 days. What the hell is that?

I note that the average month length in a year is 365/12 = 30.4166 days, which is close. And what with leap years, the average year is really 365.25 days long; 365.25/12 = 30.4375, which is close to the periodic-monthly interval. So maybe Apple took the length of the average century, when all leap days, century non-leap days, and millennial non-non-leap days are taken into account, and divided by the number of months in a century? Or something else? Who knows?

Maybe Apple has decided that it doesn’t matter when accounting and log-rotation jobs run, and that it’s okay for them to drift. Fine. Except that the simplest and easiest documented way of adding your own periodic jobs is to add a file to /etc/periodic/daily, /etc/periodic/weekly, or /etc/periodic/monthly. And people might want resource-intensive cleanup jobs to run in the middle of the night, and for reports to run on the first of the month. So what the hell, Apple?

Andrew Arensburger

Feb, Fri, 2021

Geek Things I've Learned

Ansible: Roles, Role Dependencies, and Variables

I just spent some time banging my head against Ansible, and thought I’d share in case anyone else runs across it:

I have a Firefox role that allows you to define a Firefox profile with various plugins, config settings, and the like. And I have a work-from-home (WFH) role that, among other things, sets up a couple of work profiles in Firefox, with certain proxy settings and plugins. I did this the way the documentation says to:

dependencies:
  - name: Profile Work 1
    role: firefox
    vars:
      - profile: work1
        addons:
          - ublock-origin
          - privacy-badger17
          - solarize-fox
        prefs: >-
          network.proxy.http: '"proxy.host.com"'
          network.proxy.http_port: 1234
  - name: Profile Work 2
    role: firefox
    vars:
      - profile: work2
        addons:
          - ublock-origin
          - privacy-badger17
          - solarized-light
        prefs: >-
          network.proxy.http: '"proxy.host.com"'
          network.proxy.http_port: 1234

The WFH stuff worked fine at first, but then I added a new profile.

- name: Roles
  hosts: my-host
  roles:
    - role: wfh
    - role: firefox
      profile: third
      addons:
        - bitwarden-password-manager
        - some fancy-theme

This one didn’t have any prefs, but Ansible was applying the prefs from the WFH role.

Eventually, I found that the problem lay in the two vars blocks in the wfh role’s dependencies: apparently those get set as variables for the entire task or play, not just for that invocation of the firefox role. The solution turned out to be undocumented: drop the vars blocks and pull the role parameters up a level:

dependencies:
  - name: Profile Work 1
    role: firefox
    profile: work1
    addons:
      - ublock-origin
      - privacy-badger17
      - solarize-fox
    prefs: >-
      network.proxy.http: '"proxy.host.com"'
      network.proxy.http_port: 1234
  - name: Profile Work 2
    role: firefox
    profile: work2
    addons:
      - ublock-origin
      - privacy-badger17
      - solarized-light
    prefs: >-
      network.proxy.http: '"proxy.host.com"'
      network.proxy.http_port: 1234

I do like Ansible, but it’s full of fiddly stupid crap like this.

Andrew Arensburger

Sep, Fri, 2020

Geek

Recursive grepping

Sometimes you just need to grep recursively in a directory. If you’re using find $somedir -type f -exec grep $somestring {} \;, don’t:

Use xargs to avoid creating a bazillion grep processes:
find $somedir -type f -print | xargs grep $somestring

But spaces in the output of find (i.e., spaces in filenames) will confuse xargs, so use -print0 and xargs -0 instead:
find $somedir -type f -print0 | xargs -0 grep $somestring

Except that you can achieve the same effect with find, with \+:
find $somedir -type f -exec grep $somestring {} \+

Or you can just use recursive grep:
grep -r $somestring $somedir
except that find allows you to filter by file type, name, age, etc.

Andrew Arensburger

Aug, Wed, 2020

Hacking Uncategorized

Campaign Manager: Serializing Recursive Objects

There’s an idea for a game that’s been rattling around my brain for a while, so I finally decided to learn enough Unity to birth it.

To get an idea of what I have in mind, think Sid Meier’s Civilization, but for elections: you play a campaign manager, and your job is to get your candidate elected. This is complicated by the fact that the candidate may be prone to gaffes (think Joe Biden) or unmanageable (think Donald Trump); union leaders, state governors, CEOs, etc. can all promise their support, but they’ll all want something in return.

The thing I learned this week is the Unity JSON Serializer. It’s used to store prefabs and other data structures in the editor, so I figured it would be the natural tool for storing canned scenarios, as well as for saving games. It has a few quirks, though: for one thing, it can only serialize basic types (int, string, and the like), and a few constructs like struct, and arrays and List<T> of basic types. If you’re used to the introspection of languages like Python and Perl, you’re likely to be disappointed in C#. For another, in order to avoid runaway recursion, it only serializes to a depth of 7.

To start with, I defined class Area to represent an area such as a country, like the US. An Area is divided in various ways: politically, it’s divided into states. Culturally, it’s divided into regions like the Pacific Northwest or the Bible Belt. Other divisions are possible, depending on what you’re interested in. So I added a Division class:

namespace Map {
    [Serializable]
    public class Area
    {
	public string id;
	public string name;
	public List<Division> divisions = new List<Division>();
    }

    public class Division
    {
	public string id;
	public string name;
	public List<Area> children;
    }
}

As you can see, this is recursive: an Area can have multiple Divisions, each of which can contain other Areas with their own Divisions. This allows us to divide the United States into states, which are in turn divided into Congressional Districts, which in turn are divided into precincts.

Since neither of our classes are elementary types, they can’t be serialized directly. So let’s add struct SerializableArea and struct SerializableDivision to hold the structures that will actually be stored on disk, as opposed to Area and Division which will be used in memory at run time, and use the ISerializationCallbackReceiver interface that will give our classes a hook called when the object is serialized or deserialized.

Without wishing to repeat what’s in the docs: in order to get around the various limitations of the Serializer, the way to serialize a tree of objects is to serialize an array of objects, and use identifiers to refer to particular objects. Let’s say our in-memory Area tree looks something like:

Area United States
- Division Political
  - Area Alabama
  - Area Alaska
  - …
- Division Regions
  - Area Bible Belt
  - Area Pacific Northwest
  - …

(That’s just an outline, of course. Each node has more members than are shown here.) We can serialize this as two arrays: one with all of the Areas, and one with all of the Divisions:

Area United States
- List<SerializableArea> childAreas:
  - SerializableArea Alabama
  - SerializableArea Alaska
  - SerializableArea Bible Belt
  - SerializableArea Pacific Northwest
- List<SerializableDivision> serialDivisions:
  - SerializableDivision Political
  - SerializableDivision Regions

We don’t want to recurse, but we do want to be able to rebuild the tree structure when we deserialize the above. So SerializableArea contains, not a list of Divisions, but a list of identifiers that we can find in serialDivisions. Likewise, SerialDivision contains not a list of Areas, but a list of identifiers that we can look up in childAreas.

Naturally, Area and Division each contain a Serialize() method that recursively serializes it and its children.

The next question is: so you’ve been asked to serialize an Area. How do you know whether you’re supposed to add it to an existing childAreas list or start your own?

Answer: if the serialization was invoked through OnBeforeSerialize(), then you’re serializing the top level object and should allocate a list to put the children in. Otherwise, append to an existing list, which should be passed in as a parameter to Serialize().

If anyone’s interested in what it looks like when all is said and done, here it is:

namespace Map {

    [Serializable]
    public class Area : ISerializationCallbackReceiver
    {
	public string id;
	public string name;

	public List<Division> divisions = new List<Division>();

	[Serializable]
	public struct SerializableArea
	{
	    public string id;
	    public string name;
	    public List<string> divisions;
	}

	public List<Division.SerializableDivision> serialDivisions;
	public List<SerializableArea> childAreas = new List<SerializableArea>();

	public void OnBeforeSerialize()
	{
	    serialDivisions =
		new List<Division.SerializableDivision>(divisions.Count);
	    childAreas = new List<SerializableArea>();

	    for (int i = 0; i < divisions.Count; i++)
		divisions[i].Serialize(ref serialDivisions, ref childAreas);
	}

	// Look up a Division by name, so we can avoid adding it twice
	// to serialDivisions.
	private int FindDivisionById(string id,
				     ref List<Division.SerializableDivision>dlist)
	{
	    for (int i = 0; i < dlist.Count; i++)
		if (dlist[i].id == id)
		    return i;
	    return -1;
	}

	public void Serialize(ref List<SerializableArea> alist,
			      ref List<Division.SerializableDivision> dlist)
	{
	    SerializableArea sa = new SerializableArea();
	    sa.id = id;
	    sa.name = name;
	    sa.divisions = new List<string>(divisions.Count);

	    alist.Add(sa);

	    for (int i = 0; i < divisions.Count; i++)
	    {
		sa.divisions.Add(divisions[i].name);

		int d = FindDivisionById(divisions[i].id, ref dlist);
		if (d < 0)
		    // Don't add a Division to dlist twice.
		    divisions[i].Serialize(ref dlist, ref alist);
	    }
	}
    }

    public class Division : ISerializationCallbackReceiver
    {
	public string id;
	public string name;
	public List<Area> children;

	[Serializable]
	public struct SerializableDivision
	{
	    public string id;
	    public string name;
	    public List<string> areas;
	}

	public void Serialize(ref List<SerializableDivision> dlist,
			      ref List<Area.SerializableArea> alist)
	{
	    SerializableDivision sd = new SerializableDivision();
	    sd.id = id;
	    sd.name = name;
	    sd.areas = new List<string>(children.Count);

	    dlist.Add(sd);

	    for (int i = 0; i < children.Count; i++)
	    {
		sd.areas.Add(children[i].name);

		int a = FindAreaById(children[i].id, ref alist);
		if (a < 0)
		    // Don't add an Area to alist twice.
		    children[i].Serialize(ref alist, ref dlist);
	    }
	}

	private int FindAreaById(string id,
				 ref List<Area.SerializableArea> alist)
	{
	    for (int i = 0; i < alist.Count; i++)
		if (alist[i].id == id)
		    return i;
	    return -1;
	}
    }
}

Andrew Arensburger

Aug, Tue, 2020

emacs

WFHing with Emacs: Work Mode and Command-Line Options

Like the rest of the world, I’m working from home these days. One of the changes I’ve made has been to set up Emacs to work from home.

I use Emacs extensively both at home and at work. So far, my method for keeping personal stuff and work stuff separate has been to, well, keep separate copies of ~/.emacs.d/ on work and home machines. But now that my home machine is my work machine, I figured I’d combine their configs.

To do this, I just added a -work command-line option, so that emacs -work runs in work mode. The command-switch-alist variable is useful here: it allows you to define a command-line option, and a function to call when it is encountered:

(defun my-work-setup (arg)
   ;; Do setup for work-mode
  )
(add-to-list 'command-switch-alist
  '("work" . my-work-setup))

Of course, I’ve never liked defining functions to be called only once. That’s what lambda expressions are for:

(add-to-list 'command-switch-alist
  '("work" .
    (lambda (arg)
      ;; Do setup for work-mode
      (setq my-mode 'work)
      )))

One thing to bear in mind about command-switch-alist is that it gets called as soon as the command-line option is seen. So let’s say you have a -work argument and a -logging option. And the -logging-related code needs to know whether work mode is turned on. That means you would always have to remember to put the -work option before the -logging option, which isn’t very elegant.

A better approach is to use the command-switch-alist entries to just record that a certain option has been set. The sample code above simply sets my-mode to 'work when the -work option is set. Then do the real startup stuff after the command line has been parsed and you know all of the options that have been passed in.

Unsurprisingly, Emacs has a place to put that: emacs-startup-hook:

(defvar my-mode 'home
  "Current mode. Either home or work.")
(add-to-list 'command-switch-alist
  '("work" . (lambda (arg)
                (setq my-mode 'work))))

(defvar logging-p nil
  "True if and only if we want extra logging.")
(add-to-list 'command-switch-alist
  '("logging" . (lambda (arg)
                  (setq logging-p t))))

(add-hook 'emacs-startup-hook
  (lambda nil
    (if (eq my-mode 'work)
      (message "Work mode is turned on."))
    (if logging-p
      (message "Extra logging is turned on."))
    (if (and (eq my-mode 'work)
             logging-p)
      (message "Work mode and logging are both turned on."))))

Check the *Messages* buffer to see the output.

Andrew Arensburger

Apr, Thu, 2020

Hacking

Ansible As Scripting Language

Ansible is billed as a configuration manager similar to Puppet or cfengine. But it occurred to me recently that it’s really (at least) two things:

A configuration manager.
A scripting language for the machine room.

Mode 1 is the normal, expected one: here’s a description; now make the machine(s) look like the description. Same as Puppet.

Mode 2 is, I think, far more difficult to achieve in Puppet than it is in Ansible. This is where you make things happen in a particular order, not just on one machine (you’d use /bin/sh for that), but on multiple hosts.

For instance, adding a new user might involve:

Generate a random password on localhost.
Add a user on the Active Directory server.
Create and populate a home directory on the home directory server.
Add a stub web page on the web server.

This is something I’d rather write as an ansible play, than as a Puppet manifest or module.

Which brings me to my next point: it seems that for Mode 1, you want to think mainly in terms of roles, while for Mode 2, you’ll want to focus on playbooks. A role is designed to encapsulate the notion of “here’s a description of what a machine should look like, and the steps to take, if any, to make it match that description”, while a playbook is naturally organized as “step 1; step 2; step 3; …”.

These are, of course, just guidelines. And Mode 2 suffers from the fact that YAML is not a good way to express programming concepts. But I find this to be a useful way of thinking about what I’m doing in Ansible.

Andrew Arensburger

Jul, Fri, 2019

Geek Things I've Learned

Ansible: Running Commands in Dry-Run Mode in Check Mode

Say you have an Ansible playbook that invokes a command. Normally, that command executes when you run ansible normally, and doesn’t execute at all when you run ansible in check mode.

But a lot of commands, like rsync have a -n or --dry-run argument that shows what would be done, without actually making any changes. So it would be nice to combine the two.

Let’s start with a simple playbook that copies some files with rsync:

- name: Copy files
  tasks:
    - name: rsync the files
      command: >-
        rsync
        -avi
        /tmp/source/
        /tmp/destination/
  hosts: localhost
  become: no
  gather_facts: no

When you execute this playboook with ansible-playbook foo.yml rsync runs, and when you run in check mode, with ansible-playbook -C foo.yml, rsync doesn’t run.

This is inconvenient, because we’d like to see what rsync would have done before we commit to doing it. So let’s force it to run even in check mode, with check_mode: no, but also run rsync in dry-run mode, so we don’t make changes while we’re still debugging the playbook:

- name: Copy files
  tasks:
    - name: rsync the files
      command: >-
        rsync
        --dry-run
        -avi
        /tmp/source/
        /tmp/destination/
      check_mode: no
  hosts: localhost
  become: no
  gather_facts: no

Now we just need to remember to remove the --dry-run argument when we’re ready to run it for real. And turn it back on again when we need to debug the playbook.

Or we could do the smart thing, and try to add that argument only when we’re running Ansible in check mode. Thankfully, there’s a variable for that: ansible_check_mode, so we can set the argument dynamically:

- name: Copy files
  tasks:
    - name: rsync the files
      command: >-
        rsync
        {{ '--dry-run' if ansible_check_mode else '' }}
        -avi
        /tmp/source/
        /tmp/destination/
      check_mode: no
  hosts: localhost
  become: no
  gather_facts: no

You can check that this works with ansible-playbook -v -C foo.yml and ansible-playbook -v foo.yml.

Andrew Arensburger

Sep, Tue, 2018

Hacking

Pseudo-Numeric Identifiers

Let’s say you’re a programmer, and your application uses Library of Congress Control Numbers for books, e.g., 2001012345, or ZIP codes, like 90210. What data types would you use to represent them? Or maybe something like the Dewey Decimal System, which uses 320 to classify a book as Political Science, 320.5 for Political Theory, and 320.973 for “Political institutions and public administration (United States)”?

If you said “integer”, “floating point”, or any kind of numeric type, then clearly you weren’t paying attention during the title.

The correct answer was “string” (or some kind of array of tokens), because although these entities consist of digits, they’re not numbers: they’re identifiers, same as “root” or “Jane Smith”. You can assign them, sort them, group them by common features, but you can’t meaningfully add or multiply them together. If you’re old enough, you may remember the TV series The Prisoner or Get Smart, where characters, most of them secret agents, refer to each other by their code numbers all the time; when agents 86 and 99 team up, they don’t become agent 185 all of a sudden.

If you keep in mind this distinction between numbers, which represent quantities, and strings that merely look like numbers because they happen to consist entirely of integers, you can save yourself a lot of grief. For instance, when your manager decides to store the phone number 18003569377 as “1-800-FLOWERS”, dashes and all. Or when you need to store a foreign phone number and have to put a plus sign in front of the country code.

Andrew Arensburger

Aug, Wed, 2018

Geek

Removing Magic

So this was one of those real-life mysteries.

I like crossword puzzles. And in particular, I like indie crossword puzzles, because they tend to be more inventive and less censored than ones that run in newspapers. So I follow several crossword designers on Twitter.

Yesterday, one of them mentioned that people were having a problem with his latest puzzle. I tried downloading it on my iPad, and yeah, it wouldn’t open in Across Lite. Other people were saying that their computers thought the file was in PostScript format. I dumped the HTTP header with

lynx -head -dump http://url.to/crossword.puz

and found the header

Content-type: application/postscript

which was definitely wrong for a .puz file. What’s more, other .puz files in the same directory were showing up as

Content-type: application/octet-stream

as they should.

I mentioned all this to the designer, which led to us chatting back and forth to see what the problem was. And eventually I had the proverbial aha moment.

.puz files begin with a two-byte checksum. In this particular case, they turned out to be 0x25 and 0x21. Or, in ASCII, “%!“. And as it turns out, PostScript files begin with “%!“, according to Unix’s magic file.

So evidently what happened was: the hosting server didn’t have a default type for files ending in .puz. Not terribly surprising, since that’s not really a widely-used format. So since it didn’t recognize the filename extension, it did the next-best thing and looked at the first few bytes of the file (probably with file or something equivalent) to see if it could make an educated guess. It saw the checksum as “%!” and decided it was a PostScript file.

The obvious fix was to change something about the file: rewrite a clue, add a note, change the copyright statement, anything to change the contents of the file, and thus the checksum.

The more permanent solution was to add a .htaccess file to the puzzle file directory, with

AddType application/octet-stream .puz

assuming that the hosting provider used Apache or something compatible.

This didn’t take immediately; I think the provider cached this metadata for a few hours. But eventually things cleared up.

I’m not sure what the lesson is, here. “Don’t use two-byte checksums at offset 0”, maybe?

Andrew Arensburger

Aug, Sat, 2018

Epsilon Clue

Epsilon Clue

Category Geek