Archives August 2020

Recursive grepping

Sometimes you just need to grep recursively in a directory. If you’re using find $somedir -type f -exec grep $somestring {} \;, don’t:

Use xargs to avoid creating a bazillion grep processes:
find $somedir -type f -print | xargs grep $somestring

But spaces in the output of find (i.e., spaces in filenames) will confuse xargs, so use -print0 and xargs -0 instead:
find $somedir -type f -print0 | xargs -0 grep $somestring

Except that you can achieve the same effect with find, with \+:
find $somedir -type f -exec grep $somestring {} \+

Or you can just use recursive grep:
grep -r $somestring $somedir
except that find allows you to filter by file type, name, age, etc.

Campaign Manager: Serializing Recursive Objects

There’s an idea for a game that’s been rattling around my brain for a while, so I finally decided to learn enough Unity to birth it.

To get an idea of what I have in mind, think Sid Meier’s Civilization, but for elections: you play a campaign manager, and your job is to get your candidate elected. This is complicated by the fact that the candidate may be prone to gaffes (think Joe Biden) or unmanageable (think Donald Trump); union leaders, state governors, CEOs, etc. can all promise their support, but they’ll all want something in return.

The thing I learned this week is the Unity JSON Serializer. It’s used to store prefabs and other data structures in the editor, so I figured it would be the natural tool for storing canned scenarios, as well as for saving games. It has a few quirks, though: for one thing, it can only serialize basic types (int, string, and the like), and a few constructs like struct, and arrays and List<T> of basic types. If you’re used to the introspection of languages like Python and Perl, you’re likely to be disappointed in C#. For another, in order to avoid runaway recursion, it only serializes to a depth of 7.

To start with, I defined class Area to represent an area such as a country, like the US. An Area is divided in various ways: politically, it’s divided into states. Culturally, it’s divided into regions like the Pacific Northwest or the Bible Belt. Other divisions are possible, depending on what you’re interested in. So I added a Division class:

namespace Map {
    [Serializable]
    public class Area
    {
	public string id;
	public string name;
	public List<Division> divisions = new List<Division>();
    }

    public class Division
    {
	public string id;
	public string name;
	public List<Area> children;
    }
}

As you can see, this is recursive: an Area can have multiple Divisions, each of which can contain other Areas with their own Divisions. This allows us to divide the United States into states, which are in turn divided into Congressional Districts, which in turn are divided into precincts.

Since neither of our classes are elementary types, they can’t be serialized directly. So let’s add struct SerializableArea and struct SerializableDivision to hold the structures that will actually be stored on disk, as opposed to Area and Division which will be used in memory at run time, and use the ISerializationCallbackReceiver interface that will give our classes a hook called when the object is serialized or deserialized.

Without wishing to repeat what’s in the docs: in order to get around the various limitations of the Serializer, the way to serialize a tree of objects is to serialize an array of objects, and use identifiers to refer to particular objects. Let’s say our in-memory Area tree looks something like:

  • Area United States
    • Division Political
      • Area Alabama
      • Area Alaska
    • Division Regions
      • Area Bible Belt
      • Area Pacific Northwest

(That’s just an outline, of course. Each node has more members than are shown here.) We can serialize this as two arrays: one with all of the Areas, and one with all of the Divisions:

  • Area United States
    • List<SerializableArea> childAreas:
      • SerializableArea Alabama
      • SerializableArea Alaska
      • SerializableArea Bible Belt
      • SerializableArea Pacific Northwest
    • List<SerializableDivision> serialDivisions:
      • SerializableDivision Political
      • SerializableDivision Regions

We don’t want to recurse, but we do want to be able to rebuild the tree structure when we deserialize the above. So SerializableArea contains, not a list of Divisions, but a list of identifiers that we can find in serialDivisions. Likewise, SerialDivision contains not a list of Areas, but a list of identifiers that we can look up in childAreas.

Naturally, Area and Division each contain a Serialize() method that recursively serializes it and its children.

The next question is: so you’ve been asked to serialize an Area. How do you know whether you’re supposed to add it to an existing childAreas list or start your own?

Answer: if the serialization was invoked through OnBeforeSerialize(), then you’re serializing the top level object and should allocate a list to put the children in. Otherwise, append to an existing list, which should be passed in as a parameter to Serialize().

If anyone’s interested in what it looks like when all is said and done, here it is:

namespace Map {

    [Serializable]
    public class Area : ISerializationCallbackReceiver
    {
	public string id;
	public string name;

	public List<Division> divisions = new List<Division>();

	[Serializable]
	public struct SerializableArea
	{
	    public string id;
	    public string name;
	    public List<string> divisions;
	}

	public List<Division.SerializableDivision> serialDivisions;
	public List<SerializableArea> childAreas = new List<SerializableArea>();

	public void OnBeforeSerialize()
	{
	    serialDivisions =
		new List<Division.SerializableDivision>(divisions.Count);
	    childAreas = new List<SerializableArea>();

	    for (int i = 0; i < divisions.Count; i++)
		divisions[i].Serialize(ref serialDivisions, ref childAreas);
	}

	// Look up a Division by name, so we can avoid adding it twice
	// to serialDivisions.
	private int FindDivisionById(string id,
				     ref List<Division.SerializableDivision>dlist)
	{
	    for (int i = 0; i < dlist.Count; i++)
		if (dlist[i].id == id)
		    return i;
	    return -1;
	}

	public void Serialize(ref List<SerializableArea> alist,
			      ref List<Division.SerializableDivision> dlist)
	{
	    SerializableArea sa = new SerializableArea();
	    sa.id = id;
	    sa.name = name;
	    sa.divisions = new List<string>(divisions.Count);

	    alist.Add(sa);

	    for (int i = 0; i < divisions.Count; i++)
	    {
		sa.divisions.Add(divisions[i].name);

		int d = FindDivisionById(divisions[i].id, ref dlist);
		if (d < 0)
		    // Don't add a Division to dlist twice.
		    divisions[i].Serialize(ref dlist, ref alist);
	    }
	}
    }

    public class Division : ISerializationCallbackReceiver
    {
	public string id;
	public string name;
	public List<Area> children;

	[Serializable]
	public struct SerializableDivision
	{
	    public string id;
	    public string name;
	    public List<string> areas;
	}

	public void Serialize(ref List<SerializableDivision> dlist,
			      ref List<Area.SerializableArea> alist)
	{
	    SerializableDivision sd = new SerializableDivision();
	    sd.id = id;
	    sd.name = name;
	    sd.areas = new List<string>(children.Count);

	    dlist.Add(sd);

	    for (int i = 0; i < children.Count; i++)
	    {
		sd.areas.Add(children[i].name);

		int a = FindAreaById(children[i].id, ref alist);
		if (a < 0)
		    // Don't add an Area to alist twice.
		    children[i].Serialize(ref alist, ref dlist);
	    }
	}

	private int FindAreaById(string id,
				 ref List<Area.SerializableArea> alist)
	{
	    for (int i = 0; i < alist.Count; i++)
		if (alist[i].id == id)
		    return i;
	    return -1;
	}
    }
}