Little Languages and Tables
Recently, a coworker whipped up a Perl script that’ll build all of the
Perl modules we support. This is useful for when we add a new
supported OS or OS version. This script takes a config file, moduledefs, which lists the modules to build, as well as various quirks that affect how and whether the modules should be built. moduledefs is itself a require'd Perl script:
    # hash of module names (as known to perl) and parameters.
    # value is an array of parameters, as follows:
    # index 0: build directory. If no build directory is given,
    #          we assume it is the same as the module name,
    #          changing :: to -
    # index 1: don't make test. This field is a regex of AFS
    #          sysnames not to test on. If this is not set,
    #          we make test everywhere. If it is set,
    #          and sysname matches, we don't make test
    # index 2: regex of AFS sysnames not to build on. If this is
    #          not set, we build everywhere. If it is set,
    #          and sysname matches, we don't build
    #
    %MODINFO = (
        "ARS"            => [ "ARSperl", ".*", "alpha_dux|(amd64|i386)_rel30" ],
        "Authen::Krb4"   => [ "Krb4" ],
        "CGI"            => [ "" ],
        "Compress::Zlib" => [ "", "alpha_dux40" ],
        "Convert::ASN1"  => [ "" ],
        "Convert::BER"   => [ "" ],
        "Crypt::CBC"     => [ "" ],
        "Crypt::DES"     => [ "" ],
        "Crypt::IDEA"    => [ "" ],
        …
    # array of module names (as known to perl) in the order they must be
    # built in.
    @MODULES = (
        "ARS",
        "Authen::Krb4",
        "CGI",
        "Convert::ASN1",
        "Convert::BER",
        "Crypt::CBC",
        "Crypt::DES",
        "Crypt::IDEA",
        "Crypt::SSLeay",
        "DBI",          # DBI needs to be before the DBD modules
        "DBD::ODBC",
        "DBD::Oracle",
        "DBD::Pg",
        "DB_File",
        …
Don’t roll your eyes too much, because this is actually fairly sensible for our environment. But there’s a lot more punctuation than is necessary. The same effect could be achieved more compactly. The @MODULES list could be built with qw, e.g.:
    @MODULES = qw( ARS Authen::Krb4 CGI )…
but that wouldn’t allow us to have comments in the list, and comments could be useful. So instead, let’s read the list from a data file.
__DATA__
Now, all of the code above is already in an auxiliary file separate from the main script (the one whose job is to build and test the modules), so it would be inelegant to further pollute the directory with extra cruft. Fortunately, Perl has the magic token __DATA__, which means “this is the end of the Perl script, and the beginning of the special data section.” The data section can then be read via the special DATA filehandle. So we can write:
    while (<DATA>) {
        next if /^#/;           # Ignore comments
        chomp;
        push @MODULES, $_;      # Add the module name to the list
    }
    __DATA__
    ARS
    Authen::Krb4
    CGI
    Convert::ASN1
    Convert::BER
    Crypt::CBC
    Crypt::DES
    Crypt::IDEA
    Crypt::SSLeay
    # DBI needs to be before the DBD modules
    DBI
    DBD::ODBC
    …
Reading Tables
Then there’s the %MODINFO hash, which contains information on how to build, whether to build, and whether to test on a given architecture. One drawback so far is that the information about a given module is in two separate places. Since the values in %MODINFO are arrays, we could just put these values in __DATA__, separated by commas or some other separator:
    our %MODINFO = ();
    our @MODULES = ();
    while (<DATA>) {
        next if /^#/;       # Ignore comments
        chomp;
        # Split into fields on commas with optional whitespace
        my ($modulename, @fields) = split /\s*,\s*/, $_;
        push @MODULES, $modulename;
        $MODINFO{$modulename} = [ @fields ];
    }
    __DATA__
    ARS, ARSperl, .*, alpha_dux|(amd64|i386)_rel30
    Authen::Krb4, Krb4
    CGI,
    Compress::Zlib, , alpha_dux40
    Convert::ASN1
    Convert::BER
    Crypt::CBC
    Crypt::DES
    Crypt::IDEA
    Crypt::SSLeay, , , alpha_dux40
    …
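To make the three fields concrete, here is a rough sketch of how a build script might consult %MODINFO. The helper names (build_dir, field_matches, should_build, should_test) are made up for illustration and are not from the real script, and I'm assuming the sysname regexes are matched unanchored.

```perl
use strict;
use warnings;

# Same layout as the table above: [ build dir, notest regex, nobuild regex ].
my %MODINFO = (
    "ARS"            => [ "ARSperl", ".*", "alpha_dux|(amd64|i386)_rel30" ],
    "Authen::Krb4"   => [ "Krb4" ],
    "Compress::Zlib" => [ "", "alpha_dux40" ],
    "CGI"            => [],
);

# Hypothetical helper: the build directory, defaulting to the module
# name with "::" changed to "-".
sub build_dir {
    my ($module) = @_;
    my $info = $MODINFO{$module} || [];
    my $dir  = $info->[0];
    return $dir if defined $dir && length $dir;
    (my $default = $module) =~ s/::/-/g;
    return $default;
}

# Hypothetical helper: true if the regex at $index is set and matches
# the sysname. The unanchored match is an assumption.
sub field_matches {
    my ($module, $index, $sysname) = @_;
    my $info = $MODINFO{$module} || [];
    my $re   = $info->[$index];
    return 0 unless defined $re && length $re;
    return $sysname =~ /$re/ ? 1 : 0;
}

sub should_test  { return !field_matches($_[0], 1, $_[1]) }
sub should_build { return !field_matches($_[0], 2, $_[1]) }

printf "%s builds in %s\n", $_, build_dir($_) for sort keys %MODINFO;
```

With this, should_build("ARS", "alpha_dux40") is false, while should_build("Authen::Krb4", "sun4x_57") is true because no nobuild regex is set for that module.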
Little Languages
The next observation is that the data table is sparse: for most modules, the defaults are sensible so there’s no need to specify more than the name of the module. In the majority of the other cases, there’s only one caveat, e.g.: “build Foo in the directory perlFooModule”, “build Bar, but not on Solaris 7 boxen”.
On top of that, we can imagine that in the future, it will be necessary to add other conditions to accommodate oddball modules: “when testing Foo, set $CLASSPATH to /usr/local/oddball-java”, or “Bar’s tests require human intervention, so don’t make test when running in unattended batch mode”, and so forth.
For this, it’s worth defining a little language. A little language is usually a miniature language within a program, something with syntax and semantics, but not enough expressive power to be a full-fledged programming language, like regular expressions, embedded SQL queries, or the first argument to getopt(). Little Languages allow a programmer to compactly express some idea, often an application-specific one, that would normally take many lines of code to express otherwise.
In this case, let’s define the following syntax: if a line in the data file begins with whitespace, then it is not the name of a Perl module, but a qualifier to the preceding module. The qualifier itself takes the form “<qualifier> <value>”. Thus:
    CGI
    Compress::Zlib
        nobuild alpha_dux40
    DBD::Oracle
        notest .*
        nobuild alpha_dux|sun4x_57
Under this scheme, it makes sense to consolidate @MODULES and %MODINFO into one structure. Let’s have the elements of @MODULES be anonymous arrays; the first element is the name of a module, and the second is an anonymous hash that maps qualifiers to values. If we were writing it out, we could write:
    @MODULES = (
        [ "DBD::Oracle",
          { notest  => ".*",
            nobuild => "alpha_dux|sun4x_57",
          }
        ],
    );
Multi-Line Records
The first problem we encounter is that <DATA>, since it only reads a line up to the end-of-line character, is no longer guaranteed to read an entire record. The simple loop
    while (<DATA>) {
        # process a record
    }
is no longer sufficient. You may be thinking that we need to write something like
    while (<DATA>) {
        $c = first character of the next line;
        if ($c =~ /\s/) {
            # The record continues on the next line
            read in the next line;
        } else {
            # We've seen the entire record
            Process the record;
            Put back $c so we can see it in the next
            iteration of the while loop
        }
    }
but there’s a much simpler approach: at this point, we’re not building the modules; we’re just collecting information about them. This means that we can add information to a module that we’ve already seen. So we can just remember the last record we’ve seen:
    our @MODULES = ();
    our $lastmodule;        # Reference to last module seen
    while (<DATA>) {
        next if /^#/;       # Ignore comments
        chomp;
        if (!/^\s/) {
            # This is (the beginning of) a new module
            push @MODULES, [ $_, {} ];
            $lastmodule = $MODULES[-1];
        } else {
            s/^\s+//;       # Trim leading whitespace
            my ($qualifier, $value) = split /\s+/, $_, 2;
            $lastmodule->[1]{$qualifier} = $value;
        }
    }
    __DATA__
    CGI
    Compress::Zlib
        nobuild alpha_dux40
    DBD::Oracle
        notest .*
        nobuild alpha_dux|sun4x_57
    …
Here, $lastmodule is a reference-to-array. Every time we add an entry to @MODULES (and these entries are references-to-array), we remember the last one we added. If we see a line that begins with whitespace, we can just say “oh, I need to add this information to the last module I saw”. This is a lot simpler than trying to implement lookahead.
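For experimenting with this outside the real script, the same loop can be wrapped in a function that reads from any filehandle. What follows is only a sketch: parse_modules is a made-up name, and skipping blank lines and dying on a qualifier that precedes any module are my own additions, not something the original does. The sample text is fed through an in-memory filehandle rather than __DATA__.

```perl
use strict;
use warnings;

# Parse the little language from any filehandle (DATA would work too).
# Returns a list of [ name, { qualifier => value } ] entries.
sub parse_modules {
    my ($fh) = @_;
    my @modules;
    my $last;                               # most recently seen module entry
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;     # skip comments and blank lines
        if ($line !~ /^\s/) {
            # A line starting in column one begins a new module.
            $last = [ $line, {} ];
            push @modules, $last;
        } else {
            # An indented line qualifies the last module seen.
            die "qualifier before any module: $line\n" unless $last;
            $line =~ s/^\s+//;
            my ($qualifier, $value) = split /\s+/, $line, 2;
            $last->[1]{$qualifier} = $value;
        }
    }
    return @modules;
}

my $text = <<'END';
CGI
Compress::Zlib
    nobuild alpha_dux40
DBD::Oracle
    notest .*
    nobuild alpha_dux|sun4x_57
END

open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my @MODULES = parse_modules($fh);
close $fh;

# Print each module followed by its qualifiers, if any.
for my $entry (@MODULES) {
    my ($name, $quals) = @$entry;
    my @pairs = map { "$_=$quals->{$_}" } sort keys %$quals;
    print join(" ", $name, @pairs), "\n";
}
```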
Dependencies and Partially-Ordered Sets
The last thing I’ll note is that as currently implemented, @MODULES lists the modules in the order they must be built, but doesn’t say why that order is necessary.
The order comes from the fact that certain modules depend on other modules. They form a partially-ordered set: there are many correct orders in which to build the modules, but they all share the characteristic that DBI is built before DBD::Oracle, that Mail is built before MIME::Tools, and so forth.
Since we talked about adding arbitrary qualifiers, above, it would be nice to add a “requires” qualifier. This would allow us to keep the list of modules in any order we liked, and also to have the machine figure out a right order so we humans don’t have to waste time doing so. It would also make these dependencies explicit.
(Aside: Yes, the Clever Thing would be to read the Makefile.PL for a module and see which dependencies it lists. But in the real world, module authors make mistakes and sometimes forget to list a dependency.)
Under this scheme, instead of keeping @MODULES as a list of modules to build, we can keep %MODULES, an unordered set. Instead of storing a two-element anonymous array with the module name and its qualifiers, we can just have the module name be the key of %MODULES, and the anonymous hash of qualifiers be the value.
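Written out by hand, such a %MODULES might look something like the sketch below. The exact shape of the “requires” value is my assumption (a whitespace-separated list of module names), and requirements is a made-up helper. The // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Module name => hash of qualifiers. The "requires" qualifier and its
# whitespace-separated value format are assumptions for illustration.
my %MODULES = (
    "DBI"         => {},
    "DBD::Oracle" => {
        requires => "DBI",
        notest   => ".*",
        nobuild  => "alpha_dux|sun4x_57",
    },
    "DBD::Pg"     => { requires => "DBI" },
);

# Hypothetical helper: the list of modules that must be built first.
sub requirements {
    my ($module) = @_;
    my $quals = $MODULES{$module} or die "unknown module: $module\n";
    return split ' ', ($quals->{requires} // '');
}

for my $module (sort keys %MODULES) {
    my @needs = requirements($module);
    print "$module needs: ", (@needs ? "@needs" : "(nothing)"), "\n";
}
```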
To implement the partial ordering, we just need to remember which modules have been built so far. We can do this either by keeping a separate %built hash keyed by module name, or by adding a qualifier to the values in %MODULES: just $MODULES{"Foo"}->{"built"} = 1; after building module Foo.
Now the main loop of the program becomes clear: after constructing %MODULES, go through it and look for a module that a) has not been built yet, and b) has no unbuilt dependencies. Build it and mark it as built. Repeat until there are no more modules to build.
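That loop might be sketched as follows. This is one possible variant, not the real script: build_order is a made-up name, it records the order instead of actually building anything, it handles every ready module on each pass rather than one at a time, and it assumes “requires” holds a whitespace-separated list of module names. The check for a pass that makes no progress is my addition, so that a circular or missing dependency doesn't loop forever.

```perl
use strict;
use warnings;

my %MODULES = (
    "DBD::Oracle" => { requires => "DBI" },
    "DBD::ODBC"   => { requires => "DBI" },
    "DBI"         => {},
    "CGI"         => {},
);

# Return the module names in an order that respects every "requires".
sub build_order {
    my ($modules) = @_;
    my (%built, @order);
    while (scalar(keys %built) < scalar(keys %$modules)) {
        my $progress = 0;
        for my $name (sort keys %$modules) {
            next if $built{$name};                  # a) not built yet
            my @deps = split ' ', ($modules->{$name}{requires} // '');
            next if grep { !$built{$_} } @deps;     # b) no unbuilt dependencies
            push @order, $name;     # a real script would build the module here
            $built{$name} = 1;
            $progress = 1;
        }
        # A full pass that builds nothing means we are stuck.
        die "circular or missing dependency\n" unless $progress;
    }
    return @order;
}

print join(" ", build_order(\%MODULES)), "\n";
```

For the sample table this prints CGI DBI DBD::ODBC DBD::Oracle. Sorting the keys is only there to make the output repeatable; any order that puts DBI ahead of the DBD modules would be just as valid.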
What’s funny to me is that you posted this here and not in your other journal, since it is entitled “Perl BOFH.” 😉
Freaky. Didn’t know about the __DATA__ statement.
Make that this instead. Urgh. Where’s the durn preview button on this crazy thing? 😛
Whereas I didn’t realize that wiki-like notation applied to comments as well (that’s why you got DATA instead of underscore underscore DATA underscore underscore).
And yeah, the lack of a preview button bugs me as well. Actually, a lot of things about the comments bug me. But WordPress 2.0 is out, allegedly with a whole new ultra-modular back-end. So hopefully they’ve either fixed comments to allow previews, nested responses, and captchas, or else they’ve made it easy for someone to implement them with a plug-in.