Pattern Substitution as Funky Iterator
I have a project in which I have a row of cells, and a number of segments of given lengths, and I need to try out all of the ways in which the segments can fit into the row. If you like, think of it as: how many ways can “eye”, “zygote”, and “is” be placed, in that order, on a row of a Scrabble board?
I’m doing this in Perl, so naturally I’d like to play to Perl’s strengths (pattern matching and substitution) rather than its weaknesses (arithmetic). And I’ve discovered a nifty little hack.
As an example, consider a row of 19 cells, into which we have to fit segments of length 3, 4, and 2, respectively and in that order. There are 20 ways to do this (using X to represent segments, and underscores (_) for blank spaces:
- XXX_XXXX_XX___
- XXX_XXXX__XX__
- XXX_XXXX___XX_
- XXX_XXXX____XX
- XXX__XXXX_XX__
- XXX__XXXX__XX_
- XXX__XXXX___XX
- XXX___XXXX_XX_
- XXX___XXXX__XX
- XXX____XXXX_XX
- _XXX_XXXX_XX__
- _XXX_XXXX__XX_
- _XXX_XXXX___XX
- _XXX__XXXX_XX_
- _XXX__XXXX__XX
- _XXX___XXXX_XX
- __XXX_XXXX_XX_
- __XXX_XXXX__XX
- __XXX__XXXX_XX
- ___XXX_XXXX_XX
As you can see, the general approach is:
- Start by packing all the segments as far left as possible.
- Find the rightmost segment that can be moved right.
- Move that segment one space to the right.
- If there are any segments to the right of the segment that was moved, pack them as far left as possible.
- Go to step 2.
- End if there are no more segments that can be moved.
How to Move a Segment
Moving a segment one position to the right is easy with pattern substitution: a segment that can be moved has a space to the right of it, so it matches
/X+_/
so to move that segment one position to the right, we just move the space from the end of the segment to the beginning:
s/(X+)_/_$1/
Finding the First Movable Segment
As you can see from listing at the top, the rightmost segment that can be moved (almost) always has two spaces to its right: one that allows it to move, and one to separate it from from the next segment. The only exception is when the rightmost movable segment is the last one: in that case, the rightmost movable segment is followed by one or more spaces, followed by the end of the row.
It may be tempting to look for two consecutive spaces with /__/ or /X+__/. But that would find the first set of two spaces. In line 7, this would find the first segment (of length 3). But the rightmost movable segment is the second one (of length 4).
One way to solve this would be to invert left and right in the algorithm: start by packing all the segments as far right as they’ll go, then find the first movable segment, move it one position to the left, and so forth until they’re all packed as far left as they’ll go.
But I’ll leave that as an exercise for the student, because instead we’ll take advantage of Perl’s greedy patterns: when you have something like:
$_ = "abcde"; /(.+)(.+)/; print "$1n$2n";
(one or more characters, followed by one or more characters), there are several ways the string "abcde" can match: "a" and "bcde", "ab" and "cde", and so forth. By default, Perl expressions like .* are greedy, i.e., they try to gobble up as much of the string as possible (while still matching the pattern). So in this example, $1 will be "abcd" (because Perl has tried to put as much as possible into the first pattern), and $2 will be "e", because that’s all that’s left.
Returning to our segments, if we want to find the rightmost set of two spaces, we can use /(.*)__/: the (.*), being greedy, will gobble up anything it can, including sets of two or more spaces, and the __ will match only because there aren’t any sets of two spaces to the right of it.
So, in our first version, the way to move the rightmost movable segment one position to the right is:
s/(.*)(X+)__/$1_$2_/
or
s/(.*)(X+)_$/$1_$2/
(this is just a working draft. We’ll improve on this in a bit).
Left-Packing Segments
In the first listing, note the difference between lines 10 and 11: the first segment (of length 3) has been moved one position to the right, as discussed above. But the other segments have also been moved to the left. Does this require a second pattern substitution?
As it turns out, no: since we moved the rightmost movable segment, that means that by definition, any segments to the right of it couldn’t be moved, which in turn means that they were already packed together as far as they’d go. You’ll note that in line 10, the third segment is already at the right edge, and the second segment only has one space between it and the third segment, so it can’t be moved any further right. So we know that the segments to the right of the rightmost movable segment have to match
/_*((?:_X+)*)$/
(For those who don’t remember, (?:pattern) groups like normal parentheses, but doesn’t make numbered subpattern variables like $1. In this case, we’re interested in the entire set of right-packed segments as a group, and aren’t interested in the contents of that group.)
Moving a segment (or a group of segments) a bunch of positions to the left is just the same as moving a segment to the right, but in reverse: take any whitespace in front of the thing we want to move, and move them to the rear:
s/(_*)((?:_X+)*)$/$2$1/
Putting It All Together
At this point, we know that our line should consist of:
- Some stuff.
- A movable segment with a space to its right, into which it can move.
- Possibly some whitespace.
- Zero or more segments that have been packed as far right as they’ll go.
To move the rightmost movable segment one position to the right, we just need to move the space after it to its front. To move the right-packed segments, we need to reserve one space so that they don’t abut the rightmost movable segment, and if there’s any other whitespace, move it to the end.
One final thing: what does the “some stuff” at the beginning look like? Well, we know that it’s zero or more segments separated by one or more spaces, and possibly some whitespace at the beginning:
/^_*(X+_+)*/
Putting it all together, we get:
s/^(_*(?:X+_+)*)(X+)(_)(_*)((?:_X+)*)$/$1$3$2$5$4/
Finally, how do we know when to stop? Fortunately, Perl’s s/// operator returns a false value (the empty string) when it can’t make any substitutions. So we can use:
$row = "XXX_XXXX_XX___"; print $row, "n"; while ($row =~ s/^(_*(?:X+_+)*)(X+)(_)(_*)((?:_X+)*)$/$1$3$2$5$4/) { print $row, "n"; }
As an added bonus, this pattern does the Right Thing in all cases, so we don’t need to test for separate boundary conditions (like when the rightmost movable segment is also the rightmost segment, i.e., it’ll end up at the end of the row).