Evolving a Self-Replicating Program


This was written in response to a message to talk.origins by Zoe Althrop in which she wrote:

if you can show me a program that is acknowledged to be a program, and
this program does not have a programmer, then my hypothesis will be

I thought it would be interesting to "breed" a program using evolutionary computation. Specifically, I tried to "breed" a program which, when it was run, would print out a copy of itself. I failed at this, but learned some interesting things in the process.

A Brief Overview of Evolutionary Computing

If you've ever watched a "Making Of" documentary about a science fiction movie, you may have been shown piles and piles of proposed and discarded designs for costumes, space ships, aliens, and so forth. This illustrates the central idea behind evolutionary computing (EC): to come up with a good design, all you have to do is come up with a million designs, and throw out the bad ones.

There are many ways to write a program that uses evolutionary computing, but they all share the same basic outline:

  1. Come up with some proposed solution.
  2. Come up with variants on the best solution so far.
  3. Assign each candidate a "fitness score," which says how well the candidate works.
  4. Keep the candidate with the highest score (some schemes keep several top-scoring candidates) and throw out the rest.
  5. Start over at step number 2.
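As a concrete sketch of these steps, here is a minimal hill-climbing loop in Python (Python is used only for illustration; random_candidate, mutate, and score are placeholders for whatever the problem at hand supplies):

```python
import random

def evolve(random_candidate, mutate, score, generations=50, brood=20):
    """Minimal evolutionary loop: keep the single best candidate seen
    so far and breed each new generation from it."""
    best = random_candidate()                  # step 1: initial proposal
    best_score = score(best)
    for _ in range(generations):
        for _ in range(brood):                 # step 2: make variants
            child = mutate(best)
            s = score(child)                   # step 3: fitness score
            if s > best_score:                 # step 4: keep the best
                best, best_score = child, s
    return best, best_score                    # the outer loop is step 5

# Toy usage: climb from 0 toward the target value 10.
best, best_score = evolve(
    lambda: 0,
    lambda x: x + random.choice([-1, 1]),
    lambda x: -abs(x - 10))
```

With 50 generations of 20 children each, the toy run above all but certainly ends with best equal to 10 and a score of 0.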

EC is best suited for problems in which there are too many variables for it to be possible to calculate a definite answer, and where it is possible to arrive at the best answer by successive improvements.

The trickiest part of the whole process is coming up with a way to assign a fitness score to a candidate. The problem is that the algorithm above will generate solutions with the highest score, which may not be what you want. This is like saying that in a democracy, the winner of an election is not the best candidate, but the candidate most able to win an election.

The Program

I chose to breed a Scheme program (Scheme is a dialect of the Lisp programming language), mainly because it has a very clean syntax that's easy to generate, partly because one of the traditional projects for students of Lisp is to write a program that prints itself, and partly because I had a Scheme interpreter (SIOD) lying around.

Constructing a Self-Printing Program in Lisp

In Lisp, the construct (quote foo) simply returns foo without evaluating it. The (list a b c) function takes its arguments and makes a list out of them (Lisp is very big on lists (and lists of lists)).

We can define a function quine that takes one argument, and returns a list with that argument, and a quoted version of the argument:

(define (quine arg)
  (list arg
        (list (quote quote)
              arg)))

So that (quine 123) would return (123 (quote 123)). Likewise, (quine (quote quine)) returns (quine (quote quine)).

So we almost have a program that prints itself. The only snag is that to make it work, we had to define the function quine and this program does not print the definition of that function.

But what if it were possible to achieve the same thing without calling quine by name? In Scheme, define simply defines a lambda expression (think of it as a function body) and gives it a name. Since we don't care whether our bit of code has a name, we can simply define the lambda expression

(lambda (arg)
  (list arg
        (list (quote quote)
              arg)))
and use that wherever we wrote "quine", above. Thus:
((lambda (arg)
   (list arg
         (list (quote quote)
               arg)))
 123)
returns (123 (quote 123)) and so forth. Of course, instead of "123", we can just quote the body of quine and pass that to the function, giving in effect (body-of-quine (quote body-of-quine)), which should return itself. And here it is:
((lambda (arg)
   (list arg
         (list (quote quote)
               arg)))
 (quote (lambda (arg)
          (list arg
                (list (quote quote)
                      arg)))))
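The same represent-yourself-and-substitute trick carries over to other languages. For comparison (this is not one of the bred programs), the classic two-line Python quine:

```python
# The two lines below form a self-printing program: s holds a template
# of the code, and s % s substitutes the repr of s into that template.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Run on their own (without the comment), those two lines print exactly their own source.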

I chose to write the framework in Perl, for no good reason, except that Perl makes it easy to run other programs (like Lisp interpreters) and read their output.

Thus, I wrote a Perl program that writes Lisp programs, runs them, and evaluates how well they did.

The randomprog function generates a random chunk of Scheme code. Actually, this function is quite crude and simply generates lists of words that appear in Scheme, and lists of such lists. Much of what it generates is syntactically incorrect and will be rejected by the Scheme interpreter, later on.
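A sketch of this kind of generator, in Python for illustration (the symbol vocabulary here is my guess at the sort of words the real randomprog drew on):

```python
import random

# Hypothetical vocabulary of Scheme-ish tokens; the real list differed.
SYMBOLS = ["quote", "list", "lambda", "arg", "car", "cdr", "cons", "123", "*pi*"]

def random_prog(depth=3):
    """Return a random S-expression as a string: either a bare symbol,
    or a parenthesized list of smaller random expressions.  No attempt
    is made to respect Scheme's grammar, so much of the output will be
    rejected by the interpreter later."""
    if depth == 0 or random.random() < 0.4:
        return random.choice(SYMBOLS)
    n = random.randint(1, 4)
    return "(" + " ".join(random_prog(depth - 1) for _ in range(n)) + ")"
```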

The mutate function takes a program, picks a random part of it, and changes it to something else. This "something else" is a random piece of Scheme code of arbitrary complexity, so these mutations can introduce sweeping changes.

The score function runs a candidate program using the SIOD Scheme interpreter, compares the input to the output, and assigns a score to the candidate. There is much scope for play here.

The version given here counts the number of times that the nth character of the output matches the nth character of the program, and divides that by the length of the original program. This gives us the percentage of "correct" characters in the output. This is multiplied by 100 and added to the score.

In Lisp, any number, such as 123, is a valid program. Of course, if you evaluate the program 123, you get "123", which matches its input perfectly. However, this is not at all interesting, so the score function includes an ad hoc rule to subtract 200 points from any program whose output includes only "uninteresting" characters (primarily digits).

An earlier version of this program subtracted a number of points that depended on the length of the program (length^1.04, to be exact). This was to avoid a problem with programs growing without bound (see below).
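Put together, the scoring rules described above amount to something like the following Python sketch (the exact "uninteresting" character set and the handling of interpreter failures are my guesses at details the text leaves open):

```python
def score(program, output):
    """Score a candidate program given its source text and the output
    it printed; None stands for a program the interpreter rejected."""
    if output is None:
        return -10000                    # could not be run at all
    # Fraction of positions where the output matches the program's own
    # source, scaled to a 0-100 range.
    matches = sum(a == b for a, b in zip(program, output))
    s = 100 * matches / len(program)
    # Trivial self-printers (bare numbers and the like) are penalized.
    if output and all(c in "0123456789() \n" for c in output):
        s -= 200
    # An earlier version also subtracted len(program) ** 1.04 here, to
    # keep programs from growing without bound.
    return s
```

So the trivial program 123, which prints itself perfectly, still ends up with a negative score, while a non-trivial perfect self-printer would score 100.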

Finally, the main body of the program simply applies the evolutionary algorithm outlined above: it generates a random program, creates 100 mutant children per generation, selects the one with the highest score as the ancestor for the next generation, and repeats.


I did not manage to generate a self-printing Lisp program such as the one described in the sidebar. I did, however, notice some interesting effects.


The most striking result is that this framework generates programs with high scores, and not necessarily programs like the ones I was looking for. I think of this as the "whatever you can get away with" effect.

For instance, earlier versions of the score function simply counted the number of characters in the output that matched the corresponding character in the input. The problem with this is that some characters are bound to match purely by chance. If a 10,000-character program printed about 10,000 characters of output, and five percent of the output matched by chance, that program would have a score of 500. A 150-character program that printed an exact copy of itself, on the other hand, would only receive a score of 150.
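The chance-match rate depends on the character distribution: for two strings drawn uniformly from a 26-letter alphabet it is 1/26, or about 3.8 percent, and real program text, with its many repeated parentheses and spaces, matches rather more often. A quick Python check of the uniform case:

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def rand_text(n):
    return "".join(random.choice(ALPHABET) for _ in range(n))

# Compare two unrelated 10,000-character strings position by position.
a, b = rand_text(10000), rand_text(10000)
matches = sum(x == y for x, y in zip(a, b))
# matches hovers around 10000 / 26, i.e. roughly 385.
```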

In addition, many types of Scheme programs return something that looks very much like the input. For instance, in SIOD, the program

(lambda (arg) foo)

which defines a function body, returns

#<CLOSURE (arg) foo>

If, by some chance, the program used functions whose names had just the right length, the output would line up exactly with the input, and all of the output would match, except for bits at the beginning and end. This rewards programs for growing without bound: a 10,000-character program might get a score of 9994, even if it did nothing at all, and any mutation that increased the length would increase the score correspondingly.

To mitigate this effect, I added a clause to the score function that subtracted the length of the program, raised to the power 1.04. This penalty grows slightly faster than the length itself, so it punishes long programs much more harshly than short ones. The exponent was chosen to be shallow enough to allow some growth, but steep enough to prevent runaway expansion.
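To get a feel for the exponent, here is the penalty at a few program lengths (a Python check; the rounded figures are mine):

```python
# len ** 1.04 grows slightly faster than len itself: a hundredfold
# increase in length costs roughly 120 times the penalty.
for n in (100, 1000, 10000):
    print(n, round(n ** 1.04))
```

A 100-character program loses about 120 points, while a 10,000-character one loses over 14,000, far more than the few hundred points chance matching can earn back.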

In the other direction, as mentioned above, certain trivial programs got unexpectedly high scores. Numbers, strings, and built-in constants such as () and t all evaluate to themselves. These are, technically, self-printing programs, but spectacularly uninteresting ones. Thus, I added a clause to the score function to subtract 100 points from any program under 50 characters in length (a penalty for being too short). Later, I changed this to subtract 200 points from any program that consisted only of "uninteresting" characters (digits and parentheses, mostly).

Finally, it should be noted that since the randomprog function makes only a token attempt at producing valid Scheme code, much of what it generates contains syntax errors. Any child that cannot be run is automatically given a score of -10,000.


After the scoring function, the next most important factor was the type of mutation produced by the mutate function. We can imagine many different types of mutation. At one extreme, we could simply throw out the parent and generate a completely random child from scratch; this could be useful for exploring new territory. At the other extreme, we could mutate a parent by replacing a number with a different number, or a variable with a different variable, but not changing the overall structure of the parent program; this could be useful for fine-tuning a solution that already works fairly well.

I used an intermediate approach: the mutate function picks a random part of the program and either replaces it with a random piece of code, deletes it, or doubles it. Thus:
  Mutation type   Input     Result
  Replacement     (a b c)   (a (cons (car *pi*)) c)
  Deletion        (a b c)   (a c)
  Doubling        (a b c)   (a b b c)

In this implementation, each child had exactly one mutation, though it would be simple enough to mutate each child twice, thrice, or some random number.
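The three mutation types can be sketched over nested Python lists standing in for S-expressions (random_subexpr is a stand-in for the random-code generator; the real mutate worked on Scheme code):

```python
import copy
import random

def sites(tree, path=()):
    """Yield the index-path of every element in a nested list."""
    for i, node in enumerate(tree):
        yield path + (i,)
        if isinstance(node, list):
            yield from sites(node, path + (i,))

def mutate(tree, random_subexpr):
    """Return a copy of tree with one randomly chosen site replaced,
    deleted, or doubled."""
    tree = copy.deepcopy(tree)
    *steps, i = random.choice(list(sites(tree)))
    parent = tree
    for j in steps:                      # walk down to the chosen site
        parent = parent[j]
    kind = random.choice(["replace", "delete", "double"])
    if kind == "replace":
        parent[i] = random_subexpr()
    elif kind == "delete":
        del parent[i]
    else:                                # double: (a b c) -> (a b b c)
        parent.insert(i, copy.deepcopy(parent[i]))
    return tree
```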

Originally, I only had the replacement and deletion mutation types. I added doubling partly because it occurs in nature, and partly in the hope that it would lead to the sort of program described in the sidebar. The doubling mutation turned out to be particularly interesting since it started with a piece of code that had already survived the previous generation without causing the interpreter to abort.

This turned out to be very efficient at generating the sorts of infinitely-expanding programs described above: doubling an existing chunk of code gave better-than-average odds of matching characters between input and output, since the parent candidate had presumably already outscored its competitors.

The types of mutation matter because they represent different ways of exploring the neighborhood of a proposed solution. Small changes are good at refining a proposed solution to make it work better. Large changes are good at exploring the range of possible solutions to avoid getting trapped at a local maximum. In many EC frameworks, the system starts with large-step mutations to attempt to find the general form of the best overall solution, and then switches to smaller-step mutations to refine it. My framework does not do this.

It should also be noted that this version of the framework only uses mutation, and not combination (sex). It has been found experimentally that by combining aspects of two or more candidate programs to produce a new one, the child can include the best of both parents and excel in areas in which neither parent does.

However, the programs generated by randomprog and mutate vary so widely that it is not entirely clear to me how one would combine two programs. This would almost be like trying to cross-breed sparrows with wasps.
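For the record, the standard genetic-programming answer is subtree crossover: pick a random subtree in each parent and graft one into the other. Over nested lists it might look like this (a sketch of the general technique, not something this framework does):

```python
import copy
import random

def paths(tree, path=()):
    """Yield the index-path of every element in a nested list."""
    for i, node in enumerate(tree):
        yield path + (i,)
        if isinstance(node, list):
            yield from paths(node, path + (i,))

def crossover(a, b):
    """Return a copy of parent a with one random subtree replaced by a
    random subtree taken from parent b."""
    a, b = copy.deepcopy(a), copy.deepcopy(b)
    *steps, i = random.choice(list(paths(a)))
    target = a
    for j in steps:                      # walk down to the graft point
        target = target[j]
    donor = b
    for j in random.choice(list(paths(b))):
        donor = donor[j]
    target[i] = donor
    return a
```

Even with such machinery, grafting between parents that share no structure is as likely to produce nonsense as anything else, which is the sparrow-and-wasp problem in another guise.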


This program failed to generate a non-trivial Scheme program that evaluates to itself. There are several possible reasons for this:

It may be that the scoring function did not define a fitness landscape with a smooth path to the desired result, in which each successive change yields a higher score.

It may be that the problem is not amenable to being solved by evolutionary methods. There may be too many local maxima that give fair-to-middling scores, with no way to step from a local maximum to the global maximum without ever passing through a lower score. A different scoring function might help here.

It may be that the randomprog and mutate functions produced too many offspring with syntax errors or other obvious problems, so that generating an optimal solution would have taken longer than I had patience for. A system that made a better attempt at generating valid Scheme code might perform better in this respect.

Nonetheless, this has proven to be an interesting experiment.