Music and Evo Devo
If you wanted to evolve a musical composition — or better yet, if you wanted to evolve a piece of software that could write music on demand — how would you do it?
I came up with the following approach after reading Sean Carroll’s Endless Forms Most Beautiful.
One problem with evolving music is that the various parts of a composition aren’t independent: there are repeating themes, motifs, and refrains; the different voices should be in the same key. For that matter, the piano and drums should usually be in sync. So before applying evolution to the problem, we should come up with a way of generating a score from a set of (hopefully) more basic variables.
It then occurred to me that this can be reduced to the problem of writing music down as compactly as possible. But that’s what musicians have been doing for centuries. The key signature allows you to transpose a staff and change the way it sounds, without altering the rhythm or basic tune; once a refrain has been written out on a music sheet, you can just refer to it repeatedly, without having to write it out every time.
In a developing organism, every cell has the same DNA, but the mature body has many distinct regions: head, spine, thorax, digits, retinas, etc. And just as with music, there can be repeated elements, or regions with both similarities and differences. So why not apply developmental biology to musical composition?
A piece of sheet music, if you ignore line and page breaks, is just a two-dimensional array of measures: the vertical axis indicates the various voices, and the horizontal axis indicates time or “execution flow”.
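As a minimal sketch of that representation (the class names are my own, not any existing library), the score is just a grid indexed by voice and time:

```python
# A score, ignoring line and page breaks, as a 2D grid of measures:
# rows are voices, columns are time ("execution flow").

class Measure:
    def __init__(self, notes=None):
        self.notes = notes or []   # e.g. [(60, 0.25), ...] as (MIDI pitch, duration)

class Score:
    def __init__(self, voices, length):
        self.grid = [[Measure() for _ in range(length)] for _ in range(voices)]

    def measure(self, voice, time):
        return self.grid[voice][time]

score = Score(voices=4, length=32)   # e.g. voice, lead, bass, drums, 32 measures each
```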
In a living organism, most genes are middle managers: they code for proteins that don’t actually do anything, they just look for certain stretches of DNA, attach themselves, and either cause or prevent the expression of another gene. These promoter and suppressor proteins bind more or less well to patterns in the DNA in front of the gene that they affect. A developing embryo is a complex network of promoters and suppressors: gene A might be promoted by gene B, but then gene C comes along and suppresses production of B, which suppresses production of A. Then gene D suppresses C, which stops suppressing B, so that A is expressed after all, and so forth.
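A toy way to see how such a cascade settles out, with each gene reduced to a simple on/off switch (my own simplification; real regulation is continuous and concentration-dependent):

```python
# Toy gene-regulation cascade: A is promoted by B, C suppresses B, D suppresses C.

def step(state):
    return {
        "D": state["D"],          # assume D has been switched on externally
        "C": not state["D"],      # D suppresses C
        "B": not state["C"],      # C suppresses B
        "A": state["B"],          # B promotes A
    }

state = {"A": False, "B": False, "C": True, "D": True}
for _ in range(4):                # iterate until the network settles
    state = step(state)
print(state)                      # D on -> C off -> B on -> A expressed after all
```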
The embryo, in concert with its egg or uterus, also sets up concentration gradients of various chemical markers. One of the earliest such chemical markers is most concentrated where the egg is attached to the uterus, and least concentrated at the other end. The concentration tells the various cells where they are located along the head-to-anus axis.
Still other chemicals travel to adjacent cells to promote or suppress genes in those cells. Follicles should be spaced out, so when the gene that tells a cell “you are a follicle” is expressed, it also triggers the expression of a protein that travels to adjacent cells and turns off the “you are a follicle” gene.
Now apply all of this to musical composition. Start with one measure, an array of integer variables v0, v1, v2, … v1023. It splits vertically into two measures, then those two split vertically again. That gives us four one-measure lines, for voice, lead guitar, bass guitar, and drums. The environment sets v0 in each measure to 0, 85, 171, and 255. This gradient tells each measure which line it’s part of.
The entire array now splits horizontally. The environment maintains a horizontal gradient in v1. With the composition only two measures wide, v1 will have a value of either 0 or 255. v1 == 255 promotes the “you are the coda” gene. These measures will only divide three more times, for an 8-measure coda; the others will divide more often.
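Here is a sketch of that development sequence, under my own naming assumptions: a measure is a dict holding the 1024 integer variables, and the environment repaints the gradients after each split.

```python
# Development: one measure splits vertically into four voices, then the whole
# array splits horizontally; the environment maintains gradients in v0 and v1.

import copy

def new_measure():
    return {"v": [0] * 1024}

def split_vertically(column):
    # duplicate every measure in the one-measure-wide column
    return [copy.deepcopy(m) for m in column for _ in (0, 1)]

def paint_gradients(grid):
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            grid[r][c]["v"][0] = round(255 * r / max(rows - 1, 1))  # which voice
            grid[r][c]["v"][1] = round(255 * c / max(cols - 1, 1))  # where in time

# start with a single measure, split vertically twice -> 4 voices
column = [new_measure()]
column = split_vertically(column)
column = split_vertically(column)

# the whole array splits horizontally -> 2 measures wide
grid = [[m, copy.deepcopy(m)] for m in column]
paint_gradients(grid)

print([row[0]["v"][0] for row in grid])        # [0, 85, 170, 255] -- the voice gradient
print([grid[0][c]["v"][1] for c in range(2)])  # [0, 255] -- v1 == 255 marks the coda
```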
After each division, the environment runs all of the code in each measure. The code is like a blackboard architecture: it’s just a bunch of rules of the form “If <condition> then do <code snippet>”. The condition can be something like “v1 > 0 and v16 > 90”. The code snippet is one of a few dozen small pre-written functions that do simple things like “divide vertically”, “divide horizontally”, “set v107 to 0”, “delete yourself”, “choose a melody”, “transpose yourself up one semitone”, and so on.
A lot of these code snippets ought to take parameters, but I’m not sure how to pass them. The simplest approach might be to hardwire them in, so that “transpose yourself” always uses v206 as the number of semitones to transpose by, “set variable N to M” always uses v212 for the index of the variable to change and v213 for its value, and so forth.
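A sketch of what one such rule table could look like, using the hardwired-parameter convention (the specific variable numbers are just the examples above, and the action set here is a tiny, made-up subset):

```python
# Blackboard-style rule table: every measure carries the same rules, and the
# environment runs them over the measure's variables after each division.
# Actions read their parameters from hardwired variable slots.

def transpose(measure):
    semitones = measure["v"][206]                      # hardwired parameter slot
    measure["notes"] = [(pitch + semitones, dur)
                        for pitch, dur in measure.get("notes", [])]

def set_variable(measure):
    index, value = measure["v"][212], measure["v"][213]
    measure["v"][index] = value

ACTIONS = {"transpose": transpose, "set variable": set_variable}

# A rule is (condition, action name); conditions only look at the measure's variables.
RULES = [
    (lambda v: v[1] > 0 and v[16] > 90, "transpose"),
    (lambda v: v[0] == 255,             "set variable"),
]

def run_rules(measure):
    for condition, action in RULES:
        if condition(measure["v"]):
            ACTIONS[action](measure)
```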
All of the measures use the same code, and this code can be made subject to evolution, including random mutation and recombination.
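Since the genome is just a table of rules, mutation and recombination can stay very simple. A minimal sketch, with the representation reduced to plain tuples (real conditions would be richer than a single threshold test):

```python
import random

# A rule as plain data: (variable index, threshold, action name),
# meaning "if v[index] > threshold, do action".
ACTION_NAMES = ["divide vertically", "divide horizontally", "transpose", "choose a melody"]

def random_rule():
    return (random.randrange(1024), random.randrange(256), random.choice(ACTION_NAMES))

def mutate(genome, rate=0.05):
    # point mutation: occasionally replace a rule with a fresh random one
    return [random_rule() if random.random() < rate else rule for rule in genome]

def crossover(mom, dad):
    # one-point recombination between two rule tables
    cut = random.randrange(1, min(len(mom), len(dad)))
    return mom[:cut] + dad[cut:]

genome_a = [random_rule() for _ in range(20)]
genome_b = [random_rule() for _ in range(20)]
child = mutate(crossover(genome_a, genome_b))
```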
This evo devo approach ought to be supplemented with additional techniques. The “choose a melody” function might use a Markov chain to pick the notes to go in a given measure (although there’s no real reason to stop dividing at the measure level; we could keep dividing down to the note level, or something). Or there might be one Markov chain to pick the rhythm of the melody (i.e., whether a measure contains one whole note, or four quarter notes, or one half note and two quarter notes, and so forth), and another Markov chain to pick the notes’ pitch.
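For instance, a first-order Markov chain over pitches and another over durations could fill a measure like this (the transition tables here are made up purely for illustration):

```python
import random

# Toy Markov chains: one over scale degrees, one over note durations.
PITCH_TRANSITIONS = {                  # current degree -> possible next degrees
    0: [0, 2, 4], 2: [0, 2, 4, 5], 4: [2, 4, 5, 7], 5: [4, 5, 7], 7: [0, 4, 5, 7],
}
RHYTHM_TRANSITIONS = {                 # durations in quarter notes
    1.0: [1.0, 0.5], 0.5: [0.5, 0.5, 1.0], 2.0: [1.0, 2.0],
}

def choose_melody(beats=4.0, start_pitch=0, start_dur=1.0):
    pitch, dur, left, notes = start_pitch, start_dur, beats, []
    while left > 0:
        dur = min(dur, left)           # don't overflow the measure
        notes.append((pitch, dur))
        left -= dur
        pitch = random.choice(PITCH_TRANSITIONS[pitch])
        dur = random.choice(RHYTHM_TRANSITIONS.get(dur, [1.0]))
    return notes

print(choose_melody())                 # e.g. [(0, 1.0), (2, 0.5), (4, 0.5), ...]
```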
Ideally, this environment would also evolve the instruments themselves, but this is a much simpler proposition. Just emulate a synthesizer in software, and have the “genome” be the various settings for a patch: attack level and rate, sustain level and rate, amplitude and pitch of the various operators, amplitude and frequency of the LFO, and so forth. Here, though, we should also include some “gratuitous” variables just as markers. This would allow staves in the composition to “bind” more or less tightly with instruments. That is, a staff might say “I am a lead guitar melody; I’m looking for a lead guitar-ish instrument”, and the environment would pick an instrument from its stable.
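A sketch of the instrument side, with the “gratuitous” marker variables used for binding (the parameter names are generic synth-style placeholders, not any particular synth’s patch format):

```python
import random

# An instrument genome: real synth parameters plus a few free-floating "marker"
# values that do nothing acoustically but let staves bind to instruments.
def random_patch():
    return {
        "attack": random.random(), "decay": random.random(),
        "sustain": random.random(), "release": random.random(),
        "lfo_rate": random.random(), "lfo_depth": random.random(),
        "markers": [random.random() for _ in range(4)],
    }

def binding_strength(staff_markers, patch):
    # the better the markers line up, the more "lead guitar-ish" the patch looks
    return -sum((a - b) ** 2 for a, b in zip(staff_markers, patch["markers"]))

def pick_instrument(staff_markers, stable):
    return max(stable, key=lambda patch: binding_strength(staff_markers, patch))

stable = [random_patch() for _ in range(100)]   # the farm of synth sounds
lead_guitar_staff = [0.9, 0.1, 0.5, 0.3]        # markers evolved along with the score
instrument = pick_instrument(lead_guitar_staff, stable)
```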
What I envision, eventually, is that I’d have a whole “stable” of such composition-generators, which can be interbred and mixed up and shuffled to generate new compositions on demand, as well as a farm of several thousand synth sounds. Obviously, these composers could be traded on the net, and it might be interesting to cross someone’s synthpop composer with a jazz composer.
Update, Mar. 2, 2007: Some further thoughts on this.
Interesting, but how do you measure fitness?
Well, ultimately it has to come down to playing a piece and letting the user rate it on a scale of 1 to 10 or something. Or you could have something like Richard Dawkins’s biomorph program: present the user with a candidate and 8 mutants, and let the user pick the best one to retain in the next generation.
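The biomorph-style loop is easy to sketch if we assume the genome operations from above; here, `play` is a stand-in for actually rendering the piece and playing it to the user:

```python
# Dawkins-style interactive selection: show the current composer plus 8 mutants,
# keep whichever one the user picks, and repeat.
def breed_interactively(genome, mutate, play, generations=20):
    for _ in range(generations):
        candidates = [genome] + [mutate(genome) for _ in range(8)]
        for i, candidate in enumerate(candidates):
            print(f"playing candidate {i}...")
            play(candidate)
        choice = int(input("Which one should survive? "))
        genome = candidates[choice]
    return genome
```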
But before that, it should be possible to cull a lot of the crap automatically. You could, for instance, specify that you want pieces between 2 and 4 minutes long. You could select for songs with a structure of the form verse, verse, refrain, verse, verse, refrain, bridge, verse, verse, refrain, coda. You could eliminate songs with discordant chords, or ones where the drum part is too irregular.
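Those filters can be a stack of cheap predicates run before a human ever hears anything. A sketch, assuming each candidate piece can report a duration and a list of section labels:

```python
# Cheap pre-filters: throw a candidate away before anyone has to listen to it.
TARGET_STRUCTURE = ["verse", "verse", "refrain", "verse", "verse", "refrain",
                    "bridge", "verse", "verse", "refrain", "coda"]

def passes_filters(piece):
    if not 120 <= piece["duration"] <= 240:      # between 2 and 4 minutes
        return False
    if piece["sections"] != TARGET_STRUCTURE:    # the song form asked for above
        return False
    return True

# usage: survivors = [piece for piece in population if passes_filters(piece)]
```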
If you had access to a huge set of MIDI files, you might be able to train a neural network to recognize the music that you like. Then use that NN to judge the output of the composer.
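A very rough sketch of that idea, assuming a set of MIDI files you’ve labeled as liked or disliked; it uses pitch-class histograms as features and a small neural net as the judge (mido and scikit-learn are just one possible toolset, and the file paths are hypothetical):

```python
from mido import MidiFile
from sklearn.neural_network import MLPClassifier

def pitch_class_histogram(path):
    # crude feature vector: how often each of the 12 pitch classes is struck
    counts = [0] * 12
    for track in MidiFile(path).tracks:
        for msg in track:
            if msg.type == "note_on" and msg.velocity > 0:
                counts[msg.note % 12] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

# hypothetical training data: MIDI files and whether you liked them
liked = ["liked/song1.mid", "liked/song2.mid"]
disliked = ["disliked/song1.mid", "disliked/song2.mid"]

X = [pitch_class_histogram(p) for p in liked + disliked]
y = [1] * len(liked) + [0] * len(disliked)

judge = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(X, y)

# judge the composer's output the same way:
# score = judge.predict_proba([pitch_class_histogram("candidate.mid")])[0][1]
```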
You might be interested in this web page:
http://www.it.rit.edu/~jab/GenJam.html