I've spent way too much time on Youtube, recently. Part of the reason for this is that the American religious right gives me the serious wiggins. For a long time I've taken a decidedly laissez-faire approach to other people's belief systems, but the religious right have made a serious attempt to politicise science and to force other aspects of their neolithic world view upon the American population. This makes me very unhappy - I like to take my science with an absolute minimum of priesthood and I dislike fundamentalism of any kind, particularly when it displays such an unhealthy fascination with genitalia.
So, genetic algorithms...
I've always found them interesting and I wanted to put together a basic simulation of evolution. A
spherical cow of my very own, capturing the zeitgeist, bobbing on the breeze, buoyed by the hot air rising from the fires of the great Evolution Debate. So, I did what any wannabe internets celebrity would do - I drank a bottle of red wine and wrote a bunch of crap.
And here it is...
def Evolve():
# Timestamp start
Ts = dt.datetime.now()
print("Randomly generating the first generation of organisms...")
# S is the number of mating survivors per generation
S = MaxPop * Survivors // 100
# generation count
Gen = 0
# Our population starts with each organism having a genome full of random crap
#
# Number of organims is 'MaxPop'; genome size per organism is 'GenomeSize'
Pop = []
for i in range(0, MaxPop):
O = Organism(GenomeSize)
Pop.append(O)
# The generation loop
while True:
s = 0
Gen += 1
MatingPool = []
print("Culling the worst performers...")
while s < S:
# Pick any two individuals
Na = random.randint(0, len(Pop)-1)
Nb = random.randint(0, len(Pop)-1)
# Put the better of the two in the mating pool
if Fitter(Pop[Na], Pop[Nb]):
MatingPool.append(Pop[Na])
else:
MatingPool.append(Pop[Nb])
s += 1
# We now have the fittest surviving subset of the
# original population - our mating pool
i = 0
Pop = []
print("Birthing the next generation...")
# Be fruitful and multiply
while i < MaxPop:
if SexualReproduction:
# Pick any two individuals, mate them to produce a child
# mutations are introduced during the mating.
Na = random.randint(0, S-1)
Nb = random.randint(0, S-1)
Child = Organism(MatingPool[Na], MatingPool[Nb])
else:
# Pick a single individual, generate a single
# child with mutations
Na = random.randint(0, S-1)
Child = Organism(MatingPool[Na])
Pop.append(Child)
i += 1
# Timestamp current
Tc = dt.datetime.now()
# Elapsed time
Te = Tc - Ts
PrintOrg(Gen, Te, Pop[0], "First")
if (Gen % 25) == 0:
B = Best(Pop)
PrintOrg(Gen, Te, B, "Best")
if Terminate:
if B.Synthesis == Target.Synthesis:
print("Ideal match - terminating...")
PrintOrg(Gen, Te, Target, "Ideal")
return
There's rather too much of it to inline it here, but the above snippet is the main engine. You can download all of it by right-clicking and selecting 'save'
here, and it's written for
Python 3.3.
I wrote it more as a simple simulation rather than a practical framework for more in-depth work. It deviates from the mechanisms of life in the following ways:
- It lacks start-stop codons; any non-coding base acts as start or stop codon depending on the content.
- It doesn't simulate base pairs - bases are singletons.
- It lacks any analogue for RNA.
- Transcription errors manifest in the child, after mating - this is the sole mechanism by which mutations are introduced to the genome. This doesn't make any practical difference to the total mutation load experienced by the 'organisms'; the mutation rate is a program variable and may be set from the command line.
- It does, nevertheless, encapsulate the idea of a genetic code. Codons are implemented as triplets of the G, A, T, C bases and can therefore describe 84 states per codon. This is currently used to synthesise the 'body' of the 'organism', although it could also be used to describe the bytecode for a virtual machine, or as draw instructions for a genetic art project.
For a test run, I chose the following target string: 'Ozymandias'. Each letter in the target can be chosen from 26 lowercase characters and 26 uppercase, making 52 options. As the name is ten letters in length, I'll make it slightly easier by restricting the match attempt to a random string of mixed case letters of length 10.
This provides me with a figure of 5.74077038e+16 possible permutations when I generate the match attempt; the search space is nearly 60 million billion permutations.
C:\Dev>ga.py -g 300 -p 800 -m 15 -s 90 -t "Ozymandias"
Randomly generating the first generation of organisms...
Culling the worst performers...
Birthing the next generation...
runtime: 0.5; generation 1; size 300:
Culling the worst performers...
Birthing the next generation...
runtime: 0.8; generation 2; size 300:
...
Culling the worst performers...
Birthing the next generation...
runtime: 1.19; generation 25; size 300:
Generation: 25; Runtime: 1.19; Size: 299; Synth: OzymDPXiasUskaY
Genome: G$$A$A$T$$TGA$$C$$TG$CACCT$$TAC...
Culling the worst performers...
You can see that by generation 25, the code has already picked out some of the features of the target word. I chose to run the code with a population size of 800, this means that, so far, it has generated 20,000 attempts at a match.
Generation: 100; Runtime: 6.32; Size: 299; Synth: OVymandia
Genome: AT$CG$TGATA$ATAG$TACA$TTTTCA$$C...
By generation 100, it's nearly there. However, by generation 250, it's still nearly there, because the watchmaker really is blind.
Generation: 250; Runtime: 16.56; Size: 298; Synth: ONymandias
Genome: AT$G$TGATA$AGAG$$TACA$TTTTCA$$C...
By generation 1550... still nearly there, because the watchmaker isn't just blind, he's also rather annoying. The code has backtracked numerous times, trying the same match attempt after earlier discarding it. This is just the way Mother Nature would have done it, so I'm still a warm and fuzzy squirrel.
Generation: 1550; Runtime: 98.13; Size: 293; Synth: OQymandias
Genome: AA$$C$TGAGA$$GAA$TACA$TTTTCAA$$$$$...
*facepalm*
I'm going to watch Farscape. I may be some time.
Generation: 4975; Runtime: 320.42; Size: 280; Synth: Ozymandias
Genome: $$$$TGA$$$$$$$CACTAC$$TTTTCAG$G...
Finally...
The *last* time I ran this, I saw a convergence in roughly 400 generation. This time, it took 5k generations, which means 4 million tries at a match. The probability of getting a
single match with
4 million random tries is approximately 7e-11, or
1 chance in 14 billion. This attempt took five hours and 20 minutes. If I tossed a coin to get the best match, at roughly 300 attempts a second (the rate at which alternatives for the code's 'evolve' mode are tried), it would take
years.
My laptop struggles with this code. The code doesn't support parallel execution and even if it did, I have just two cores. The algorithm used is inefficient and involves a lot of data copying. Nature, on the other hand, is fundamentally parallel. Your body alone is host to something like 10e+14 micro-organisms and they're all feeding, dying, breeding, independently of your attention. While the bacteria are doing their thing, stars, billions of light years away, billions of years ago, are dying. Iron atoms generated in their cores are accelerated to enormous velocities. Some of them are impacting you while you read this. Nature doesn't care that you're not paying attention - she's happening all the time, all around you. Genetic copying goes on all the time, accumulating errors when stars flash in gamma light; when iron nuclei impact; when the radon in the mountain air decays into other elements; because its a warm day; because the soup in which they live contains chemical mutagens; because they've altered their chemical environment to provide just the right level of mutation to optimise the process of natural selection.
Oh, and if you think that my code constitutes evidence for a designer - here's a another thought: much of the code I wrote emulates characteristics of the natural world - you get all of the necessary algorithmic support for the process of evolution free of charge in nature. If I wrote some code to simulate orbital physics, a lot of it would, again, be written to emulate phenomena that take place in the natural world. You
could perhaps argue for a deistic stance wherein God set all of this up and then subsequently let it run free to achieve some ultimate unknown conclusion like so much wonderful celestial clockwork. This, at least in my opinion, is the most intelligent of the alternative theistic stances, but it still sits in the non-science bin because there is:
- No evidence to support the intervention of your God.
- No way to differentiate between the claims you make for your God and the claims they make for their God.
- Plenty of prior evidence for the attribution of poorly understood physical phenomena to some theistic cause, only to discover the natural mechanisms underpinning the said phenomena.
Intelligent people tend to confess when they don't know something, because our world is large and still full of unknowns and wonders and it exists in a much vaster and much less well-understood universe. Invoking a God as an explanation for something you don't understand is another, usually very long-winded way of admitting your ignorance on the topic in question.
Incidentally, in these code runs, I've been looking for a perfect match. In nature, 'good enough' is usually sufficient unto the day. The codons can produce non-coding DNA and do so a reasonable percentage of the time. It takes three bases to make a codon. The mutation mechanism causes deletions, duplications and point mutations. There is selection pressure to conserve genome size; in nature, a bigger genome is more expensive at the cellular level. In a fit of entirely naturalistic caprice, the code selects its breeding pool at random from the departing generation, although it does try to pick the better of two candidates with each addition to the mating pool. Being better than the competition isn't enough - you also need
luck.
I could and have, written simpler GAs which converge a lot faster but my aim in this case was to produce something that mimics natural selection.
If you happen to be a creationist, this is why educated people facepalm. It occurs to the more generous of them that perhaps you've been raised in an intellectual vacuum. It's your choice - you can learn. You can learn at this, fundamental, algorithmic, probabilistic level. At
this level you're dealing with numbers and mathematics and mechanisms, which gives the process a kind of purity that you don't see elsewhere. People can't bullshit you because you can repeat the processes yourself; do your own investigations; draw your own conclusions without the need for external evidence.
You can write your own damned code.
My personal conclusion, after playing with the first genetic algorithm I ever wrote, was simple. Anything with a genome is evolving with each new generation. This is the
function of a genome.
I'm on my second bottle of wine. This code and the words it inspired, took three days to write, so it doesn't count as an implicit admission of alcoholism (honest!). I'm not a huge drinker; this is exceptional for me - I prefer weed and I prefer intelligent company. I have a couple of things left to do with the code. I'd like to add an option to output the data to a file in CSV format for later import to a spreadsheet. I need to add an option to terminate the run on finding a perfect match. Finally, I need to format the output in such a way that the run time values are clearer. I'll probably finish these tasks in the next day or so, so I'm not going to bother with version numbers.
I hope this makes sense to you.