Saturday 5 January 2013

GitHub And Other Tales...

I've set up a repository for my python stuff on GitHub. I've made a large number of changes to the GA code, including a fix for the embarrassing maths error pointed out by the  kind soul who inspired this journey into madness. I need to update the relevant post to reflect the fix and to address my other misdemeanours. However, I'm currently documenting the basics of how to run the code under Windows and UNIX, as this has proven to be an issue.

I've added options to record CSV run data for the sexual and asexual evolution modes, allowing one to import the data into Excel or LibreOffice Calc.

I've been slow to deal with this, in part because I needed a change, and partly because I've also been trying to:

  • Get  Plan 9 running under VirtualBox with a WXGA display. 
  • Play with the new Linux version of Steam. I chose to try Team Fortress 2. After initial problems caused by the ATI driver not supporting a requisite OpenGL feature, I found a workaround .
  • Fix the sound problems on the latest 227i patch for UnrealLinux - this turned out to be a configuration issue.
  • Fix the sound problems on my DarkPlaces install (I recompiled for SDL and it's fine now).
I hope to finish the documentation for the GA code some time today,

Wednesday 26 December 2012

New Server...

I moved everything across to the new server, which was a minor pain in the ass, but now sorted. I'll set up a mechanism to stuff any code I upload into a zip file and publish it. I'm also planning to perhaps install a source code control server. We'll see.

There is no spoon...


My virtual server materialised, after an initial frellup regarding the question of my authenticity. The sign-up process required that I enter a phone number. The phone number field on the form was validated against my postcode, which was a little tragic, given that I have a mobile number and a Skype number, neither of which tie me, geographically to my material locale. Nothing daunted, I entered the Swansea STD code and a random, visually telephonic number.

Hey presto, it worked!

Later that day, I received an email asking for proof of my place of residence, so I quickly assembled a package containing rocks from Mount Snowdon, the festering corpse of a local Christmas reveller, and a retinal scan of someone else's eye that I found behind a dumpster on the way home last night. I figure that the elemental makeup of the corpse should be enough to tie me to this local supernova remnant.

So, yes, I now have a server. I'm going to move my stuffs across from my Google Site site to the new server, and then I'm going to update the javascript and css on this account to point to it.

Meanwhile, I've done a lot of work on the genetic algorithm: it's now *much* faster courtesy of my now rather lower level of python cluelessness, and I added an option to switch between sexual and asexual reproduction, and an option to terminate on a perfect match. I fixed a minor bugette in the fitness function, and introduced a class to manifest the idea of an organism, which makes the code more coherent in terms of the system it's trying to describe. A nicety of this class is that, given a target string, it'll synthesise a genome which could generate it which means that you can contrast the 'natural' genome produced by the evolutionary process and the ideal, generated genome. One is clearly designed, the other is full of dead wombats and ear hair from passing perverts. Oh, and I tidied up the print code...

I've also added a new mutation type - jumping jeans! I mean, genes!

Which reminds me, actually; I need to visit the laundrette to avoid the dread spectre of the aforesaid jumping jeans.

Holy Jumping Squirrels, Batman!

Is there anything this girl canna do? Apart from laundry?

The only thing I really have left to do is to add CSV data logging. Oh, and I have a bug in the mutation rate code which is eluding a fix because with astonishing levels of autistic focus I've been fooling with this shenanigans for god knows how long, and at some point I'm going to have to fall over.

I'm not completely sanguine about the virtual server as it costs about £18 a month, and I can't really afford that. However, I'm viewing it as a marketing cost. Which makes everything fine.

Tuesday 25 December 2012

Is This Thing On? *tap* *tap*

This is a test post. I modified the template on this blog to add some pretty print javascript which autosenses the code snippets it brackets and then does keyword highlighting and other clever stuff. I didn't write it - my own skill at web wrangling is very minimal - I wrote some simple HTML in the distant past. Dinosaurs powered the web back then using giant treadmills and windows PCs carved from granite boulders. Huge, gaily coloured pterosaurs carried cumbersome data packets from hill-top to hill-top. Nature was red in tooth and claw.

While I wasn't looking, the net went and freaked out. These days, everything on the internets is powered by rocket science, night elves riding purple candy unicorns, magic iPads and the sacred sweat harvested from the brow of Saint Steve of Jobs, preserved in a reliquary in the Vatican. I am unworthy.

I'll talk about the pretty print code in a another post. What I'm doing in this one is attempting to employ said pretty print code to enhance the window opened by some html that I borrowed from some other blog. Thank you, other blog person!

However, I have to wait until I get a confirmation email from my new hosting provider - yes, I went nuts and bought web hosting. In fact, I went more than usually nuts and bought a linux virtual server, so now I can ssh into my own web site while dressed in black pvc, pretending I'm Trinity.

Go squirrels!

Monday 24 December 2012

Genetic Algorithms...

I've spent way too much time on Youtube, recently. Part of the reason for this is that the American religious right gives me the serious wiggins. For a long time I've taken a decidedly laissez-faire approach to other people's belief systems, but the religious right have made a serious attempt to politicise science and to force other aspects of their neolithic world view upon the American population. This makes me very unhappy - I like to take my science with an absolute minimum of priesthood and I dislike fundamentalism of any kind, particularly when it displays such an unhealthy fascination with genitalia.

So, genetic algorithms...

I've always found them interesting and I wanted to put together a basic simulation of evolution. A spherical cow of my very own, capturing the zeitgeist, bobbing on the breeze, buoyed by the hot air rising from the fires of the great Evolution Debate. So, I did what any wannabe internets celebrity would do - I drank a bottle of red wine and wrote a bunch of crap.

And here it is...

def Evolve():
 # Timestamp start
 Ts = dt.datetime.now()
 
 print("Randomly generating the first generation of organisms...")

 # S is the number of mating survivors per generation
 S = MaxPop * Survivors // 100
 
 # generation count
 Gen = 0

 # Our population starts with each organism having a genome full of random crap
 # 
 # Number of organims is 'MaxPop'; genome size per organism is 'GenomeSize'
 Pop = []
 for i in range(0, MaxPop):
  O = Organism(GenomeSize)
  Pop.append(O)

 # The generation loop
 while True:
  s = 0
  Gen += 1
  MatingPool = []
  
  print("Culling the worst performers...")
  while s < S:  
  # Pick any two individuals 
   Na = random.randint(0, len(Pop)-1)
   Nb = random.randint(0, len(Pop)-1)
   # Put the better of the two in the mating pool
   if Fitter(Pop[Na], Pop[Nb]):
    MatingPool.append(Pop[Na])
   else:
    MatingPool.append(Pop[Nb])
   s += 1
   
  # We now have the fittest surviving subset of the 
  # original population - our mating pool 
  i = 0
  Pop = []
  
  print("Birthing the next generation...")
  # Be fruitful and multiply
  while i < MaxPop:
   if SexualReproduction:
    # Pick any two individuals, mate them to produce a child
    # mutations are introduced during the mating.
    Na = random.randint(0, S-1)
    Nb = random.randint(0, S-1)
    Child = Organism(MatingPool[Na], MatingPool[Nb])
   else:
    # Pick a single individual, generate a single 
    # child with mutations
    Na = random.randint(0, S-1)
    Child = Organism(MatingPool[Na])
   Pop.append(Child)
   i += 1
  
  # Timestamp current
  Tc = dt.datetime.now()
  
  # Elapsed time
  Te = Tc - Ts
  PrintOrg(Gen, Te, Pop[0], "First") 
  if (Gen % 25) == 0:
   B = Best(Pop)
   PrintOrg(Gen, Te, B, "Best")
   if Terminate:
    if B.Synthesis == Target.Synthesis:
     print("Ideal match - terminating...")
     PrintOrg(Gen, Te, Target, "Ideal")
     return   

There's rather too much of it to inline it here, but the above snippet is the main engine. You can download all of it by right-clicking and selecting 'save' here, and it's written for Python 3.3.

I wrote it more as a simple simulation rather than a practical framework for more in-depth work. It deviates from the mechanisms of life in the following ways:
  • It lacks start-stop codons; any non-coding base acts as start or stop codon depending on the content. 
  • It doesn't simulate base pairs - bases are singletons.
  • It lacks any analogue for RNA.
  • Transcription errors manifest in the child, after mating - this is the sole mechanism by which mutations are introduced to the genome. This doesn't make any practical difference to the total mutation load experienced by the 'organisms'; the mutation rate is a program variable and may be set from the command line.
  • It does, nevertheless, encapsulate the idea of a genetic code. Codons are implemented as triplets of the G, A, T, C bases and can therefore describe 84 states per codon. This is currently used to synthesise the 'body' of the 'organism', although it could also be used to describe the bytecode for a virtual machine, or as draw instructions for a genetic art project.
For a test run, I chose the following target string: 'Ozymandias'. Each letter in the target can be chosen from 26 lowercase characters and 26 uppercase, making 52 options. As the name is ten letters in length, I'll make it slightly easier by restricting the match attempt to a random string of mixed case letters of length 10.

This provides me with a figure of 5.74077038e+16 possible permutations when I generate the match attempt; the search space is nearly 60 million billion permutations.

C:\Dev>ga.py -g 300 -p 800 -m 15 -s 90 -t "Ozymandias"
Randomly generating the first generation of organisms...
Culling the worst performers...
Birthing the next generation...
runtime: 0.5; generation 1; size 300:
Culling the worst performers...
Birthing the next generation...
runtime: 0.8; generation 2; size 300:
...
Culling the worst performers...
Birthing the next generation...
runtime: 1.19; generation 25; size 300:
Generation: 25;  Runtime: 1.19; Size: 299; Synth: OzymDPXiasUskaY
Genome: G$$A$A$T$$TGA$$C$$TG$CACCT$$TAC...
Culling the worst performers...

You can see that by generation 25, the code has already picked out some of the features of the target word. I chose to run the code with a population size of 800, this means that, so far, it has generated 20,000 attempts at a match.

Generation: 100;  Runtime: 6.32; Size: 299; Synth: OVymandia
Genome: AT$CG$TGATA$ATAG$TACA$TTTTCA$$C...

By generation 100, it's nearly there. However, by generation 250, it's still nearly there, because the watchmaker really is blind.

Generation: 250;  Runtime: 16.56; Size: 298; Synth: ONymandias
Genome: AT$G$TGATA$AGAG$$TACA$TTTTCA$$C...

By generation 1550... still nearly there, because the watchmaker isn't just blind, he's also rather annoying. The code has backtracked numerous times, trying the same match attempt after earlier discarding it. This is just the way Mother Nature would have done it, so I'm still a warm and fuzzy squirrel.

Generation: 1550;  Runtime: 98.13; Size: 293; Synth: OQymandias
Genome: AA$$C$TGAGA$$GAA$TACA$TTTTCAA$$$$$...

*facepalm*

I'm going to watch Farscape. I may be some time.

Generation: 4975;  Runtime: 320.42; Size: 280; Synth: Ozymandias
Genome: $$$$TGA$$$$$$$CACTAC$$TTTTCAG$G...

Finally...

The *last* time I ran this, I saw a convergence in roughly 400 generation. This time, it took 5k generations, which means 4 million tries at a match. The probability of getting a single match with 4 million random tries is approximately 7e-11, or 1 chance in 14 billion. This attempt took five hours and 20 minutes. If I tossed a coin to get the best match, at roughly 300 attempts a second (the rate at which alternatives for the code's 'evolve' mode are tried), it would take years.

My laptop struggles with this code. The code doesn't support parallel execution and even if it did, I have just two cores. The algorithm used is inefficient and involves a lot of data copying. Nature, on the other hand, is fundamentally parallel. Your body alone is host to something like 10e+14 micro-organisms and they're all feeding, dying, breeding, independently of your attention. While the bacteria are doing their thing, stars, billions of light years away, billions of years ago, are dying. Iron atoms generated in their cores are accelerated to enormous velocities. Some of them are impacting you while you read this. Nature doesn't care that you're not paying attention - she's happening all the time, all around you. Genetic copying goes on all the time, accumulating errors when stars flash in gamma light; when iron nuclei impact; when the radon in the mountain air decays into other elements; because its a warm day; because the soup in which they live contains chemical mutagens; because they've altered their chemical environment to provide just the right level of mutation to optimise the process of natural selection.

Oh, and if you think that my code constitutes evidence for a designer - here's a another thought: much of the code I wrote emulates characteristics of the natural world - you get all of the necessary algorithmic support for the process of evolution free of charge in nature. If I wrote some code to simulate orbital physics, a lot of it would, again, be written to emulate phenomena that take place in the natural world. You could perhaps argue for a deistic stance wherein God set all of this up and then subsequently let it run free to achieve some ultimate unknown conclusion like so much wonderful celestial clockwork. This, at least in my opinion, is the most intelligent of the alternative theistic stances, but it still sits in the non-science bin because there is:
  1. No evidence to support the intervention of your God.
  2. No way to differentiate between the claims you make for your God and the claims they make for their God.
  3. Plenty of prior evidence for the attribution of poorly understood physical phenomena to some theistic cause, only to discover the natural mechanisms underpinning the said phenomena.
Intelligent people tend to confess when they don't know something, because our world is large and still full of unknowns and wonders and it exists in a much vaster and much less well-understood universe. Invoking a God as an explanation for something you don't understand is another, usually very long-winded way of admitting your ignorance on the topic in question.

Incidentally, in these code runs, I've been looking for a perfect match. In nature, 'good enough' is usually sufficient unto the day. The codons can produce non-coding DNA and do so a reasonable percentage of the time. It takes three bases to make a codon. The mutation mechanism causes deletions, duplications and point mutations. There is selection pressure to conserve genome size; in nature, a bigger genome is more expensive at the cellular level. In a fit of entirely naturalistic caprice, the code selects its breeding pool at random from the departing generation, although it does try to pick the better of two candidates with each addition to the mating pool. Being better than the competition isn't enough - you also need luck.

I could and have, written simpler GAs which converge a lot faster but my aim in this case was to produce something that mimics natural selection.

If you happen to be a creationist, this is why educated people facepalm. It occurs to the more generous of them that perhaps you've been raised in an intellectual vacuum. It's your choice - you can learn. You can learn at this, fundamental, algorithmic, probabilistic level. At this level you're dealing with numbers and mathematics and mechanisms, which gives the process a kind of purity that you don't see elsewhere. People can't bullshit you because you can repeat the processes yourself; do your own investigations; draw your own conclusions without the need for external evidence.

You can write your own damned code.

My personal conclusion, after playing with the first genetic algorithm I ever wrote, was simple. Anything with a genome is evolving with each new generation. This is the function of a genome.

I'm on my second bottle of wine. This code and the words it inspired, took three days to write, so it doesn't count as an implicit admission of alcoholism (honest!). I'm not a huge drinker; this is exceptional for me - I prefer weed and I prefer intelligent company. I have a couple of things left to do with the code. I'd like to add an option to output the data to a file in CSV format for later import to a spreadsheet. I need to add an option to terminate the run on finding a perfect match. Finally, I need to format the output in such a way that the run time values are clearer. I'll probably finish these tasks in the next day or so, so I'm not going to bother with version numbers.

I hope this makes sense to you.

Introduction

This is a haven for the occasional hills of abused code I wish to inflict on the rest of the human race.