Genetic Algorithms are one of the modern programming concepts which I have fallen in love with. It's simplicity paired with it's effectiveness, make it an optimal solution to many problems.
For some reason, the Geek in me loves the concept of 'reproducing' DNA like structured answers in an effort to get an optimal answer. This clash between basic biology and computer science results in facisination for me.
I first took a gander at devloping my own genetic alogirthm a few weeks ago. My goal was to build a system which was capable of formulating an expression which resulted in a target number.
Obviously, there were some rules:
- Only the operators +, -, *, / are allowed
- Only the numbers 0,1,2,3,4,5,6,7,8,9 are allowed
- The answer must fall within a pre set length of symbols; each operator and number counts as a symbol
- If numbers are back to back in the output, then only the first number will be used and the rest ignored
- If operators are back to back in the output, then only the first operator will be used and the rest ignored
I decided I wanted to utilize Python for building this algorithm as it's fast and succinct to write a program in. I began programming this by creating a lookup table function which would take hex values between 0x0 and 0xd and return the applicable number or operator. 0x0->0x9 is self explanatory, and 0xa -> +, 0xb -> -, 0xc -> *, and 0xd -> /
Then I created a simplistic fitness function which takes the target number, subtracts the calculated number, and takes the inverse.
1 / (target - calculated)
This means, the closer the calculated number is to the target number, the closer its fitness is to 1.
Next I wrote the calculation function which follows the rules I listed above, I won't bore you with these details as they are fairly simple. As well, the code can be found at the end of this blog post.
The next step was to create the initial popultation of organisms. To do this, I simple used Python's getrandbits functions and multiplied the symbol cap by 4 as our hex encoding utilizes only 4 bits.
random.getrandbits(length * 4)
At this point I had to figure out a way to mate organisms and produce their offspring. I decided to utilize the Roulette method as this gives a weighted choice to each organism based upon their fitness level. From there, the organisms chosen are split in half and joined with each other, and then each bit is given a 1 in 1000th chance to mutate. In this case, mutating refers to flipping the bit.
After testing this program for a few weeks and noticing that it always worked, I decided it was time to improve the fitness claculation. The reasoning can be shown as below
The genetic algorithm in its current state would give an equal fitness to both expressions if the target number was 5
5 = 5 2+3 = 5 2+7-4 = 5
So on and so forth. Clearly, one of these expressions is optimal and although the other two are right, the simplist answer is typically the best answer.
I decided to create a modifier as an intermediate step in the fitness function, it looked like this:
modifier = (length - sym_count) / length
If the modifier was <= 0 or > 1, it is set to 1. Then, I changed the output of the fitness function to look more like this:
ratio = 1 / (target - found) return ratio * modifier
What this does is lower the fitness of the answers which utilize more symbols and only slightly lower the fitness of the answers which utilize less symbols.
I spent a day doing tests to see if this revised fitness function showed any improvement. In short, it did. Here's a couple test data sets I created:
Test set 1:
Test set 2:
I hope you found this article interesting or informative. You can find the source code on my GitHub repo here:
Software Engineering Student - U of R
Current: Assistant to Manager of Instructional Tech - U of R
Email 2: email@example.com