Some problems in software development are easy. Some problems are hard. Genetic algorithms can be easy to implement to solve hard problems for you!
A Gucci Algae what now?
A genetic algorithm is a type of heuristic algorithm. This is just a fancy way of saying it's a procedure used to obtain possible deterministic results. Or in simpler terms, sometimes you have a hard problem, and you may want some of the following:
- A solution that at some point is "good enough"
- You want to have many possible solutions / options
- You have limited computational power
- You have limited computational time
- You don’t even know if a “correct” solution exists or is possible given a set of parameters
Heuristic algorithms have been used traditionally to tackle so-called "NP-Complete" problems (aka, hard problems for computers to solve). The traveling salesman problem, nurse scheduling, and the N queens problem are few common ones in the field.
Genetic algorithms use a process of simulated evolution to achieve results. With things like survival of the fittest, breeding, and mutating.
But enough definition, let's dig in.
The Problem
We've decided to open up a tourism business. We are opening tours via Airship through mountainous terrain (forget all that stuff about the Hindenburg) and we want to offer the smoothest sailing possible to our customers.
We'll be taking off at a certain location, and landing at another. A point to point route. Now, the landscape we'll be flying over has varying degrees of turbulence thanks to those pesky mountains that just won't move.
Let's lay out a map:
-------------------------------------
| X | 1 | 4 | 2 | 2 | 1 | 3 | 4 | 1 |
| 1 | 1 | 4 | 2 | 3 | 3 | 2 | 4 | 2 |
| 2 | 2 | 3 | 2 | 1 | 1 | 1 | 1 | 3 |
| 1 | 1 | 4 | 2 | 2 | 1 | 3 | 4 | 1 |
| 3 | 2 | 2 | 4 | 3 | 2 | 3 | 5 | 4 |
| 1 | 1 | 4 | 2 | 2 | 5 | 3 | 4 | 1 |
| 4 | 1 | 1 | 3 | 2 | 1 | 3 | 2 | 2 |
| 4 | 1 | 1 | 3 | 2 | 4 | 2 | 4 | E |
-------------------------------------
Our Airship will be taking off at point 'E' -- for embark. We will be going to point 'X' -- for Xylophone.
The numbers represent the amount of turbulence the skies are experiencing right now. 5 being vomit comet level, and 1 being grandma's Cadillac with the soft suspension.
Our task as the pilots of the Airship is to get from 'E' to 'X' in *the most efficient way possible.
Couldn't I just brute force this?
Yes, you could. But now imagine the grid was 64x64. How about 500x500? What if there were more variables than just turbulence that affect the passengers experience?? Attractions, staff favorites, what if all the passengers voted on their favorite quadrant they wanted to see?
These are all variables that *could* go into our genetic algorithm to produce an optimized result. But for the sake of this post, we're going to keep it simple with just turbulence in our small grid.
Chromosomes
Genetic algorithms are called genetic algorithms for a reason. They are modeled after genetics and the process of evolution. That's right, before your eyes you can watch a "solution" "grow" to be more mature until it makes you proud, goes off to college, and eventually pays for your condo in that Florida retirement village while you flip through the latest edition of AARP magazine.
A chromosome is a representation of a solution.
What this chromosome actually 'looks' like depends on your problem and what you find best to represent your solution. Depending on what language or assets you have at your disposal this may be a simple array of 1's and 0's. You could model your solution as a list of objects. Or, as in the engineering industry, a fully fledged 3d model.
Let's represent our solutions as an array of cardinal directions.
One possible solution then to get us from E to X is:
[N, N, N, N, N, N, N, W, W, W, W, W, W, W, W]
If you follow our map, and read left to write, our Airship would depart from E, travel to the north east corner and then straight west to Xylophone!
You may stop and realize that this solution is silly and doesn't make a lot of sense. And for a chromosome *thats great!*.
Fun fact: An individual component of a chromosome is called (you may have guessed it) a gene! N, E, S, W are all genes in our scenario.
An important note here is that chromosomes when *first* created are *supposed* to be random* (within reason of your domain). So in implementation we would use a random library to select one of our 4 gene's for each of the positions in the chromosome.
Speaking of positions in the chromosome, you may notice that there are some paths that visit more coordinates than others, and maybe even our most optimal path visits more coordinates than our example one above. This is no problem for a genetic algorithm! To solve our problem we will just have to use variable length chromosomes. Some problem domains have the advantage of using fixed length chromosomes where certain parameters deem it possible.
Fitness
Now, I just labelled our solution "silly". But how does a computer know it's silly? This is the role of "fitness." Fitness is a measurement of *how good* your solution is. This is often what makes or breaks a genetic algorithms performance (both in terms of quality, and execution time). Given a chromosome, we apply a fitness function to determine its quality. This is almost always going to be an integer value. This allows us (and the algorithm) to know which solution is better than the other. So given a lineup of all possible solutions we can definitively say one solution is better than the other.
How would we calculate our fitness? Well, we want the path with the lowest amount of turbulence. So if we were to trace our path given our chromosome we would visit the following values:
2 1 4 1 3 2 1 4 3 1 2 2 4 1
Calculating our fitness is easy! (In what some experts call "the real world" this can often be very, very difficult, and in the hardest problems sometimes this value is even approximated). We add all the values of turbulence together to get a grand total of 31 "turbulence points". If, we stumble upon a chromosome with lower turbulence points, we know we've found a smoother ride for our guests. (Some genetic algorithms will normalize the fitness values, with 1 being the optimal, for ours, we can keep it simple and can say the least is the most optimal).
Population
This is where nature comes in. Our solution wasn't very good. But what if we had 20 solutions! We've increased our chances of having an optimal solution.
The population in the genetic algorithm is simply that. A number of solutions that we want to track at a given time. There are pluses and minuses to having large vs small population sizes, but again, as an introduction, let's not worry about it, and say we have 20 chromosomes.
So now we've got 20 immature chromosomes hanging out together making jokes about cutting cheese, how do we proceed? We, after all, want a gouda solution to take home the cheese. What we do now, is enter into the primary life cycle of the genetic algorithm, starting with, selection.
Selection
Selection is when we line up our immature solutions in a dodgeball line and pick a handful of them for more processing. There are many ways to select chromosomes from a population, and each has some merit depending on the domain.
We may select some randomly in proportion to how fit they are (roulette wheel). Or we may feed small batches of them into subroutines to see which ones are prime for picking through a no holds barred gladiator deathmatch with Darwin watching from the sidelines (tournament selection).
The amount to be selected, again, depends on problem domain, performance, randomness, and very high degrees of math (I don’t really *do* the math thing).
Now that we've gotten our chosen laureates we throw out the rest of the population. Bye Felicia!
Crossover
Let's keep it PG. The selected chromosomes have proven to be valuable. The cream of the crop surely. And just like evolution, we "breed" the surviving and more fit solutions to produce more solutions! This is done through a crossover function. We take two chromosomes, let them bird and bee, and get more chromosomes out! Just like selection, there are many methods of doing this. Some more cut and dry than others. The most basic of crossovers is the one point crossover.
We choose 2 parents. We choose 1 shared point in their chromosome to swap genes. And thus, dirty dancing occurs. Let's look at a minified example with two parents of equal length for simplicity:
Parent 1: [ N, N, N, N ]
Parent 2: [ S, S, S, S ]
Let's say our crossover point were at index 1. This would produce two children:
Child 1: [S, N, N, N ]
Child 2: [N, S, S, S ]
No storks needed.
Again, this is a very basic crossover method and many, many more complicated and involved ones exist.
Mutation
Our final step in the line is to mutate. We want some freaky chelonia's up in here. We again borrow from mother nature and introduce random mutation. Like a wave of radiation from the sun, we go through our population of chromosomes and *randomly change genes*. North becomes south, south may become west. The random library is to decide. The actual chances an individual gene may change are typically* very low (1-3% per gene).
This is an *extremely* important step in the genetic algorithm process. This allows the population to stay diverse and allow the population to reach more of the search space and prevent stagnation or local optima. (Again, short post, I would love to talk about search spaces and local optima, but I'll leave that to the reader).
Generation
We've done it! We now have a full population that is ready to go through the entire cycle again of fitness -> selection -> crossover -> mutation. One cycle of this process is called a generation! (Clever, right!).
When do I grab my overhead luggage and depart the algorithm?
Deciding *when* to stop a genetic algorithm is up to the implementer. Maybe you want to set a hard generation limit. ie. "Run 100 generations". Maybe it's time. ie. "Run for 30 minutes". Maybe it's based on other factors or variables. "Stop when you've either found a usable solution or stop if the best solution hasn't improved in 20 generations."
Flexibility
The entire process of genetic algorithms is flexible. Maybe your problem domain allows you to shortcut a process, maybe you can optimize solutions through custom selection, crossover, and mutation functions. It's up to you, (and a lot of people with PhD's in this kinda stuff) to experiment, move fast, and break things. "Genetic Algorithms'' as a whole are really more about the guidelines than the actual rules, so do what you find produces the best results!