Risk analysis with a genetic algorithm and TrueSkill

While (data) science is in principle about finding a solution to a problem, sometimes I find myself looking for a problem that suits a certain solution. This post results from such a situation: I was looking for an interesting problem to apply a genetic algorithm to. I had never used any evolutionary algorithm and wanted to play around with one. When Vincent decided to scrutinize Monopoly I realized that (board) games make excellent subjects to try out some algorithms. So here we are... genetic algorithms for Risk!

The game of Risk

As you may well know, Risk is a strategic board game for 2 to 6 players that revolves around conquering (a tabletop version of) the world. It was invented in 1957, and has been quite popular ever since. Detailed rules of the game can be found here, but I'll give a very short introduction to the game to get you started.

The game revolves around the game board, which is shown below. The board resembles a map of the world, divided into 42 territories spread over six continents, each with a different color. Territories are considered adjacent if their borders touch or if they are connected by a dotted line.

Risk game board

At the start of the game the territories are randomly distributed amongst the players. Each player gets a number of armies to place on his territories. Also, each player is given a mission which he needs to complete to win the game. Missions range from conquering specific continents or a number of territories to eliminating another player.

Then players get to play their turns one by one until a player achieves his mission. A turn consists of three stages:

  • Reinforcement, during which the player places additional armies. The number of armies depends on the territories in his control.
  • Combat, during which the player may attempt to conquer other territories.
  • Fortification, during which the player may move armies.

The odds in combat are decided using dice, where the defending party has better odds unless the attacker has many more armies.
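To make the dice odds a bit more concrete, here is a minimal sketch of a single combat round. It assumes the standard Risk dice rules, which are not spelled out above: the attacker rolls up to three dice, the defender up to two, the highest dice are compared pairwise and ties go to the defender. The function names are made up for this example.

```python
import random

def combat_round(attack_dice, defend_dice, rng=random):
    """Simulate one round of combat; return (attacker_losses, defender_losses).

    Assumes standard Risk dice rules: the highest dice are compared
    pairwise and the defender wins ties.
    """
    attack = sorted((rng.randint(1, 6) for _ in range(attack_dice)), reverse=True)
    defend = sorted((rng.randint(1, 6) for _ in range(defend_dice)), reverse=True)
    attacker_losses = defender_losses = 0
    for a, d in zip(attack, defend):
        if a > d:
            defender_losses += 1
        else:  # ties go to the defender
            attacker_losses += 1
    return attacker_losses, defender_losses

def attacker_loss_share(attack_dice, defend_dice, rounds=20000, seed=0):
    """Estimate which share of all lost armies belongs to the attacker."""
    rng = random.Random(seed)
    a_total = d_total = 0
    for _ in range(rounds):
        a, d = combat_round(attack_dice, defend_dice, rng)
        a_total += a
        d_total += d
    return a_total / (a_total + d_total)
```

With one die each, `attacker_loss_share(1, 1)` comes out above one half, so the attacker takes most of the losses. With three dice against two, `attacker_loss_share(3, 2)` drops below one half, so the balance tips towards the attacker.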

In general, a good strategy is to conquer full continents, as a player that controls one or more full continents at the start of his turn receives extra reinforcements. But of course the game is not so simple and one can have many strategies.

A genetic Risk player

The principle of a genetic algorithm (GA) is based on natural selection and evolution: given a number of solutions, one selects a number of the best solutions and then modifies (mutates) and combines these to form a new generation of solutions.

An implementation of a GA requires a genetic representation of a solution as well as a fitness function to evaluate the quality of the solutions. Neither of these requirements is trivial in the case of Risk.
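As a rough sketch of that loop, independent of Risk, the toy example below evolves vectors of numbers towards a target. The fitness function, population size and mutation scale are all made up for illustration; for Risk the fitness has to come from playing games.

```python
import random

def evolve(population, fitness, n_parents, rng, sigma=0.1):
    """Produce the next generation from the fittest `n_parents` solutions."""
    parents = sorted(population, key=fitness, reverse=True)[:n_parents]
    children = []
    while len(children) < len(population):
        mother, father = rng.sample(parents, 2)
        # Combine: take each gene from one of the two parents at random.
        child = [rng.choice(pair) for pair in zip(mother, father)]
        # Mutate: add a little Gaussian noise to every gene.
        children.append([gene + rng.gauss(0, sigma) for gene in child])
    return children

# Toy problem (an assumption for this sketch): get every gene close to 1.
def fitness(genes):
    return -sum((gene - 1.0) ** 2 for gene in genes)

rng = random.Random(42)
population = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(50)]
initial_best = max(map(fitness, population))
for _ in range(30):
    population = evolve(population, fitness, n_parents=10, rng=rng)
final_best = max(map(fitness, population))
```

After a few dozen generations the best solution should score considerably better than the best of the random initial population.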

Genetic representation

Our solutions, the Risk players, need to be designed such that we can perform the basic genetic operations on them: we need to be able to mutate and combine them. The easiest way to allow this is to find a vector representation of the player. A mutation is then simply a change in one or more of the elements (genes) of the vector (genome), and combining can be done by taking one genome and replacing some of the genes by those of another genome.

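A simple Python implementation of such a genome could look like this:

import random


class Genome(object):

    def __init__(self, genes):
        self.genes = genes

    def combine(self, other):
        # Take each gene from either parent with equal probability.
        # type(self) makes sure that subclasses keep their own type.
        return type(self)([
            mine if random.choice([True, False]) else theirs
            for mine, theirs in zip(self.genes, other.genes)
        ])

    def mutate(self):
        # Add standard normal noise to every gene.
        return type(self)([random.gauss(0, 1) + val for val in self.genes])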

The player needs to make decisions based on the genome, of course. For example, it needs to decide where to place armies in the reinforcement phase. We can achieve this by computing some score ('rank') for every territory it can place an army on, and then placing the army on the territory with the highest rank. This rank can depend on many features of the territory, such as the number of armies on it, the number of hostile armies around it and the value of the territory to the player's mission. We can use the genes, the elements in the genome, as weights to the various features that make up the rank.

So, for each army a player may place in the reinforcement phase, we calculate a reinforcement rank \(R_r\) for each territory it can place the army on. This rank is based on various features of the territories, weighted using the genes of the player:

$$R_r = \sum_i f_i \cdot w_i,$$

where \(f_i\) is a feature of the territory, and \(w_i\) is the weight the player assigns to that feature. Then the army is placed on the territory with the highest rank.

Features of a territory can be, for example:

  • army vantage: the ratio of hostile armies on neighboring territories to the number of armies on the territory in question,
  • territory vantage: the ratio of neighboring territories that are hostile,
  • mission value: whether or not the territory is valuable to the mission of the player. This depends on the mission, of course.
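To make these features concrete, here is one possible way to compute the first two on a toy board. The `Territory` class and its fields are hypothetical and only serve this sketch; the actual implementation may well define the features differently.

```python
from dataclasses import dataclass, field

@dataclass
class Territory:
    """Hypothetical minimal territory: an owner, an army count and neighbors."""
    name: str
    owner: str
    armies: int
    neighbors: list = field(default_factory=list)

def army_vantage(territory):
    """Hostile armies on neighboring territories per own army on this one."""
    hostile = sum(n.armies for n in territory.neighbors if n.owner != territory.owner)
    # A territory in Risk always holds at least one army, so this is safe.
    return hostile / territory.armies

def territory_vantage(territory):
    """Fraction of neighboring territories held by another player."""
    hostile = sum(1 for n in territory.neighbors if n.owner != territory.owner)
    return hostile / len(territory.neighbors)

# A tiny example: Alaska (ours) borders two hostile and one friendly territory.
alaska = Territory("Alaska", "red", armies=2)
alaska.neighbors = [
    Territory("Kamchatka", "blue", armies=3),
    Territory("Alberta", "blue", armies=1),
    Territory("Northwest Territory", "red", armies=4),
]
print(army_vantage(alaska))       # 2.0
print(territory_vantage(alaska))  # 0.666...
```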

We can implement a player object based on the above implementation of the Genome class:

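class Player(Genome):

    def place_reinforcement(self, board):
        # Place the army on the territory with the highest reinforcement rank.
        return max(board.my_territories(), key=self.reinforcement_rank)

    def reinforcement_rank(self, territory):
        # army_vantage, territory_vantage and mission_value are feature
        # functions that are assumed to be defined elsewhere.
        return (
            army_vantage(territory) * self.genes[0]
            + territory_vantage(territory) * self.genes[1]
            + mission_value(territory) * self.genes[2]
        )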

Similarly, we can construct an attack rank and a fortification rank. Many of the features are shared between the ranks: if, for example, the mission is important for determining where to place armies, it is likely also important for deciding which territory to attack. The weights, however, should not be shared, as the motivations for the actions in each phase are different: when attacking one tries to obtain new territories or eliminate a player, while during fortification one may want to fortify one's most valuable territory.

In the attack phase we need not only decide which territory to attack, but also whether to attack at all. This can be done by calculating the attack ranks for all attack options, and only attacking if the highest attack rank is above a certain threshold. The ideal level of this 'attack threshold' can also be determined by the GA by making it one of the genes.
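A minimal sketch of that decision, assuming the attack options and their ranks have already been computed. The function name `choose_attack` and the `(source, target)` tuples are invented for this example.

```python
def choose_attack(ranked_options, attack_threshold):
    """Return the best attack option, or None if nothing is worth attacking.

    `ranked_options` maps an attack option, here a (source, target) pair,
    to its attack rank. `attack_threshold` would be one of the player's genes.
    """
    if not ranked_options:
        return None
    best = max(ranked_options, key=ranked_options.get)
    return best if ranked_options[best] > attack_threshold else None

options = {("Alaska", "Kamchatka"): 1.5, ("Alaska", "Alberta"): 4.0}
print(choose_attack(options, attack_threshold=2.0))   # ('Alaska', 'Alberta')
print(choose_attack(options, attack_threshold=10.0))  # None
```

Calling this repeatedly, with the ranks recomputed after every battle, ends the attack phase as soon as no option clears the threshold.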

Finally, we need to think about the domain of the weights: if all of the weights \(w_i\) are left unrestricted, we run the risk that they drift off to infinity. After all, it is not the absolute value of each weight but the ratio between them that determines the final decision. Hence we need to set an absolute scale. We can do that by restricting one of the weights to the values \(-1\), \(0\) and \(1\). As long as this weight does not converge to \(0\), it sets the scale for the other weights.
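The scale argument can be checked with a small example: multiplying all weights by the same positive factor does not change which territory ranks highest. The territories, feature values and weights below are made up.

```python
def rank(features, weights):
    """Weighted sum of the features, as in the rank formula."""
    return sum(f * w for f, w in zip(features, weights))

def best_territory(candidates, weights):
    """Name of the candidate territory with the highest rank."""
    return max(candidates, key=lambda name: rank(candidates[name], weights))

# Made-up feature vectors: (army vantage, territory vantage, mission value).
candidates = {
    "Alaska": [2.0, 0.67, 0.0],
    "Brazil": [0.5, 0.50, 1.0],
    "Egypt": [1.0, 1.00, 0.0],
}
weights = [1.0, 3.0, 2.5]
scaled = [1000 * w for w in weights]

print(best_territory(candidates, weights))  # Brazil
print(best_territory(candidates, scaled))   # Brazil
```

Both genomes play identically, so without a fixed scale the GA has no reason to prefer one over the other and the weights are free to grow.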

Fitness function

Now that we have a genetic representation of a player, we need to be able to evaluate it. Of course the 'quality' of a single player cannot be evaluated in isolation, and is probably even impossible to define. On the other hand, it is not so difficult to find out whether one player is better than another: we could simply let them play a game and see who comes out as the winner. As Risk is not really suited for two players, we can have multiple (for example four) players compete in one game, which has the advantage that we can evaluate multiple players in a single game. Of course there is a factor of chance, but we can have the players play several games to properly evaluate which is (are) the better player(s).

Now this works well if we have only a few (up to six) players. When we are going to evaluate more players, letting every possible group of players compete quickly gets out of hand. For a hundred players that would require many hundreds of millions of games.
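The size of the problem is easy to check: the number of distinct groups that can be formed is a binomial coefficient, and every group would need to play several games to average out the luck of the dice. The snippet uses `math.comb`, which is available from Python 3.8.

```python
from math import comb

# Number of distinct groups that can be formed from 100 players.
print(comb(100, 4))  # 3921225 groups of four
print(comb(100, 6))  # 1192052400 groups of six
```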

Luckily, the problem of ranking players of a game has been solved before. Microsoft developed a ranking system called TrueSkill to rank players on Xbox Live. While the exact implementation used by Microsoft has not been made public, a Python implementation that mimics its behaviour is publicly available. An interactive explanation of the workings of TrueSkill can be found here.

So using TrueSkill, we can have the players play multiple games in randomly selected groups until we are satisfied with the \(\sigma\)s of the TrueSkill beliefs and then select players based on their \(\mu\)s.
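The evaluation loop could be sketched as follows. To keep the example free of dependencies, the rating update is a crude stand-in and explicitly not TrueSkill; with the `trueskill` package, a call like `trueskill.rate([(r,) for r in group_ratings], ranks=ranks)` would take its place. The toy game, the `max_sigma` threshold and all function names are assumptions for this sketch.

```python
import random

def crude_update(beliefs, ranks):
    """Crude stand-in for TrueSkill, NOT the real algorithm: the winner's mu
    goes up, the others' go down, and every sigma shrinks a little."""
    return [
        (mu + (sigma if rank == 0 else -sigma / 3), sigma * 0.9)
        for (mu, sigma), rank in zip(beliefs, ranks)
    ]

def evaluate(ratings, play_game, update, rng, group_size=4, max_sigma=2.0):
    """Play games in random groups until every rating is certain enough.

    `ratings` maps a player to a (mu, sigma) belief, `play_game` returns the
    ranks of a group (0 is the winner) and `update` returns the new beliefs.
    """
    games = 0
    while max(sigma for _, sigma in ratings.values()) > max_sigma:
        group = rng.sample(list(ratings), group_size)
        ranks = play_game(group)
        new_beliefs = update([ratings[p] for p in group], ranks)
        ratings.update(zip(group, new_beliefs))
        games += 1
    return games

# Toy setup: players are numbers and a higher number is a stronger player.
rng = random.Random(1)

def play_game(group):
    winner = rng.choices(group, weights=[p + 1 for p in group])[0]
    return [0 if p == winner else 1 for p in group]

# Every player starts with the TrueSkill default prior (mu=25, sigma=25/3).
ratings = {player: (25.0, 25.0 / 3) for player in range(20)}
games = evaluate(ratings, play_game, crude_update, rng)
# Select the players with the highest mu as parents for the next generation.
best = sorted(ratings, key=lambda p: ratings[p][0], reverse=True)[:5]
```

The number of games now grows roughly linearly with the number of players, instead of with the number of possible groups.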

Results

So, does this approach actually work? Yes, it does! See below the distributions of weights (\(w\)) for the territory vantage feature described above. Initially it was evenly distributed over the range \([-25, 25]\), but after a single iteration of the GA we already see it favors the positive side, meaning that it is better to place armies on territories that have a lot of hostile neighbors. After ten iterations there are hardly any players left that have a negative weight for the territory vantage feature, and after forty iterations we see it peaking around a weight just below twenty.

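Figure: distributions of the territory vantage weights in the initial population and after 1, 10 and 40 iterations of the GA.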

Now I could take you through all other features, but I think that would get boring pretty soon. If you want to play around with the genetic player or Risk in general, you can find the implementation I used here. If you do want some more information before having a look at the code, see the talk I gave at EuroPython.

Mind you, it doesn't always go as smoothly as the above figure would let you believe. For example, in the initial (random) population there are some players which have the above-mentioned 'attack threshold' at such a high level that they would never attack. If a game happened to consist only of such players, no attack would ever happen and the game would go on indefinitely.

Of course the 'best' player that came out of the algorithm is by no means the best player that could exist. First of all, the algorithm hadn't really converged after 40 iterations (about 12 hours in), so running a little longer could yield (much) better results. Also, most likely some features exist that are much better than those I constructed. And in the end this algorithm is limited in any case since it only makes linear combinations of some constructed properties and it does not have an internal state to keep track of other players' behavior.

Nonetheless it is a fun way to experiment with genetic algorithms!
