The Prisoner's Dilemma - What Is It?

 

The prisoners dilemma (PD) and iterated prisoners dilemma (IPD) have been a rich source of research material since the 1950's. However, the publication of Axelrod's book [AXE84] in the 1980's was largely responsible for bringing this research to the attention to other areas, outside of game theory, including evolutionary computing, evolutionary biology, networked computer systems and promoting cooperation between opposing countries [AXE95, FOG93, GOL91]. Despite the large literature base that now exists (see, for example, [BOY87, DAV92, MAY82, POU92, AXE97]), this is an on-going area of research, with Darwen and Yao [DAR02, DAR01, DAR95, YAO99] carrying out some recent work. Their 2001 work [DAR01] extends the prisoners dilemma by offering more choices, other than simply "cooperate" or "defect," and by providing indirect interactions (reputation).
In the prisoners dilemma you have to decide whether to cooperate with an opponent, or defect. Both you and your opponent make a choice and then your decisions are revealed. You receive a payoff according to the following matrix (where the top line is the payoff to the column).

 

Cooperate
Defect
Cooperate
R=3
T=5
R=3
S=0
Defect
S=0
P=1
T=5
P=1

 


The question arises: what should you do in such a game?


Suppose you think the other player will cooperate. If you cooperate then you will receive a payoff of 3 for mutual cooperation. If you defect then you will receive a payoff of 5 for the Temptation to Defect payoff. Therefore, if you think the other player will cooperate then you should defect, to give you a payoff of 5.


But what if you think the other player will defect? If you cooperate, then you get the Sucker payoff of zero. If you defect then you would both receive the Punishment for Mutual Defection of 1 point. Therefore, if you think the other player will defect, you should defect as well.


So, you should defect, no matter what option your opponent chooses.


Of course, the same logic holds for your opponent. And, if you both defect you receive a payoff of 1 each, whereas, the better outcome would have been mutual cooperation with a payoff of 3. The choices of an individual is less than that could have been achieved by two cooperating players, thus the dilemma and the research challenge of finding strategies that promote mutual cooperation.


In defining a prisoners dilemma, certain conditions have to hold. The values we used above, to demonstrate the game, are not the only values that could have been used, but they do adhere to the conditions listed below.


Firstly, the order of the payoffs is important. The best a player can do is T (temptation to defect). The worst a player can do is to get the sucker payoff, S. If the two players cooperate then the reward for that mutual cooperation, R, should be better than the punishment for mutual defection, P. Therefore, the following must hold.


T > R > P > S


Secondly, players should not be allowed to get out of the dilemma by taking it in turns to exploit each other. Or, to be a little more pedantic, the players should not play the game so that they end up with half the time being exploited and the other half of the time exploiting their opponent. In other words, an even chance of being exploited or doing the exploiting is not as good an outcome as both players mutually cooperating. Therefore, the reward for mutual cooperation should be greater than the average of the payoff for the temptation and the sucker. That is, the following must hold.


R > (S + T) / 2


Playing a "one-shot" prisoners dilemma, it is not difficult to decide which strategy to adopt, but the question arises: can cooperation evolve from playing the game over and over again, against the same opponent?


If you know how many times you are to play, then there is an argument that the game is exactly the same as playing the "one-shot" prisoners dilemma. This is based on the observation that you will defect on the last iteration as that is the sensible thing to do as, you are in effect playing a single iteration. Knowing this, it is sensible to defect on the second to last one as well; and this logic can be applied all the way to the first iteration.


However, this reasoning cannot be used when the number of iterations is infinite as you know there is always another iteration. In practise, you simply do not know when the game will end.


Experiments, using human players [SCO63, SCO62, SCO60a, SCO60b, SCO59a, SCO59b] showed that they, generally, did not cooperate even when it should have been obvious that the other person was going to cooperate, just as long as you do. It has been a long term aim to find strategies which causes players to cooperate. If players would only cooperate then their payoff, over an indefinite number of games could be maximised, rather than tending towards defection and hoping the other player would cooperate. In 1979 Axelrod organised a prisoners dilemma competition and invited game theorists to submit their strategies [AXE80a]. The aim being to maximise the payoff of the strategy. Fourteen entries were received with an extra one being added (defect or cooperate with equal probability). The strategies were competed against each other (including itself) over games which lasted 200 moves. The winner was Anatol Rapoport who submitted the simple strategy (Tit-for-Tat) which cooperates on the first move, then does whatever your opponent did on the previous move. In a second tournament [AXE80b], 62 entries were received but, again, the winner was Tit-for-Tat. These two competitions formed the basis of his important book [AXE84].


The prisoners dilemma has a modern day version in the form of the TV show "Shafted" - a game show recently screened on terrestrial TV in the UK (note that this show is not a true prisoners dilemma as defined by Rapoport [RAP96], but does demonstrate that the ideas have wider applicability). At the end of the show two contestants have accumulated a sum of money and they have to decide if to share the money or to try and get all the money for themselves. Their decision is made without the knowledge of what the other person has decided to do. If both contestants cooperate then they share the money. If they both defect then they both receive nothing. If one cooperates and the other defects, the one that defected gets all the money and the contestant that cooperated gets nothing.
Although the prisoners dilemma, in the context of game theory, has been an active research area for at least 50 years [SCO63, SCO62, SCO60a, SCO60b, SCO59a, SCO59b] (it can be traced back to von Neumann and Morgenstern [VON44] and, of course, John Nash [NAS53, NAS50]), it is still an active research area with, among other research aims, researchers trying to evolve strategies [ORI00] that promote cooperation.


Recent research has also considered the prisoners dilemma where there are more than two choices and more than two players. Darwen and Yao have shown that offering more choices leads to less cooperation [DAR01], although reputation may help [DAR02, YAO99]. Birk [BIR99] used a multi-payer IPD. His model had continuous degrees of cooperation (as opposed to the binary; cooperate or defect). He used a robotic environment and showed that a justified-snobism strategy, that tries to cooperate slightly more than the average, is a successful strategy and is evolutionarily stable (that is, it cannot be invaded by another strategy). O'Riordan and Bradish [ORI00] also simulated a multi-player game where the players are involved in many types of games. They show that cooperation can emerge in a high percentage of 2-player games.


As well as the academic papers on the subject, there are many books devoted to game theory and/or the prisoners dilemma. The 1997 book by Axelrod [AXE97] re-produces a range of his papers (with commentary) ranging from 1986 through to 1997. The papers consider areas such as promoting cooperation using a genetic algorithm, coping with noise and promoting norms.