The Prisoner's Dilemma - What Is It?
The prisoners dilemma (PD) and iterated prisoners dilemma (IPD) have been a
rich source of research material since the 1950's. However, the publication
of Axelrod's book [AXE84] in the 1980's
was largely responsible for bringing this research to the attention to other
areas, outside of game theory, including evolutionary computing, evolutionary
biology, networked computer systems and promoting cooperation between opposing
countries [AXE95, FOG93,
GOL91]. Despite the large literature base that now exists (see, for example,
[BOY87, DAV92,
MAY82, POU92,
AXE97]), this is an on-going area of research,
with Darwen and Yao [DAR02, DAR01,
DAR95, YAO99]
carrying out some recent work. Their 2001 work [DAR01]
extends the prisoners dilemma by offering more choices, other than simply "cooperate"
or "defect," and by providing indirect interactions (reputation).
In the prisoners dilemma you have to decide whether to cooperate with an opponent,
or defect. Both you and your opponent make a choice and then your decisions
are revealed. You receive a payoff according to the following matrix (where
the top line is the payoff to the column).
|
|
Cooperate
|
Defect
|
||
| Cooperate |
R=3
|
T=5
|
||
|
R=3
|
S=0
|
|||
| Defect |
S=0
|
P=1
|
||
|
T=5
|
P=1
|
|||
The question arises: what should you do in such a game?
Suppose you think the other player will cooperate. If you cooperate then you
will receive a payoff of 3 for mutual cooperation. If you defect then you will
receive a payoff of 5 for the Temptation to Defect payoff. Therefore, if you
think the other player will cooperate then you should defect, to give you a
payoff of 5.
But what if you think the other player will defect? If you cooperate, then you
get the Sucker payoff of zero. If you defect then you would both receive the
Punishment for Mutual Defection of 1 point. Therefore, if you think the other
player will defect, you should defect as well.
So, you should defect, no matter what option your opponent chooses.
Of course, the same logic holds for your opponent. And, if you both defect you
receive a payoff of 1 each, whereas, the better outcome would have been mutual
cooperation with a payoff of 3. The choices of an individual is less than that
could have been achieved by two cooperating players, thus the dilemma and the
research challenge of finding strategies that promote mutual cooperation.
In defining a prisoners dilemma, certain conditions have to hold. The values
we used above, to demonstrate the game, are not the only values that could have
been used, but they do adhere to the conditions listed below.
Firstly, the order of the payoffs is important. The best a player can do is
T (temptation to defect). The worst a player can do is to get the sucker payoff,
S. If the two players cooperate then the reward for that mutual cooperation,
R, should be better than the punishment for mutual defection, P. Therefore,
the following must hold.
T > R > P > S
Secondly, players should not be allowed to get out of the dilemma by taking
it in turns to exploit each other. Or, to be a little more pedantic, the players
should not play the game so that they end up with half the time being exploited
and the other half of the time exploiting their opponent. In other words, an
even chance of being exploited or doing the exploiting is not as good an outcome
as both players mutually cooperating. Therefore, the reward for mutual cooperation
should be greater than the average of the payoff for the temptation and the
sucker. That is, the following must hold.
R > (S + T) / 2
Playing a "one-shot" prisoners dilemma, it is not difficult to decide
which strategy to adopt, but the question arises: can cooperation evolve from
playing the game over and over again, against the same opponent?
If you know how many times you are to play, then there is an argument that the
game is exactly the same as playing the "one-shot" prisoners dilemma.
This is based on the observation that you will defect on the last iteration
as that is the sensible thing to do as, you are in effect playing a single iteration.
Knowing this, it is sensible to defect on the second to last one as well; and
this logic can be applied all the way to the first iteration.
However, this reasoning cannot be used when the number of iterations is infinite
as you know there is always another iteration. In practise, you simply do not
know when the game will end.
Experiments, using human players [SCO63,
SCO62, SCO60a,
SCO60b, SCO59a,
SCO59b] showed that they, generally,
did not cooperate even when it should have been obvious that the other person
was going to cooperate, just as long as you do. It has been a long term aim
to find strategies which causes players to cooperate. If players would only
cooperate then their payoff, over an indefinite number of games could be maximised,
rather than tending towards defection and hoping the other player would cooperate.
In 1979 Axelrod organised a prisoners dilemma competition and invited game theorists
to submit their strategies [AXE80a].
The aim being to maximise the payoff of the strategy. Fourteen entries were
received with an extra one being added (defect or cooperate with equal probability).
The strategies were competed against each other (including itself) over games
which lasted 200 moves. The winner was Anatol Rapoport who submitted the simple
strategy (Tit-for-Tat) which cooperates on the first move, then does whatever
your opponent did on the previous move. In a second tournament [AXE80b],
62 entries were received but, again, the winner was Tit-for-Tat. These two competitions
formed the basis of his important book [AXE84].
The prisoners dilemma has a modern day version in the form of the TV show "Shafted"
- a game show recently screened on terrestrial TV in the UK (note that this
show is not a true prisoners dilemma as defined by Rapoport [RAP96],
but does demonstrate that the ideas have wider applicability). At the end of
the show two contestants have accumulated a sum of money and they have to decide
if to share the money or to try and get all the money for themselves. Their
decision is made without the knowledge of what the other person has decided
to do. If both contestants cooperate then they share the money. If they both
defect then they both receive nothing. If one cooperates and the other defects,
the one that defected gets all the money and the contestant that cooperated
gets nothing.
Although the prisoners dilemma, in the context of game theory, has been an active
research area for at least 50 years [SCO63,
SCO62, SCO60a,
SCO60b, SCO59a,
SCO59b] (it can be traced back to von
Neumann and Morgenstern [VON44] and, of
course, John Nash [NAS53, NAS50]),
it is still an active research area with, among other research aims, researchers
trying to evolve strategies [ORI00] that
promote cooperation.
Recent research has also considered the prisoners dilemma where there are more
than two choices and more than two players. Darwen and Yao have shown that offering
more choices leads to less cooperation [DAR01],
although reputation may help [DAR02, YAO99].
Birk [BIR99] used a multi-payer IPD. His
model had continuous degrees of cooperation (as opposed to the binary; cooperate
or defect). He used a robotic environment and showed that a justified-snobism
strategy, that tries to cooperate slightly more than the average, is a successful
strategy and is evolutionarily stable (that is, it cannot be invaded by another
strategy). O'Riordan and Bradish [ORI00]
also simulated a multi-player game where the players are involved in many types
of games. They show that cooperation can emerge in a high percentage of 2-player
games.
As well as the academic papers on the subject, there are many books devoted
to game theory and/or the prisoners dilemma. The 1997 book by Axelrod [AXE97]
re-produces a range of his papers (with commentary) ranging from 1986 through
to 1997. The papers consider areas such as promoting cooperation using a genetic
algorithm, coping with noise and promoting norms.