Prisoner’s Dilemmas, Infinite Games, Zero-Sum Thinking
– Or: a manifesto of rational & selfish motivations for cooperation, drawing lessons from nature, game theory, and business
One of the best-selling car salesmen in the US was asked
about his secret to success. His response:
“I am not asking myself how I can sell them a car this time;
I ask instead: how can I be sure they will come to me for their next car?"
And so, he’d even actively down-sell some features to his customers:
- “You don’t need the ceramic brake discs on that McLaren if you’re going to be riding in the city”
- “The leather seats wouldn’t be ideal for the temperature of this city”
and so on. He’d actively gain the customer’s trust and would be the default person to go to for subsequent transactions. He’d not be the highest-selling guy for the first 3 years, but in the long run, he was the top salesman. (Rory Sutherland narrating it here)
The One-Shot Mindset
Contrast that with the scams run at touristy places – selling a low-quality good at a price well above its worth. Or consider the office colleague who, when you ask for help, either refuses or actively sabotages you because “by helping you, you’ll get ahead of me in the consideration set for the limited pool of appreciation or higher appraisals that my company offers”.
Or an early start-up that zealously guards its ideas and
might even be disheartened by a competitor's viral marketing campaign. Think of
the situation as Zepto/Instamart/Blinkit/your favourite q-commerce app around
2022/2023 or as OpenAI/Gemini/Deepseek/Grok/your favourite LLM AI in 2024/2025.
To be clear, I am using these just illustratively – not saying they actually
considered or acted this way.
What the car salesman doesn't do, but the latter set of examples is guilty of, are two critical errors of perspective:
- Zero-sum thinking
- Maximising the value from a single transaction or context instead of viewing it as a longer game (a prisoner’s dilemma over a single iteration versus a prisoner’s dilemma over infinite games)
Zero-Sum Thinking
- “I can only win if the other loses.”
- “There’s only one chocolate and both of us are competing for it – every piece that you eat is one less piece for me.”
- “The total payoff/outcome available for the both of us together is zero – if you gain something, I will lose.”
While this might be true in a few contexts – think a single
game of tennis or billiards, or a polar bear chasing you – in most other
contexts, it’s incorrect.
Take someone paying INR 300 for a cup of coffee from Blue Tokai or Starbucks – both parties are better off (one values the coffee more than the money; the other values the money more than the coffee).
Or the ‘overcharging’ auto in Delhi or Chennai, refusing to go by the meter, which you reluctantly get on – again, you are both better off, having agreed on that higher amount to escape June’s sweltering heat.
The same goes for all consensual economic transactions – ‘double thank-you moments’, if you will (hat tip to Amit Varma – co-host of the excellent 'Everything is Everything', among other things).
That Oscar that Martin Scorsese lost out on for Goodfellas/Raging Bull/Taxi Driver/pick-your-favourite-Scorsese movie before The Departed – it either spurred him to try differently to get that coveted Oscar or, more probably, drove him to realise the many machinations around it and the pointlessness of the chase.
That appreciation your colleague/boss got when he ripped off your idea – because there was only one appreciation available? Guess what – there probably was no rule that only one appreciation was up for grabs anyway! That colleague did play office politics, and we’ll (probably!) come to that part later.
Game Theory & Prisoner’s Dilemma – Single Iteration
Before delving deeper into this – a quick aside. Common words often carry a specific technical meaning in economics, and so it is with the concept of a 'game' in game theory. A 'game' is modelled to cover anything in which two or more players (individuals, companies, or governments) can take independent actions dictated by certain pre-defined rules, where each player's actions lead to defined outcomes for the others. Player B's choices will depend on Player A's choices, B's preferences, and the rules of the game – and the same goes for A's choices. (I am compromising accuracy for simplicity.)
Prisoner’s Dilemma is a classic game theory problem. Put simply, two prisoners are being interrogated by the police. They are both accused of having stolen something, but the police have no proof. They are placed in two different cells with no means of communicating with each other. The police offer each of them a choice individually:
- ‘Cheat’ on your partner while he stays silent – and you get away free
- Stay silent while the other guy cheats on you (conveys that you’ve committed the crime) – you’re in jail for 3 years
- Both you and your partner stay silent – both are in jail for 1 year each
- Both of you cheat on each other – each is in jail for 2 years.
Visually, the payoff matrix (years in jail, shown as A's years, B's years) becomes:

|  | B stays silent | B cheats |
| --- | --- | --- |
| **A stays silent** | 1, 1 | 3, 0 |
| **A cheats** | 0, 3 | 2, 2 |

Note: read this matrix in columns first – i.e., from the point of view of A: “If B stays silent, what should A do: stay silent = 1 year; cheat = 0 years” – and vice versa (along the rows) from B’s perspective.
Viewed as a single game, the rational choice for each prisoner is to ‘cheat’ (by cheating you get either 0 or 2 years; by staying silent you get either 1 or 3 years, depending on your partner's action). Yet if both cheat, it leads to an outcome that’s worse for each prisoner (both in jail for 2 years) than if both had stayed silent (both in jail for 1 year).
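That “whatever your partner does, cheating leaves you better off” logic can be checked mechanically. Here is a minimal Python sketch (my own illustration, not part of the original story) using the jail terms above, where fewer years is better:

```python
# Jail terms (years for A, years for B); lower is better.
SILENT, CHEAT = "silent", "cheat"

YEARS = {
    (SILENT, SILENT): (1, 1),  # both stay silent
    (SILENT, CHEAT):  (3, 0),  # A silent, B cheats
    (CHEAT,  SILENT): (0, 3),  # A cheats, B silent
    (CHEAT,  CHEAT):  (2, 2),  # both cheat
}

def best_reply_for_A(b_move):
    """A's move that minimises A's own jail time, holding B's move fixed."""
    return min((SILENT, CHEAT), key=lambda a_move: YEARS[(a_move, b_move)][0])

# Whatever B does, cheating is A's best reply - i.e., cheating 'dominates'.
print(best_reply_for_A(SILENT))  # cheat (0 years beats 1 year)
print(best_reply_for_A(CHEAT))   # cheat (2 years beats 3 years)
```

By symmetry the same holds for B, so both cheat and land in the collectively worse 2 + 2 years.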
Prisoner’s Dilemma – Infinite/Repeated Games
However, if you view it as a repeated game, the calculus flips: cheating is no longer the obvious choice, and trust, retaliation, and cooperation can emerge. Both prisoners can, based on their partner’s behaviour, arrive at the ‘more optimal’ strategy of staying silent.
Each prisoner now has ‘memory’ of the other prisoner’s
actions. There’s greater motive for each to cooperate now because both know
that if, e.g., A cooperates this time, but B doesn’t, A has the power to
‘retaliate’ the next time. A can take the risk on ‘overall better outcome’ of
staying silent knowing that there’s the ‘next time’.
As a pretty interesting aside: in the late 1970s and early 1980s, Prof. Robert Axelrod invited different strategies to play in this repeated prisoner’s dilemma simulation. Over 60 different strategies were received across two tournaments. Submissions included everything from ‘always cooperate’ to ‘always cheat’ to ‘mostly cooperate but sometimes randomly cheat’ to ‘always cooperate, but once the other person cheats, hold that grudge and continue cheating forever’, and many, many other variations. The winner – a simple ‘tit-for-tat’ with a starting position of cooperation. (Link to the original paper from 1980 here.)
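To make the flavour of that tournament concrete, here is a toy Python re-creation. This is my own sketch, not Axelrod’s code: the five strategies, the 200-round match length, and the standard 3/5/1/0 point payoffs are choices made for illustration (his tournaments were far larger and richer).

```python
# Each strategy sees its own history and the opponent's, and returns
# "C" (cooperate / stay silent) or "D" (defect / cheat).
def always_cooperate(me, them): return "C"
def always_defect(me, them):    return "D"
def tit_for_tat(me, them):      return them[-1] if them else "C"
def grudger(me, them):          return "D" if "D" in them else "C"
def suspicious_tit_for_tat(me, them): return them[-1] if them else "D"

# Standard iterated-dilemma points (mine, theirs); here HIGHER is better.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play(s1, s2, rounds=200):
    """One match: returns s1's total points against s2."""
    h1, h2, score = [], [], 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        score += PAYOFF[(m1, m2)][0]
        h1.append(m1)
        h2.append(m2)
    return score

def tournament(strategies, rounds=200):
    """Round-robin totals (each strategy also plays its own 'twin')."""
    return {s1.__name__: sum(play(s1, s2, rounds) for s2 in strategies)
            for s1 in strategies}

scores = tournament([always_cooperate, always_defect, tit_for_tat,
                     grudger, suspicious_tit_for_tat])
print(sorted(scores.items(), key=lambda kv: -kv[1]))
# In this line-up, tit_for_tat ends up with the highest total even though
# it never beats anyone head-to-head; always_defect lands near the bottom.
```

The exact numbers depend on the line-up, but the qualitative result mirrors the tournaments: nice-but-retaliatory strategies rack up points by generating long runs of mutual cooperation.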
The Manifesto for Rational Cooperation:
An interesting observation: in any given round, the ‘always cheat’ (aka pure bully) strategy either draws level with its opponent (when the other cheats) or beats it (when the other cooperates). Tit-for-tat, in any given round, either draws or loses. And yet, overall, ‘always cheat’ not only loses to ‘tit-for-tat’ but loses by a wide margin – many other strategies also outperform the ‘always cheat’ bully.
Why? Because you can lose the battle and yet win the war: ‘repeated games’ turn the otherwise zero-sum single iteration into a positive-sum game.
‘Always cheat’ wins or draws against each opponent but shrinks the overall pie a lot while ‘winning’! ‘Tit-for-tat’ loses or draws against each opponent but grows the pie significantly. The size of the pie each strategy generates is what leads to the respective outcomes.
In fact, most of the winning strategies from the simulations shared these traits; they were:
- nice (as against nasty) – i.e., strategies that didn’t cheat first (nice) versus those that did (nasty)
- retaliatory – i.e., not pushovers (‘always cheat’ will beat ‘always cooperate’ in both the short and the long run); they had a mechanism to punish the opponent for cheating
- forgiving – they would not hold a grudge in perpetuity. Tit-for-tat holds a grudge for just one turn; when the other side switches back to cooperation, it notices and switches back on the next turn
- clear – easy for the other party to understand/decode. An ‘unstable genius’ is hard to work with.
And that's how we arrive at the actual motivation for this piece – a manifesto for cooperation driven by rationality and selfishness. A few points:
- We should view our different interactions as repeated interactions instead of one-off interactions.
- We can all be better off together; one can be individually better off without pulling down the other.
- We're each better off collaborating – both at the individual and the organisational level.
- These assertions are mathematically demonstrable, and are also borne out by millions of years of evolution.
So, we should default to cooperation. On the first instance of ‘cheating’ by the other side (in, e.g., office politics, market competition, good-faith negotiations, or international relations), respond in kind. But be forgiving – when the other party gets back to cooperation mode, get back to cooperation mode too.
For Further Interest:
For the mathematically inclined – I'd strongly recommend checking out this video by Veritasium. It delves into far more detail, covers Prof. Axelrod's work (with a surprise visit by the prof himself!), and builds it all up step by step in a far more interesting and visual manner.
For the philosophically inclined – Hinduism, Jainism, and Buddhism have a concept of rebirth across many, many lifetimes, instead of a single lifetime ending at death or ending in judgement and an eternal transfer to heaven/hell. What kinds of behaviours or motivations would that lead to or explain?
And, for the practically inclined – the next time you're in a meeting or a negotiation, facing a colleague's request for help, or in a competitive situation, it'd be good to frame the situation by asking:
- Am I viewing this as a one-shot game or as a repeated game?
- Am I viewing it as a fixed pie, or am I thinking in terms of positive-sum games where a 'win-win' is possible?
Zero-sum games in real life are far rarer than we think. And the age-old wisdom of "Be nice, be forgiving, but don't be a pushover" is battle-tested across evolution, nature, businesses, and life.