Prisoner’s Dilemmas, Infinite Games, Zero-Sum Thinking

– Or: a manifesto for cooperation built on rational & selfish motivations, drawing on lessons from nature, game theory, and business

One of the top-selling car salesmen in the US was asked about the secret to his success. His response:

“I am not asking myself how I can sell them a car this time; I ask instead: how can I be sure they will come to me for their next car?"

And so, he’d even actively down-sell some features to his customers:

  • “you don’t need the ceramic brake discs on that McLaren if you’re going to be driving in the city”,
  • “the leather seats wouldn’t be ideal for the temperature of this city”,

and so on. He’d actively gain the customer’s trust and become the default person to go to for subsequent transactions. He wasn’t the highest-selling guy for the first 3 years, but in the long run, he was the top salesman. (Rory Sutherland narrates it here)

The One-Shot Mindset

Contrast that with the scams run at touristy places – selling a low-quality good at a price well above its worth. Or consider the office colleague who, when you ask for help, either refuses or actively sabotages you because “if I help you, you’ll get ahead of me in the consideration set for the limited pool of appreciation or higher appraisals that my company offers”.

Or an early start-up that zealously guards its ideas and might even be disheartened by a competitor's viral marketing campaign. Think of the situation as Zepto/Instamart/Blinkit/your favourite q-commerce app around 2022/2023 or as OpenAI/Gemini/Deepseek/Grok/your favourite LLM AI in 2024/2025. To be clear, I am using these just illustratively – not saying they actually considered or acted this way.

What the car salesman avoids, but the latter set of examples are guilty of, are two critical errors of perspective:

  • Zero-Sum thinking
  • maximising the value from a single transaction or context instead of viewing it as a longer game (or: the prisoner’s dilemma played over a single iteration versus played over infinite games).

Zero-Sum Thinking

  • “I can only win if the other loses."
  • “There’s only one chocolate and both of us are competing for it – every piece that you eat is one less piece for me."
  • “The total payoff/outcome available for the both of us together is zero – if you gain something, I will lose."

While this might be true in a few contexts – think a single game of tennis or billiards, or a polar bear chasing you – in most other contexts, it’s incorrect.

Take someone paying INR 300 for a cup of coffee from Blue Tokai or Starbucks – both parties are better off (one values the coffee more than the INR 300; the other values the INR 300 more than the coffee).

Or the ‘overcharging’ auto in Delhi or Chennai that refuses to go by the meter, which you reluctantly board – again, both of you are better off: you agree to pay the higher amount to escape June’s sweltering heat, and the driver earns more than the meter fare.

So it goes for all consensual economic transactions – ‘double thank-you moments’, if you will (hat tip to Amit Varma – co-host of the excellent 'Everything is Everything', among other things).

That Oscar that Martin Scorsese lost out on for Goodfellas/Raging Bull/Taxi Driver/pick-your-favourite-Scorsese movie before The Departed – it either spurred him to try differently to win that coveted Oscar or, more probably, drove him to realise the many machinations behind it and the pointlessness of chasing it.

That appreciation your colleague or boss got when he ripped off your idea – was it because only one appreciation was available? Guess what – there probably is no rule that only one appreciation was up for grabs anyway! That colleague did play office politics, and we’ll (probably!) come to that part later.

Game Theory & Prisoner’s Dilemma – Single Iteration

Before delving deeper into this – a quick aside. Common words often carry specific technical connotations in Economics, and so it is with the concept of a 'game' in game theory. Economists model a 'game' as anything in which two or more players (individuals, companies, or governments) can take independent actions dictated by certain pre-defined rules, and each player's actions lead to defined outcomes for the others. Player B's choices will depend on Player A's choices, B's preferences, and the rules of the game; the same goes for A's choices. (I am compromising accuracy for simplicity.)

The Prisoner’s Dilemma is a classic game theory problem. Put simply, two prisoners are being interrogated by the police. Both are accused of having stolen something, but the police have no proof. They are placed in two different cells with no means of communicating with each other. The police offer each of them a choice individually:

  • ‘Cheat’ on your partner (convey that they committed the crime) while they stay silent – you get away free
  • Stay silent while the other guy cheats on you (conveys that you’ve committed the crime) – you’re in jail for 3 years
  • Both you and your partner stay silent – both are in jail for 1 year each
  • Both of you cheat on each other – each is in jail for 2 years.

Visually, the payoff matrix (years in jail for A, B) becomes:

                      B stays silent          B cheats
  A stays silent      A: 1 yr,  B: 1 yr       A: 3 yrs, B: 0 yrs
  A cheats            A: 0 yrs, B: 3 yrs      A: 2 yrs, B: 2 yrs

Note: read this matrix in columns first – i.e., from the point of view of A as: “If B stays silent, what should A do: stay silent = 1 year; cheat = 0 years” and vice versa from B’s perspective.

Viewing this as a single game, the rational choice for each prisoner is to ‘cheat’: by cheating you get either 0 years or 2 years, whereas by staying silent you get either 1 year or 3 years, depending on your partner’s actions. Yet if both cheat, the outcome is worse for each prisoner (both in jail for 2 years) than if both had stayed silent (both in jail for 1 year).
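
To make that concrete, here’s a minimal sketch in Python (purely illustrative – the jail terms are the ones from the story above) showing that, in a one-shot game, cheating leaves A better off regardless of what B does:

```python
# Single-shot prisoner's dilemma from the story above.
# Payoffs are years in jail, so lower is better.
YEARS = {  # (A's move, B's move) -> (A's years, B's years)
    ("silent", "silent"): (1, 1),
    ("silent", "cheat"):  (3, 0),
    ("cheat",  "silent"): (0, 3),
    ("cheat",  "cheat"):  (2, 2),
}

# Whatever B does, A spends fewer years in jail by cheating.
for b_move in ("silent", "cheat"):
    a_if_silent = YEARS[("silent", b_move)][0]
    a_if_cheat = YEARS[("cheat", b_move)][0]
    print(f"If B plays '{b_move}': A silent -> {a_if_silent} yr(s), A cheats -> {a_if_cheat} yr(s)")

# Cheating is the better move for A (and, by symmetry, for B) in a single iteration,
# even though mutual silence (1, 1) beats mutual cheating (2, 2).
```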

Prisoner’s Dilemma – Infinite/Repeated Games

However, if you view it as a repeated game, the calculus flips: repetition allows trust, retaliation, and cooperation to emerge. Both prisoners can, based on their partner’s behaviour, arrive at the jointly better strategy of staying silent.

Each prisoner now has ‘memory’ of the other prisoner’s actions. There’s a greater motive for each to cooperate now, because both know that if, say, A cooperates this time but B doesn’t, A has the power to ‘retaliate’ the next time. A can take the risk on the ‘overall better outcome’ of staying silent, knowing that there’s a ‘next time’.

As a pretty interesting aside, in the late ’70s and early ’80s, Prof. Robert Axelrod invited different strategies to play in this repeated prisoner’s dilemma as a simulation. Over 60 different strategies were received across two tournaments. Submitted strategies included everything from ‘always cooperate’ to ‘always cheat’ to ‘mostly cooperate but occasionally cheat at random’ to ‘always cooperate, but once the other person cheats, hold that grudge and continue cheating forever’, and many, many other variations. The winner – a simple ‘tit-for-tat’ with a starting position of cooperation. (Link to the original paper from 1980 here.)
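
For the curious, here’s a rough, illustrative sketch (in Python) of such a repeated-game round-robin. To be clear about my assumptions: the 3/5/1/0 point values are the payoffs commonly used for Axelrod-style tournaments (higher is better here, unlike the jail terms above), and the 200-round length and the four strategies are my own picks – a far smaller field than the 60-odd entries Axelrod actually received:

```python
# A toy round-robin of repeated prisoner's dilemmas, in the spirit of Axelrod's tournaments.
# Points per round (higher is better): 3 = both cooperate, 5 = you cheat a cooperator,
# 1 = both cheat, 0 = you get cheated. "C" = cooperate (stay silent), "D" = defect (cheat).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_hist, their_hist):
    # Start nice, then copy whatever the opponent did last round.
    return "C" if not their_hist else their_hist[-1]

def always_cooperate(my_hist, their_hist):
    return "C"

def always_cheat(my_hist, their_hist):
    return "D"

def grudger(my_hist, their_hist):
    # Nice until the opponent cheats once, then cheats forever.
    return "D" if "D" in their_hist else "C"

STRATEGIES = {"tit_for_tat": tit_for_tat, "always_cooperate": always_cooperate,
              "always_cheat": always_cheat, "grudger": grudger}

def play(strat_a, strat_b, rounds=200):
    """Play `rounds` iterations; return (A's total points, B's total points)."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a, move_b = strat_a(hist_a, hist_b), strat_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a, score_b = score_a + pay_a, score_b + pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

# Round-robin (including self-play); the 'tournament winner' is the highest total.
totals = {name: 0 for name in STRATEGIES}
names = list(STRATEGIES)
for i, a in enumerate(names):
    for b in names[i:]:
        score_a, score_b = play(STRATEGIES[a], STRATEGIES[b])
        totals[a] += score_a
        if a != b:
            totals[b] += score_b

for name, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name:>16}: {total}")
# Even in this tiny field, the nice strategies (tit-for-tat, grudger) finish on top
# and the 'always cheat' bully finishes last.
```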

The Manifesto for Rational Cooperation:

An interesting observation: the ‘always cheat’ (aka pure bully) strategy plays such that, in any given pairing, it either draws level with the opponent (when the other also cheats) or beats the opponent (when the other cooperates). Tit-for-tat, in contrast, only ever draws with or loses to its opponent. And yet, in the overall tournament standings, ‘always cheat’ not only finishes below ‘tit-for-tat’ but trails it by a wide margin – indeed, many other strategies perform better than the ‘always cheat’ bully.

Why? Because you can lose the battle and yet win the war: ‘repeated games’ turn the otherwise zero-sum single iteration into a positive-sum game.

‘Always cheat’ wins or draws against each opponent, but it shrinks the overall pie drastically while ‘winning’! ‘Tit-for-tat’ loses or draws against each opponent, but it grows the pie significantly. The size of the pie each strategy creates is what drives the respective outcomes.
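
Continuing the toy sketch from above (same illustrative 3/5/1/0 payoffs, 200 rounds per pairing), a few head-to-head pairings make the pie-size point visible:

```python
# Head-to-head, the bully never scores below its opponent in a pairing...
print(play(always_cheat, tit_for_tat))    # (204, 199): the bully 'wins' the pairing narrowly
# ...but two nice players grow a far bigger pie between them...
print(play(tit_for_tat, tit_for_tat))     # (600, 600): cooperation every round
# ...while two bullies shrink the pie to almost nothing.
print(play(always_cheat, always_cheat))   # (200, 200): mutual cheating every round
```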

In fact, most of the winning strategies in the simulations shared a few traits – they were:

  • nice (as against nasty) – i.e., strategies that didn’t cheat first (nice) versus those that did (nasty)
  • retaliatory – i.e., those that were not pushovers (‘always cheat’ will win over ‘always cooperate’ in the short and the long run) and had a mechanism to punish the opponent for cheating
  • forgiving – won’t hold a grudge in perpetuity. Tit-for-tat holds a grudge for one turn, but once the other party switches back to cooperation, it notices and switches back in the next turn
  • clear – easy to understand/decode for the other party. ‘Unstable genius’ is hard to work with.

And that's how we arrive at the actual motivation for this piece – this is a manifesto for cooperation driven by rationality and selfishness. A few points:

  1. We should view our different interactions as repeated interactions instead of one-off interactions.
  2. We can all be better off together; one person can be individually better off without pulling down the other.
  3. We’re each better off collaborating – both at the individual and the organisational level.
  4. These assertions are mathematically demonstrable and are also borne out by millions of years of evolution.

So, we should default to cooperation. On the first instance of 'cheating' by the other side (be it in office politics, market competition, good-faith negotiations, or international relations), respond in kind. But be forgiving: when the other party gets back to cooperation mode, get back to cooperation too.

For Further Interest:

For the mathematically inclined – I'd strongly recommend checking out this video by Veritasium. It delves into far more detail and covers Prof. Axelrod's work (with a surprise visit by the prof himself!), building it up step by step in a far more interesting and visual manner.

For the philosophically inclined – Hinduism, Jainism, & Buddhism have a concept of rebirths and many many lifetimes, instead of a single lifetime ending at death or ending in judgement and eternal transfer to heaven/hell. What kind of behaviours or motivations would that lead to or explain?

And, for the practically inclined – next time you're in a meeting, negotiation, facing a colleague's request for help, or in a competitive situation, it'd be good to consider and frame the situation as:

  • am I viewing this as a one-shot game or viewing it as a repeated game?
  • am I viewing it as a fixed-pie or am I thinking in terms of positive-sum games where a 'win-win' is possible?

Zero-sum games in real life are far rarer than we think. And the age-old wisdom of "Be nice, be forgiving, but don't be a pushover" is battle-tested across evolution, nature, businesses, and life.
