Program Games for LLMs
How do they facilitate cooperation and what are the risks?
Introduction
As the agentic capabilities of LLMs improve, a future of LLM-based agents interacting much more autonomously seems plausible. How, then, can we try to ensure cooperation in an interpretable and safe way? An idea that has existed in the game-theoretic literature for over twenty years has very recently started to be put into practice through the use of LLMs. What if we made agents communicate their strategies as code to one another before acting?
Open-source game theory is a subfield of game theory that looks at how agents would play a game when each can see the code written for the other agents' strategies. This idea might seem obsolete in the case of LLM agents, where the agents themselves are black boxes with non-deterministic outputs for a given input. However, reframing the setup so that the LLMs create the programs that are then submitted opens up the possibility of cooperative equilibria being achieved between models, known as program equilibria. This new avenue might allow cooperation to be better facilitated in settings where a lack of transparency between models keeps trust low.
Recent work has looked at using LLMs to facilitate program games in the hope of reducing risks related to information asymmetries or a lack of trust and transparency. However, this movement from the theoretical to the empirical risks losing the guarantees about an opponent's future actions that open-source game theory sought to provide in the first place. New risks become relevant at different stages of a program game, so we should consider which scenarios and problems are appropriate for using a program game to facilitate cooperation.
The purpose of the rest of this post is to provide an explanation of how program games can be utilised and some ideas for what considerations should be made for new risks that emerge as a result.
What are program games and program equilibrium?
In the two-player case, a program game is a game in which two opponents each submit a program to play a base game, such as the prisoner's dilemma, and each program can fully see the contents of the other, hence the name "open-source game theory". The programs are run together to yield a strategy or action for each player, and these are then used in the base game to determine each player's utility. Program equilibrium refers to the Nash equilibria of a program game: no player could increase their utility by changing their program while the other programs are kept constant. One of the interesting features of a program game is this meta-level abstraction over other games, which allows it to be layered on top of many scenarios that can be modelled as games to help facilitate improved cooperation.
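To make this concrete, here is a minimal sketch of a two-player program game over the prisoner's dilemma. The interface is an assumption for illustration, not taken from any particular paper: each submitted program is a piece of source code defining a function `strategy(my_src, their_src)` that receives its own source and the opponent's source and returns an action.

```python
# Minimal program-game harness (illustrative; the strategy(my_src, their_src)
# convention and payoff values are assumptions for this sketch).

PD_PAYOFFS = {  # (row action, col action) -> (row payoff, col payoff)
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def play_program_game(src1: str, src2: str):
    """Run each program on the other's source, then score the base game."""
    def load(src):
        env = {}
        exec(src, env)  # each program must define strategy(my_src, their_src)
        return env["strategy"]
    action1 = load(src1)(src1, src2)
    action2 = load(src2)(src2, src1)
    return PD_PAYOFFS[(action1, action2)]
```

Each program sees the other's full source before producing an action, which is exactly the "open-source" property the theory relies on.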
Two illustrative examples to consider are “CliqueBot” and “FairBot”. In these examples, the bot is a program and the program takes in the opponent’s program as an input.
CliqueBot is a program for the prisoner's dilemma where the program is simply:
If the opponent’s program is the same as this program, then return “cooperate”
Otherwise, return “defect”
In practice, this means that the programs must be exactly the same at the level of individual characters, and there is a cooperative equilibrium if both (or all) players submit a CliqueBot as their program.
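A CliqueBot can be sketched in a few lines. Here I assume a convention where each submitted program is a function `strategy(my_src, their_src)` given its own and the opponent's source text; because `my_src` is passed in, the exact-match check needs no quine (without that convention, the program would have to reproduce its own source itself).

```python
# Illustrative CliqueBot under an assumed strategy(my_src, their_src) convention.
cliquebot_src = (
    "def strategy(my_src, their_src):\n"
    "    return 'C' if their_src == my_src else 'D'\n"
)
defectbot_src = "def strategy(my_src, their_src):\n    return 'D'\n"

def run(src, opponent_src):
    """Load a submitted program and play it against an opponent's source."""
    env = {}
    exec(src, env)
    return env["strategy"](src, opponent_src)

# Two identical CliqueBots cooperate; any textual difference, even a single
# character, breaks the match and triggers defection.
```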
FairBot is a program for the prisoner’s dilemma with the following approach:
If there is a proof that the opponent’s program will return cooperate when run using this program, then return “cooperate”
Otherwise, return “defect”
Once again, both players using FairBot as their program is a cooperative equilibrium but proving that this is the case requires methods from mathematical logic, specifically Löb’s theorem.
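Genuine FairBot searches for a formal proof of the opponent's cooperation, which is beyond a short sketch. A common, strictly weaker stand-in replaces proof search with bounded simulation: run the opponent's program against our own source, defaulting to cooperate when a recursion budget runs out. This loses the Löbian guarantee but reproduces mutual cooperation in simple cases; everything below, including the `strategy(my_src, their_src)` convention, is an illustrative assumption.

```python
# Simulation-based approximation of FairBot (NOT the proof-based original).
fairbot_src = (
    "def strategy(my_src, their_src, depth=3):\n"
    "    if depth == 0:\n"
    "        return 'C'  # budget exhausted: optimistic default\n"
    "    env = {}\n"
    "    exec(their_src, env)\n"
    "    try:\n"
    "        simulated = env['strategy'](their_src, my_src, depth - 1)\n"
    "    except TypeError:\n"
    "        simulated = env['strategy'](their_src, my_src)  # depth-unaware opponent\n"
    "    return 'C' if simulated == 'C' else 'D'\n"
)

def run(src, opponent_src):
    """Load a submitted program and play it against an opponent's source."""
    env = {}
    exec(src, env)
    return env["strategy"](src, opponent_src)
```

Two copies of this bot recurse into each other until the depth budget is spent, bottom out at cooperate, and so cooperate all the way up, while an unconditional defector is simulated, seen to defect, and punished.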
These examples are able to achieve cooperation because they are able to guarantee that they know exactly what their opponent will do from viewing the source code. In the case of FairBot, this guarantee comes in the form of a mathematical logic-based proof. This provides a rigorous guarantee that is stronger than a guarantee of cooperation due to trusting the other party or having an expectation from the other party that it will cooperate.
From theory to practice
Why doesn’t this work for LLMs?
Before answering why this might not work for LLMs, we should first consider what providing source code would mean in this setting. In the literature, for example, some works refer to the agents themselves being submitted to the program game, while others describe agents (such as humans or AIs) creating and then submitting programs on their behalf. We need to clarify what a program game would mean for an LLM. Is the LLM creating the program to be submitted, or is it the program that is submitted?
In the case of the LLMs themselves being the program, we would need to define what it would mean for them to be “open-source” to each other. What is it that they are exchanging?
Weights?
Activation patterns?
System prompts?
Temperature?
Context window?
Chain-of-thought / reasoning traces?
These options all require varying levels of white-box access to the model. Any single piece of this information is likely to be incomplete for deducing a model's intended actions, and the stochastic nature of a model's outputs can make it harder to obtain guarantees even with all of the information available. By the nature of the program game, the LLM itself would also have to be modifiable at each step of the game sequence, which could mean fine-tuning after every step, which seems impractical.
Code-as-a-strategy workaround
LLMs have become very good at reading and writing code in recent years, so instead of having the LLM be the program, we can utilise their ability to synthesise and interpret code: the LLMs create the programs that are then submitted to the program game, and then reason about their opponent's code. This makes the decision process leading to each action clearly defined in the precise language of code rather than potentially imprecise natural language, and it makes the action or strategy the model will employ more predictable. This transparency can be useful in allowing cooperation between models even if they are not "aligned", as their actions can be audited by each other and by third parties, such as humans or an additional monitoring LLM.
Recent works that were developed somewhat concurrently operationalise this process slightly differently:
“Evaluating LLMs in Open Source Games” by Swadesh Sistla and Max Kleiman-Weiner looks at how program games can be used for cooperation when agents are told to use cooperative, payoff-maximising or deceptive strategies, and observes that these behaviours do emerge.
They frame their program game as a repeating set of meta-rounds in which the LLMs:
Write/revise their code
Exchange programs with each other
Read their opponent’s code and use this to condition their behaviour
Take actions in the base game using the program and observe the payoffs
When testing how strategies might evolve over meta-rounds, each model had access to the number of the current meta-round, a full history of past meta-rounds, and its opponent’s code from the previous round (if it was not the first round).
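The meta-round loop above can be sketched as follows. The function `write_program` is a hypothetical stub standing in for the LLM call (which would condition on the history and the opponent's previous code); here it always emits a tit-for-tat program so the loop is runnable end to end.

```python
# Skeleton of the meta-round loop (structure from the description above;
# all function names and the stubbed strategy are illustrative).
def write_program(history, opponent_src):
    # An LLM would write/revise code conditioned on the meta-round history
    # and the opponent's last program; this stub always emits tit-for-tat.
    return (
        "def strategy(my_history, their_history):\n"
        "    return their_history[-1] if their_history else 'C'\n"
    )

def run_base_game(src1, src2, rounds=5):
    """Play an iterated base game between two submitted programs."""
    def load(src):
        env = {}
        exec(src, env)
        return env["strategy"]
    s1, s2 = load(src1), load(src2)
    h1, h2 = [], []
    for _ in range(rounds):
        a1, a2 = s1(h1, h2), s2(h2, h1)
        h1.append(a1)
        h2.append(a2)
    return h1, h2

def meta_rounds(n_meta=3):
    history, prev = [], (None, None)
    for _ in range(n_meta):
        # 1. write/revise code, conditioned on the opponent's last program
        src1 = write_program(history, prev[1])
        src2 = write_program(history, prev[0])
        # 2. exchange programs (each sees the other's source)
        # 3-4. condition on the opponent's code and play the base game
        h1, h2 = run_base_game(src1, src2)
        history.append((h1, h2))
        prev = (src1, src2)
    return history
```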
“Policy-Conditioned Policies for Multi-Agent Tasks” by Lin et al. developed an algorithm called “Programmatic Iterated Best Response” (PIBR), in which the LLM reads an opponent’s fixed code and iteratively rewrites its own code to find the best response. This is intended to provide a practical form of program equilibrium, conditioning on inferred semantics instead of formal proofs. Their framing of the program (often referred to as a policy in the paper) differs from that of the previous paper, and follows this structure:
Write their code and exchange it, forming a policy profile in which the two programs are paired up
Optimisation step:
Alternate between optimising the first agent’s program against the second agent’s fixed program and the second agent’s program against the first agent’s fixed program
This involves using information from unit tests and from running the programs for trajectories of the base game (either a coordination game or a grid-world like foraging environment)
Take the best policy profile and use it to play the base game
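The alternation pattern can be illustrated with a deliberately tiny stand-in. This is a sketch of the alternating best-response structure only, not of PIBR itself: "programs" are reduced to constant actions in a two-action coordination game, and `best_response` is a hypothetical stand-in for the LLM rewrite step.

```python
# Toy alternating best response over a coordination game (all values and
# names are illustrative assumptions, not taken from Lin et al.).
COORD = {("A", "A"): 2, ("B", "B"): 1, ("A", "B"): 0, ("B", "A"): 0}

def best_response(opponent_action):
    # Rewrite our "program" to maximise payoff against the fixed opponent.
    return max(["A", "B"], key=lambda a: COORD[(a, opponent_action)])

def alternate(p1="B", p2="B", iters=4):
    for _ in range(iters):
        p1 = best_response(p2)  # optimise player 1 against fixed player 2
        p2 = best_response(p1)  # then player 2 against fixed player 1
    return p1, p2
```

Note that the alternation converges to an equilibrium, but not necessarily the payoff-dominant one: starting from mutual "B", neither side ever has a unilateral reason to switch.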
While these interpretations are different, they both have common features of having models generate programs, exchange their programs, reason over their opponent’s program or condition on it for their actions in the game, and execute the program in a base game.
Risks introduced by program games
While initial implementations of program games have been found to facilitate cooperation well, it is important to consider how this extra scaffolding can also introduce new risks going forward. The crux of many of these problems is that using an LLM means tasks like verifying strategies are now done approximately instead of precisely, which is touched on by both Sistla & Kleiman-Weiner and Lin et al.
The hope is that by splitting the process of a program game into multiple steps, it can be understood where and how certain issues can arise. Take this framework with a pinch of salt though, as it is very possible that I could be splitting one hard problem into multiple very hard problems.
For this section, these steps will be used for a more abstract analysis, with links made to how they appear in the examples discussed above. We split the potential areas of risk into four steps that should appear in an LLM adaptation of a program game. Many of the risks discussed are applicable to multiple steps, but I will present each with the step I believe is most relevant.
The generation step refers to the creation of a strategy by the model and its implementation in code as a program. A very important issue that can arise here is that the generated code might not match the intent of the model that wrote it. This could be due to the model being unable to express its internal reasoning precisely in code.
This gap between the models’ actual intent and expressed intent could happen if the scenario is sufficiently complex that using code is not expressive enough to capture what the model wants to do in a way that isn’t extremely dense or difficult to interpret.
The exchange step refers to when the models give each other their programs. At this stage, if the infrastructure put in place for models to carry out program games doesn’t prevent it, models may choose to obfuscate their strategies by obscuring parts of the text or making the code very verbose. Sistla & Kleiman-Weiner showed code obfuscation not to be particularly effective in preventing models from understanding strategies for the iterated prisoner’s dilemma, but this might not scale beyond clear-cut and well-known strategies that would appear in a model’s training data. This raises the question of whether these models are actually robust to obfuscation at the exchange step, or whether they are just using knowledge from their training data to infer what the strategies should be (which would be acceptable for scenarios that can be modelled by a simple, well-known game).
The interpretation step refers to the processing of the opponent’s strategy by the model and the deliberation the model does about which action to take or strategy to follow, conditioned on its opponent’s program. I would argue that this is the most important step to get right, as program equilibrium is only achievable insofar as code can be verified. LLMs offer a good approximation, but there is always a chance of a gap between it and the theory that program games are based on.
Errors in interpreting strategies can be due to mistakes made by the model itself or due to receiving intentionally confusing or vulnerable strategies from the opponent. Sistla & Kleiman-Weiner observe that agents using deceptive strategies did not substantially outperform those using cooperative strategies, but the models did find stochastic programs significantly harder to classify, so stochasticity could be used to make program games brittle at the interpretation step.
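A stochastic program of the kind described above might look like the following (my own illustrative example): its observable behaviour varies from run to run, so an interpreter cannot pin down the underlying strategy from a single simulation.

```python
import random

# Tit-for-tat with random noise: with probability p_noise the intended
# action is flipped, making the program's behaviour non-deterministic
# and the underlying strategy harder to classify from observation.
def noisy_tit_for_tat(my_history, their_history, p_noise=0.2, rng=random):
    intended = their_history[-1] if their_history else "C"
    if rng.random() < p_noise:
        return "D" if intended == "C" else "C"
    return intended
```

At `p_noise=0` this is ordinary tit-for-tat; at any positive noise level, repeated simulation is needed even to estimate what the program tends to do.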
The execution step refers to when the action outputted by the program is implemented in the base game. Code failing to run is an obvious potential error, which is why Lin et al. use unit tests while optimising their programs. Errors in code are likely to become less of an issue as model capabilities improve.
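In the spirit of those unit tests (the specific checks below are my own sketch, not taken from the paper), a harness can verify before the base game that a submitted program runs and returns a legal action on edge-case inputs:

```python
# Execution-step validation sketch: reject programs that crash or return
# illegal actions, checking the empty-history edge case explicitly.
LEGAL_ACTIONS = {"C", "D"}

def validate_program(src):
    env = {}
    try:
        exec(src, env)
        strategy = env["strategy"]
        # Edge cases: the first move (empty histories) and a short history.
        for histories in ([[], []], [["C"], ["D"]]):
            if strategy(*histories) not in LEGAL_ACTIONS:
                return False
    except Exception:
        return False
    return True
```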
The execution of strategies could also be affected by whether the model decides to use information from previous steps of the program game. In Sistla & Kleiman-Weiner, a shift was noted across meta-rounds from using the code provided by the opponent to make decisions to using the opponent’s behavioural history. This means the cooperative behaviour is no longer due to the mechanisms of the program game, so unexpected behaviour could emerge as the game progresses.
Outside of these steps, there are also risks relating to the scenario that the program game is used for. A pragmatic response to such structural risks is simply to consider which scenarios program games are best suited to. Program games won’t, in my opinion, be a silver bullet for improving cooperation, but that doesn’t mean they can’t still be useful.
If the concern is miscoordination between LLMs rather than conflict, then program games provide a great structure for agents that would like to cooperate to quickly choose the right actions and strategies for doing so.
For scenarios that are very adversarial or that have higher stakes, program games could be a partial help, but they are unlikely to be a full solution due to the risks mentioned above. Adding formal verification tools for the interpretation step of an LLM-based program game could in principle help, but this is not something that has been tried yet (to my knowledge, correct me if I am wrong!).
In scenarios where there are many different players, or where the space of actions is not clearly defined or cannot easily be expressed completely in code, program games are less likely to be useful and other approaches should be considered.
An additional consideration underlying all of this is that using program games to facilitate cooperation (especially if this eventually includes formal verification) will be more expensive and could make actions slower to take, so the people and organisations deploying LLM-based agents might be less inclined to implement program games in their models’ interactions. This could mean that incentives or regulation are required to encourage or force the use of program games (if they are deemed sufficiently useful).
Conclusion
Program games are a promising method for facilitating cooperation between LLM-based agents, but we must be mindful of how practice differs from the theoretical basis and of how to mitigate the risks that arise from this difference. We must also remember that this is not a one-size-fits-all approach to solving cooperation problems, and so should consider what its best use cases are. Looking to the future, implementing formal verification within LLM-based program games seems an interesting approach, both for mitigating risks around the interpretation step and for studying this gap between theory and practice.
Thanks for reading! As I am learning about this material as I am writing, I’m sure there will be important ideas that I’m not considering or mistakes made so please let me know how I can improve this post!
