Reinforcement Learning with Open Spiel (Part 1)


There is a certain game at my parent’s home that I really like. It is similar in spirit to the game of Connect Four, but in 3D. Instead of inserting tokens in 7 columns, a \(4 \times 4 \times 4\) grid is used. The goal remains the same: align four of your tokens to win, be it in a row, in a diagonal …

Since comming to Japan, I was lucky to find a bar in Kobe where the patron has this game. I played a few games against him, and to my great shame, lost more games than I want to admit. But not to despair, I just remembered I am a computer scientist by trade. Surely I should be able to make a program to play in my stead! Join me on my small adventure into Reinforcement Learning and Python programming.

Discovering the Open Spiel framework

As luck would have it, there is a repository available on GitHub called Open Spiel which contains a fair number of games and machine learning algorithms. Almost the whole project is implemented in C++, but there are Python bindings available and it can be installed a Python package via pip.

By installing the open_spiel library through pip and then downloading the source files of the project, we can use existing Python sample programs to launch a game of Tic-Tac-Toe.

$ git clone https://github.com/google-deepmind/open_spiel.git
$ cd open_spiel
$ pip install -r requirements.txt
$ pip install open_spiel
$ python open_spiel/python/example/example.py --game_string="tic_tac_toe"
Registered games:
[<GameType '2048'>, <GameType 'add_noise'>, <GameType 'amazons'>, <GameType 'backgammon'>, <GameType 'bargaining'>, <GameType 'battleship'>, <GameType 'blackjack'>, <GameType 'blotto'>, <GameType 'breakthrough'>, <GameType 'bridge'>, <GameType 'bridge_uncontested_bidding'>, <GameType 'cached_tree'>, <GameType 'catch'>, <GameType 'chat_game'>, <GameType 'checkers'>, <GameType 'chess'>, <GameType 'cliff_walking'>, <GameType 'clobber'>, <GameType 'coin_game'>, <GameType 'colored_trails'>, <GameType 'connect_four'>, <GameType 'coop_box_pushing'>, <GameType 'coop_to_1p'>, <GameType 'coordinated_mp'>, <GameType 'crazy_eights'>, <GameType 'cursor_go'>, <GameType 'dark_chess'>, <GameType 'dark_hex'>, <GameType 'dark_hex_ir'>, <GameType 'deep_sea'>, <GameType 'dots_and_boxes'>, <GameType 'dou_dizhu'>, <GameType 'efg_game'>, <GameType 'einstein_wurfelt_nicht'>, <GameType 'euchre'>, <GameType 'first_sealed_auction'>, <GameType 'gin_rummy'>, <GameType 'go'>, <GameType 'goofspiel'>, <GameType 'havannah'>, <GameType 'hearts'>, <GameType 'hex'>, <GameType 'kriegspiel'>, <GameType 'kuhn_poker'>, <GameType 'laser_tag'>, <GameType 'leduc_poker'>, <GameType 'lewis_signaling'>, <GameType 'liars_dice'>, <GameType 'liars_dice_ir'>, <GameType 'maedn'>, <GameType 'mancala'>, <GameType 'markov_soccer'>, <GameType 'matching_pennies_3p'>, <GameType 'matrix_bos'>, <GameType 'matrix_brps'>, <GameType 'matrix_cd'>, <GameType 'matrix_coordination'>, <GameType 'matrix_mp'>, <GameType 'matrix_pd'>, <GameType 'matrix_rps'>, <GameType 'matrix_rpsw'>, <GameType 'matrix_sh'>, <GameType 'matrix_shapleys_game'>, <GameType 'mfg_crowd_modelling'>, <GameType 'mfg_crowd_modelling_2d'>, <GameType 'mfg_dynamic_routing'>, <GameType 'mfg_garnet'>, <GameType 'misere'>, <GameType 'mnk'>, <GameType 'morpion_solitaire'>, <GameType 'negotiation'>, <GameType 'nfg_game'>, <GameType 'nim'>, <GameType 'nine_mens_morris'>, <GameType 'normal_form_extensive_game'>, <GameType 'oh_hell'>, <GameType 'oshi_zumo'>, <GameType 'othello'>, <GameType 'oware'>, <GameType 'pathfinding'>, <GameType 'pentago'>, <GameType 'phantom_go'>, <GameType 'phantom_ttt'>, <GameType 'phantom_ttt_ir'>, <GameType 'pig'>, <GameType 'python_block_dominoes'>, <GameType 'python_connect_four_3d'>, <GameType 'python_dynamic_routing'>, <GameType 'python_iterated_prisoners_dilemma'>, <GameType 'python_kuhn_poker'>, <GameType 'python_liars_poker'>, <GameType 'python_team_dominoes'>, <GameType 'python_tic_tac_toe'>, <GameType 'quoridor'>, <GameType 'rbc'>, <GameType 'repeated_game'>, <GameType 'restricted_nash_response'>, <GameType 'sheriff'>, <GameType 'skat'>, <GameType 'solitaire'>, <GameType 'spades'>, <GameType 'start_at'>, <GameType 'stones_and_gems'>, <GameType 'tarok'>, <GameType 'tic_tac_toe'>, <GameType 'tiny_bridge_2p'>, <GameType 'tiny_bridge_4p'>, <GameType 'tiny_hanabi'>, <GameType 'trade_comm'>, <GameType 'turn_based_simultaneous_game'>, <GameType 'twixt'>, <GameType 'ultimate_tic_tac_toe'>, <GameType 'y'>, <GameType 'zerosum'>]
Creating game: tic_tac_toe
...
...
...
Player  0 , randomly sampled action:  x(0,0)
x..
...
...
Player  1 , randomly sampled action:  o(2,1)
x..
...
.o.
Player  0 , randomly sampled action:  x(2,0)
x..
...
xo.
Player  1 , randomly sampled action:  o(1,2)
x..
..o
xo.
Player  0 , randomly sampled action:  x(2,2)
x..
..o
xox
Player  1 , randomly sampled action:  o(0,2)
x.o
..o
xox
Player  0 , randomly sampled action:  x(1,0)
x.o
x.o
xox
Utility for player 0 is 1.0
Utility for player 1 is -1.0

In this example.py program, agents that pick an action at random are initialized for each of the players and a game is played. The Utility shown on the bottom two lines indicates who won the game:

  • a value of \(1.0\) means that player won,
  • a value of \(-1.0\) means that player lost,
  • a value of \(0.0\) for both players indicate a draw.

In the game above, the first player (Player 0) managed to align their token in the first column, winning the game.

As shown at the beginning of the program, there are many games implemented in Open Spiel. You can launch a random playthrough of any of the implemented games, just by setting the --game_string flag when launching the program. This also means that if I am able to implement my game according to Open Spiel’s API, I should be able to reuse all the provided example programs with my own game.

Installing Open Spiel locally

Since I want to implement my own game into the framework and that there appears to be a large number of generic programs already integrated into the framework, I decided to add my own program to the library. This means that I need to compile and install Open Spiel via the sources, and not via the package available for install by pip.

$ ./install.sh  # Installs the packages necessary to build the C++ Open Spiel 
$ pip install -r requirements.txt # Installs the necessary Python packages
$ ./open_spiel/scripts/build_and_run_tests.sh # Builds Open Spiel with all the games and runs the tests 
$ pip install . # Installs the compiled Open Spiel Python bindings

When developing my program, I realized I needed to uninstall and reinstall the open_spiel using pip every time I made changes to my program. Otherwise, the “previous” version of the Python programs is kept when launching programs. This is done using the following command:

$ pip uninstall open_spiel
$ ./open_spiel/scripts/build_and_run_tests.sh
$ pip install .

Implementing the game

As luck would have it, there are a number of games implemented in Python. So I was able to take inspiration from these. The 3D Connect Four game is very similar to Tic-Tac-Toe. Both of these games are sequential, two-players, perfect information, zero-sum, deterministic games. This means that the two players play one after the other, there is no hidden information (unlike poker where the hand of your opponents is unknown to you), one player’s gain is the other player’s loss, and there is no chance involved in the outcome of a game (no throw of dices).

This information is important because it is needed by the Open Spiel framework to run your game. Through the Python API, this information is defined in pyspiel.GameType and pyspiel.GameInfo as shown below.

_NUM_PLAYERS = 2
_BOARD_FILES = 4
_BOARD_HEIGHT = 4
_BOARD_RANKS = 4
_MAX_GAME_LENGTH = _BOARD_FILES * _BOARD_RANKS * _BOARD_HEIGHT
_NB_POSSIBLE_ACTIONS = _BOARD_FILES * _BOARD_RANKS

_GAME_TYPE = pyspiel.GameType(
    short_name="python_connect_four_3d",
    long_name="Python Connect Four 3D",
    dynamics=pyspiel.GameType.Dynamics.SEQUENTIAL,
    chance_mode=pyspiel.GameType.ChanceMode.DETERMINISTIC,
    information=pyspiel.GameType.Information.PERFECT_INFORMATION,
    utility=pyspiel.GameType.Utility.ZERO_SUM,
    reward_model=pyspiel.GameType.RewardModel.TERMINAL,
    max_num_players=_NUM_PLAYERS,
    min_num_players=_NUM_PLAYERS,
    provides_information_state_string=True,
    provides_information_state_tensor=True,
    provides_observation_string=True,
    provides_observation_tensor=True,
    provides_factored_observation_string=True)

_GAME_INFO = pyspiel.GameInfo(
    num_distinct_actions=_NB_POSSIBLE_ACTIONS,
    max_chance_outcomes=0,
    num_players=_NUM_PLAYERS,
    min_utility=-1.0,
    max_utility=1.0,
    utility_sum=0.0,
    max_game_length= _MAX_GAME_LENGTH) # If all the pieces are placed and a draw is reached

Game and State for the core game mechanisms

The two main classes that you need to extend and implement are pyspiel.Game and pyspiel.State. The former is essentially a factory class used to create Games. The latter is the class that actually retains all the information about the current playthrough, including who is the next player, what the state of the board is etc.

Game implementation

Implementing the Game class is rather straightforward. The constructor merely carries the game information contained in _GAME_TYPE to the parent class. Method new_initial_state creates a clean instance of the State class implementing the game, while method make_py_observer provides a view of the current game state using the Connect4_3D_Observer class.

class Connect4_3D_Game(pyspiel.Game):
    """A Python version of 3D connect 4."""

    def __init__(self, params=None):
        super().__init__(_GAME_TYPE,_GAME_INFO, params or dict())

    def new_initial_state(self):
        """Returns a state corresponding to the start of a game."""
        return Connect4_3D_State(self)
    
    def make_py_observer(self, iig_obs_type=None, params=None):
        """Returns an object used for observing game state."""
        if ((iig_obs_type is None) or \
             (iig_obs_type.public_info and not iig_obs_type.perfect_recall)):
            return Connect4_3D_Observer(params)
        else:
            return IIGObserverForPublicInfoGame(iig_obs_type, params)

State implementation

As stated above, the class which actually implements all the game mechanics is the one that extends State. Some notable methods that need to be implemented include:

  • current_player which returns the id of the player who needs to make an action next
    For the two-player game we are dealing with, the value alternates betwenn \(0\) and \(1\)
  • _legal_actions which defines which moves can be made by the current player
    In our case, there are 16 different moves. But when a pile of token is full, that move can no longer be played and is removed from the array returned by this method.
  • _apply_action which updates the state of the game based on the move chosen by the player
    This method updates the game board, checks the victory and end-game condition, and updates the current player
  • __str__ converts to state of the board to a string for easy display
    Although this method does not intervene in the game mechanics, it is needed when using existing programs from the Open Spiel examples that will display the current state of the board to the terminal.
  • is_terminal indicates if the game if over (either a player won or victory was reached)
  • returns when the game is over, indicates the payout to each player
  • _action_to_string converts the action picked by a player to an easy to read string
    Players pick a value in the range \(\left[ 0, 15 \right]\) to designate the pole onto which to place their token. This translates to (row, file, rank) coordinates on the board (the row is the first empty slot for the specified file and rank). I implemented it such that it shows the action number and the coordinates in one string: action(row,file,rank).

I will not go into further details in the implementation. The game itself is rather straightforward to implement following the example of the Python implementation of Tic-Tac-Toe. Suffice to say the most difficult part of the program is the check for victory conditions inside method apply_action. While not technically challenging, it is very easy to make a silly mistake here.

This is a problem because if the game is not correctly programmed, an AI will learn to play the “broken” game and make plays that do not make sense. We therefore need to test our game thoroughly …

Observer class

The last class that needs to be implemented is the “observer.” It is used to provide the observable information to the agents (the AIs) that are going to play the game. To be honest I do not fully understand what is been done here. I simply adapted the code from Tic-Tac-Toe to use a 3D array instead of a 2D array.

class Connect4_3D_Observer:
    """An observer that conforms to the PyObserver interface (observation.py)."""

    def __init__(self, params):
        """Initializes an empty observation tensor."""
        if params:
            raise ValueError(f"Observation parameters not supported; passed {params}")
        # The observation should contain a 1-D tensor in `self.tensor` and a 
        # dictionary of views onto the tensor, which may be of any shape.
        # Here the observation is indexed `(cell state, row, column)`.
        shape = (1+ _NUM_PLAYERS, _BOARD_RANKS, _BOARD_FILES, _BOARD_HEIGHT)
        self.tensor = np.zeros(np.prod(shape), np.float32)
        self.dict = {"observation": np.reshape(self.tensor, shape)}
        # TODO not certain the shape of the tensor is appropriate (why 3 times the board cells ?)

    def set_from(self, state, player):
        """Updates the `tensor` and `dict` to reflect the `state` from the PoV of `player`."""
        del player # Why? I guess the player is not needed in the implementation

        obs = self.dict["observation"]
        obs.fill(0)
        for row in range(_BOARD_HEIGHT):
            for file in range(_BOARD_FILES):
                for rank in range(_BOARD_RANKS):
                    cell_state = ".ox".index(state.board[row, file, rank]) # pick the corresponding character
                    obs[cell_state, row, file, rank] = 1 # this I don't understand
    
    def string_from(self, state, player):
        """Observation of `state` from the point of view of `player`, as a string."""
        del player
        return _board_to_string(state.board)

def _board_to_string(board):
    """Returns a string representation of the board."""
    slices = []
    b = np.flip(board,0)
    for i in range(b.shape[0]):
        for j in range(b.shape[1]):
            for k in range(b.shape[2]):
                slices.append(f"{b[i, j, k]}")
            slices.append("\n")
        slices.append("\n")
    return "".join(slices)

Game registration into Open Spiel

Finally, we need to register the newly implemented game into the list of games provided with Open Spiel so that we are able to launch it using its identifying string. This is done using the following line at the bottom of the program file connect_four_3d.py.

pyspiel.register_game(_GAME_TYPE, Connect4_3D_Game)

In file __init__.py the following import needs to be added:

diff --git a/open_spiel/python/games/__init__.py b/open_spiel/python/games/__init__.py
index 33945e06..a2d42e5b 100644
--- a/open_spiel/python/games/__init__.py
+++ b/open_spiel/python/games/__init__.py
@@ -28,6 +28,7 @@ pyspiel.register_game(_GAME_TYPE, KuhnPokerGame)
 
 from open_spiel.python.games import block_dominoes
 from open_spiel.python.games import chat_game
+from open_spiel.python.games import connect_four_3d
 from open_spiel.python.games import dynamic_routing
 from open_spiel.python.games import iterated_prisoners_dilemma
 from open_spiel.python.games import kuhn_poker

Now, any program which includes from open_spiel.python import games will know about our new game and we should be able to tell the programs to load our new game by giving it the appropriate arguments.

Testing

First playthroughs

Now that the game is implemented, we can play it through the Open Spiel framework. Random Agent vs Random Agent games can be played using the open_spiel/python/examples/example.py program mentionned above.

Here is the command and a randomly generated playthrough:

$ python open_spiel/python/examples/example.py --game_string="python_connect_four_3d"
Registered games:
[<GameType '2048'>, <GameType 'add_noise'>, <GameType 'amazons'>, <GameType 'backgammon'>, <GameType 'bargaining'>, <GameType 'battleship'>, <GameType 'blackjack'>, <GameType 'blotto'>, <GameType 'breakthrough'>, <GameType 'bridge'>, <GameType 'bridge_uncontested_bidding'>, <GameType 'cached_tree'>, <GameType 'catch'>, <GameType 'chat_game'>, <GameType 'checkers'>, <GameType 'chess'>, <GameType 'cliff_walking'>, <GameType 'clobber'>, <GameType 'coin_game'>, <GameType 'colored_trails'>, <GameType 'connect_four'>, <GameType 'coop_box_pushing'>, <GameType 'coop_to_1p'>, <GameType 'coordinated_mp'>, <GameType 'crazy_eights'>, <GameType 'cursor_go'>, <GameType 'dark_chess'>, <GameType 'dark_hex'>, <GameType 'dark_hex_ir'>, <GameType 'deep_sea'>, <GameType 'dots_and_boxes'>, <GameType 'dou_dizhu'>, <GameType 'efg_game'>, <GameType 'einstein_wurfelt_nicht'>, <GameType 'euchre'>, <GameType 'first_sealed_auction'>, <GameType 'gin_rummy'>, <GameType 'go'>, <GameType 'goofspiel'>, <GameType 'havannah'>, <GameType 'hearts'>, <GameType 'hex'>, <GameType 'kriegspiel'>, <GameType 'kuhn_poker'>, <GameType 'laser_tag'>, <GameType 'leduc_poker'>, <GameType 'lewis_signaling'>, <GameType 'liars_dice'>, <GameType 'liars_dice_ir'>, <GameType 'maedn'>, <GameType 'mancala'>, <GameType 'markov_soccer'>, <GameType 'matching_pennies_3p'>, <GameType 'matrix_bos'>, <GameType 'matrix_brps'>, <GameType 'matrix_cd'>, <GameType 'matrix_coordination'>, <GameType 'matrix_mp'>, <GameType 'matrix_pd'>, <GameType 'matrix_rps'>, <GameType 'matrix_rpsw'>, <GameType 'matrix_sh'>, <GameType 'matrix_shapleys_game'>, <GameType 'mfg_crowd_modelling'>, <GameType 'mfg_crowd_modelling_2d'>, <GameType 'mfg_dynamic_routing'>, <GameType 'mfg_garnet'>, <GameType 'misere'>, <GameType 'mnk'>, <GameType 'morpion_solitaire'>, <GameType 'negotiation'>, <GameType 'nfg_game'>, <GameType 'nim'>, <GameType 'nine_mens_morris'>, <GameType 'normal_form_extensive_game'>, <GameType 'oh_hell'>, <GameType 'oshi_zumo'>, <GameType 'othello'>, <GameType 'oware'>, <GameType 'pathfinding'>, <GameType 'pentago'>, <GameType 'phantom_go'>, <GameType 'phantom_ttt'>, <GameType 'phantom_ttt_ir'>, <GameType 'pig'>, <GameType 'python_block_dominoes'>, <GameType 'python_connect_four_3d'>, <GameType 'python_dynamic_routing'>, <GameType 'python_iterated_prisoners_dilemma'>, <GameType 'python_kuhn_poker'>, <GameType 'python_liars_poker'>, <GameType 'python_team_dominoes'>, <GameType 'python_tic_tac_toe'>, <GameType 'quoridor'>, <GameType 'rbc'>, <GameType 'repeated_game'>, <GameType 'restricted_nash_response'>, <GameType 'sheriff'>, <GameType 'skat'>, <GameType 'solitaire'>, <GameType 'spades'>, <GameType 'start_at'>, <GameType 'stones_and_gems'>, <GameType 'tarok'>, <GameType 'tic_tac_toe'>, <GameType 'tiny_bridge_2p'>, <GameType 'tiny_bridge_4p'>, <GameType 'tiny_hanabi'>, <GameType 'trade_comm'>, <GameType 'turn_based_simultaneous_game'>, <GameType 'twixt'>, <GameType 'ultimate_tic_tac_toe'>, <GameType 'y'>, <GameType 'zerosum'>]
Creating game: python_connect_four_3d
....
....
....
....

....
....
....
....

....
....
....
....

....
....
....
....


Player  0 , randomly sampled action:  7(0,1,3)
....
....
....
....

....
....
....
....

....
....
....
....

....
...o
....
....


Player  1 , randomly sampled action:  8(0,2,0)
....
....
....
....

....
....
....
....

....
....
....
....

....
...o
x...
....

<-- a number of plays removed for brevity -->

Player  0 , randomly sampled action:  5(2,1,1)
o...
....
....
....

x...
.o..
....
....

o..x
.x..
oo..
o...

x.ox
xx.o
xxx.
oo.o


Player  1 , randomly sampled action:  11(0,2,3)
o...
....
....
....

x...
.o..
....
....

o..x
.x..
oo..
o...

x.ox
xx.o
xxxx
oo.o


Utility for player 0 is -1.0
Utility for player 1 is 1.0

After many iteration, Player 1 was able to align four tokens on the bottom row, winning the game!

It turns out that by default, most of the programs provided in the open_spiel/python/examples directory do not have the import directive that would include the games implemented in Python. To be able to launch my newly implemented game, I had to add the following import to the example program open_spiel/python/examples/mcts.py.

+ from open_spiel.python import games
  from open_spiel.python.algorithms import mcts

To launch a game of Connect Four 3D, I can then use the following command. Note that the game flag is set to python_connect_four_3d, matching the value for the "short_name" in the GameType definition.

python open_spiel/python/examples.mcts.py --game="python_connect_four_3d" --player1="human" --player2="random"

It is also possible to flip the players and make the human play second:

python open_spiel/python/examples.mcts.py --game="python_connect_four_3d" --player1="random" --player2="human"

Either way, winning against an agent that plays at random should not be too difficult.

Testing the victory conditions

If you built the Open Spiel from source, you will have noticed that it contains a fairly large number of tests. Since it is not humanly feasible to test every victory situation (the part of the implementation I am most worried about), I decided to follow the example and implement a few basic tests.

The tests I implemented can be found in connect_four_3d_test.py. They are essentially inspired by the test suite for the Tic-Tac-Toe game.

I also tried to implement almost all possible winning combinations for the game. This is not a thorough test suite (in all of my tests Player 0 is the winning player), but this gives me some confidence that the victory detection code is implemented correctly.

Summary

We implemented a game in Python into the Open Spiel framework, tested it, and we are now able to run it using the example main programs provided with the library. The next step will be to start training some AI on this game, but that will have to wait for a follow-up …




    Enjoy Reading This Article?

    Here are some more articles you might like to read next:

  • Reinforcement Learning with Open Spiel (Part 2)
  • gnuplot
  • HanDist
  • MPI-JUnit
  • Tuning mechanism for the Lifeline GLB