Custom Reinforcement Learning Backgammon Program
In this post I will discuss the custom reinforcement learning functions I built, which have allowed me to train an AI to defeat a random opponent by a significant margin, and which can continue to improve with more training over time.
This is a randomized layout of a Backgammon board at a given point in a game. I built the UI with Pygame and Python. Through a variety of functions, rules, and stored class variables, I can simulate a full game of Backgammon in under a second. That speed matters when experimenting, because it lets me run thousands of games very quickly. The number of stored games is critical to learning, so speed is vital.
The idea I came up with was to create board conversion functions that represent specific board states. Storing full boards is possible in theory, but it is slow and requires an enormous amount of storage, so converting each board to a compact representation allows for far more experimentation. I created relatively simple features such as pip count, number of blocks, and pieces on the rail, and for a given board I convert the board into a small array such as [1,0,4] or [0,1,2]. These arrays are stored in string format, which makes JSON storage easier. The code looks something like this:
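(In rough form; the board representation and feature choices below are simplified stand-ins for what is actually in the repo, and pip_differential is defined just afterward.)

```python
def convert_board(board, color):
    # Minimal sketch of a board conversion function; the real functions in
    # BGTesting.py (e.g. Matrix_Eval_Board_4) use their own features and board
    # internals. Here `board` is assumed to be a dict with "points" (a list of
    # 24 signed checker counts, positive = white) and "bar" (checkers on the
    # rail per color).
    sign = 1 if color == "white" else -1
    opponent = "black" if color == "white" else "white"

    # Bucket the pip differential so that many similar boards share one key.
    pip_bucket = pip_differential(board, color) // 12

    # Points where this color has two or more checkers ("blocks").
    blocks = sum(1 for count in board["points"] if sign * count >= 2)

    # Opposing checkers sitting on the rail.
    rail = board["bar"][opponent]

    # Stored as a string so it can be used directly as a JSON dictionary key.
    return str([pip_bucket, blocks, rail])
```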
The pip_differential function built into this board evaluation function is as follows:
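(Again a sketch, under the same assumed board representation; checkers on the rail are ignored here for simplicity.)

```python
def pip_differential(board, color):
    # Difference between the opponent's pip count and our own; positive means
    # `color` is ahead in the race. Point 0 is taken to be closest to white's
    # bear-off, so a white checker on point i needs i + 1 pips to come home,
    # and a black checker on point i needs 24 - i pips.
    points = board["points"]
    white_pips = sum(count * (i + 1) for i, count in enumerate(points) if count > 0)
    black_pips = sum(-count * (24 - i) for i, count in enumerate(points) if count < 0)
    if color == "white":
        return black_pips - white_pips
    return white_pips - black_pips
```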
As games are played, I store each color's converted boards in global lists. When a game ends, the winning player takes its list and the loser's list and passes them to a storage function, which looks like this:
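(Sketched here with the file name borrowed from the repo and placeholder increment, decrement, and learning-rate values; the real function in BGTesting.py may differ.)

```python
import json

def store_results(winner_boards, loser_boards, path="Scores_10.json",
                  increment=1.0, decrement=1.0, learning_rate=0.1):
    # Load the existing scores, or start fresh if the file does not exist yet.
    try:
        with open(path) as f:
            scores = json.load(f)
    except FileNotFoundError:
        scores = {}

    # Reward every board state the winner passed through...
    for key in winner_boards:
        scores[key] = scores.get(key, 0) + increment * learning_rate

    # ...and penalize every board state the loser passed through.
    for key in loser_boards:
        scores[key] = scores.get(key, 0) - decrement * learning_rate

    with open(path, "w") as f:
        json.dump(scores, f)
```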
What happens here is that a JSON file is opened, the winning player's list of board conversions is parsed, and the stored value for each of those boards is increased by the increment value multiplied by the learning rate. The same is done for the losing player's list, except the values are decreased by the decrement value multiplied by the learning rate.
The learning rate controls how quickly the stored board values change. Choosing it is fairly subjective and will require a lot of testing, which will take place in the future.
Over thousands of games between two random computer players, each converted board becomes linked to a value that represents the overall worth of that particular board state. Originally I created very complex functions for analyzing board states, but what seems to work best is a very simple conversion, which apparently leaves far less noise for the AI to deal with.
Next, in order to test the effectiveness of the board conversion function and its learning, I built a function that accesses this stored information. It looks like this:
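(A sketch with illustrative names, assuming 'scores' is the dictionary loaded from the JSON file and 'possible_boards' holds the legal boards reachable from a given roll.)

```python
import random

def choose_board(possible_boards, scores, func, color, val=0):
    # `func` is a board conversion function like the one sketched above, and
    # `val` is the minimum stored value a board must beat to be selected.
    best_board, best_score = None, val
    for candidate in possible_boards:
        key = func(candidate, color)    # convert the candidate board to its key
        score = scores.get(key, 0)      # unseen board states default to 0
        if score > best_score:
            best_board, best_score = candidate, score

    # If nothing beats the `val` threshold, fall back to a random choice,
    # which indirectly keeps some exploration in the mix.
    return best_board if best_board is not None else random.choice(possible_boards)
```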
This function is built into the general move function of the computer player class, behind an optional reinforcement learning parameter. The 'func' variable specifies which board conversion function to use when evaluating board states.
With each move, my program finds all possible board states for the specific dice roll. It passes each of these options through the board conversion function specified above and selects the board with the highest positive stored value. If no option in the stored dictionary exceeds the 'val' variable, a random board is chosen instead. This indirectly addresses the exploration vs. exploitation trade-off built into many reinforcement learning formulations.
It is important to mention that altering the 'val' variable in this function changes the choices made and the outcomes of games. Make it too high and much of the stored information will be ignored; make it too low and, because of the randomness of the game, the player may select moves with negative value, which can lower the win rate over time. This variable leaves plenty of room for experimentation, and it could eventually be made into a function parameter handled by another class that tries different values and compares win rates, as sketched below.
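(Purely illustrative: run_match here is a hypothetical helper that would play a fixed number of games at a given threshold and return Player 1's win rate; it is not a function in the repo.)

```python
# Hypothetical sweep over the 'val' threshold to compare win rates.
for val in [0, 0.25, 0.5, 1.0]:
    win_rate = run_match(games=500, val=val)
    print(f"val={val}: P1 win rate {win_rate:.2%}")
```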
The idea behind this reinforcement learning function is to test it on one computer player while keeping the other player random. If the player using the stored information can defeat the random player and improve its win rate over time, then it is, in theory, learning.
Part of the game loop looks like this:
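(Sketched with placeholder engine helpers, such as new_game, roll_dice, legal_boards, game_over, and winner, standing in for the actual classes and methods in the repo.)

```python
import json
import random

p1_wins, total_games = 0, 0

while total_games < 500:
    # Reload stored values so Player 1 benefits from earlier games in the run.
    with open("Scores_10.json") as f:
        scores = json.load(f)

    board = new_game()                # placeholder for the real board setup
    p1_boards, p2_boards = [], []

    while not game_over(board):
        # Player 1: evaluate every legal board for its roll using stored values.
        board = choose_board(legal_boards(board, "white", roll_dice()),
                             scores, convert_board, "white")
        p1_boards.append(convert_board(board, "white"))
        if game_over(board):
            break

        # Player 2: purely random choice among its legal boards.
        board = random.choice(legal_boards(board, "black", roll_dice()))
        p2_boards.append(convert_board(board, "black"))

    # The winner's boards get rewarded, the loser's penalized, lists are reset.
    if winner(board) == "white":
        p1_wins += 1
        store_results(p1_boards, p2_boards)
    else:
        store_results(p2_boards, p1_boards)

    total_games += 1

print("P1 win rate:", p1_wins / total_games)
```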
In this case, Player 1 uses the stored information and the Matrix_Eval_Board_4 conversion function to make its choices, based on whatever rolls it happens to have at its disposal in a given situation.
Player 1 makes its choice and passes the board to Player 2, who makes a random choice. Games are played to their conclusion, the lists are reset, and once the total hits 500 games the loop breaks and overall win rates are tallied. The results of these games can also be stored in the same JSON files that are being read from, which is quite interesting to think about.
So far, this specific evaluation function has been the most successful, allowing the non-random player (P1 in this case) to win over 60 percent of its games against the random player, with a win rate that keeps rising, signaling improved learning over time. That is quite high for a game like Backgammon, which has a lot of built-in randomness from the dice.
The next steps are to simulate more games and, when the learning stops improving, to introduce new evaluation functions. If those evaluation functions also lead to improved win rates against a random computer, then both computer players can use their own reinforcement learning functions, store their information in their respective files, and improve each other's play over time.
I have found this project incredibly exciting, and building my own custom machine learning library within a single repository has been quite a fun challenge.
The source code for most of this can be found here:
BG/BGTesting.py at main · jberkow713/BG (github.com)
Information being accessed for this particular conversion function can be found here:
BG/Scores_10.json at main · jberkow713/BG (github.com)
Thanks for reading. I am also looking for work in Machine Learning / Data Science / Software Engineering.