Environment
The Chef`s Hat Environment implements an OpenAI GYM scenario, and includes a 1:1 representation of the Chef`s Hat Game.
The environment implement specific action and observation spaces, and calculates scores and performances after the game is done.
While the gaming is happening, the environment communicates with the Rooms and Agents using a specific dictionary, called info that contains the following information:
When sending the info dictionary to the players that did not do the action (observe other players and match update actions), both the “actionIsRandom” and “possibleActions” informations are hidden, to avoid leaking of information.
Action and Observation Space
The environment represents the current game state, called observation, for each player as the concatenation of the cards the player has at hand, the current cards in the playing field, and the possible actions for that move. For each player, there are a total of 200 allowed actions: to discard one card of face value 1 represents one move or to discard 3 cards of face value 1 and a joker is another move, while passing is considered another move. Each player can only do one action per game turn.
Each action taken by a player is validated based on a look-up table, the possible actions, created in real-time based on the player’s hand and the cards in the playing field. This is a crucial step to guarantee that a taken action is allowed given the game context and to guarantee that the game rules are maintained. The Figure above illustrates an example of calculated possible actions given a game state. The blue areas mark all the possible action states, while the gray areas mark actions that are not allowed due to the game’s mechanics. We observed that, given this particular game state, this player would only be allowed to perform one of three actions (marked in green), while any other action (marked in red) would be considered as invalid and not would be carried on by the simulator.

Score and Performance Score
After each match of the Chef’s Hat game, players are rewarded with points (from 3 to 0, depending on the finishing position). We also calculate a performance score, based on the following:
performanceScore = ((points*10)/rounds)/matches
The performance score allows us to represent better the behavior of an agent in terms of the number of rounds it needed to win the match, and number of matches needed to win the game.
Datasets
The environment allows the generation of datasets, which are .csv and .pkl files that contain all the actions of the entire game in an easy-to-parse format. These datasets can be used to collect data from players, to generate analysis and interpretation about the game, or to log an entire match, for example.