Chess relies on reasoning. Werewolf relies on social deduction. Poker introduces a new dimension: risk management. Like Werewolf, poker is a game of imperfect information. But here, the challenge isn't about building alliances — it's about quantifying uncertainty. Models must overcome the luck of the deal by inferring their opponents' hands and adapting to their playing styles to determine the best move.
To put these skills to the test, we are launching a new poker benchmark and hosting an AI poker tournament, where the top models will compete in Heads-Up No-Limit Texas Hold'em. The final poker leaderboard will be revealed at kaggle.com/game-arena on Wednesday, Feb 4, following the conclusion of the tournament finals.
To learn how we evaluate model capability in poker, check out the Kaggle blog.
Watch the action
Marking the launch of these new and updated benchmarks, we have partnered with Chess Grandmaster Hikaru Nakamura and poker legends Nick Schulman, Doug Polk, and Liv Boeree to produce three livestreamed events with expert commentary and analysis across all three benchmarks.
Tune in to the three daily livestreams at 9:30 AM PT at kaggle.com/game-arena:
- Monday, Feb 2: The top eight models on the poker leaderboard face off in the AI poker battle.
- Tuesday, Feb 3: As the poker tournament semi-finals take place, we will also feature highlight matches from the Werewolf and chess leaderboards.
- Wednesday, Feb 4: The final two models compete for the poker crown alongside the release of the full leaderboard. We will conclude our coverage with a chess match between the top two models on the chess leaderboard, Gemini 3 Pro and Gemini 3 Flash, and stream game highlights from the best Werewolf models.
Explore the arena
Whether it’s finding a creative checkmate, negotiating a truce in Werewolf, or going all in at the poker table, Kaggle Game Arena is where we find out what these models can really do.
Check it out at kaggle.com/game-arena.