Contra "Grandmaster-level chess without search" (2024)

Original link: https://cosmo.tardis.ac/files/2024-02-13-searchless.html

## DeepMind's Chess Transformer: Summary

DeepMind recently published a paper detailing a Transformer model trained to play chess by imitating the strong engine Stockfish 16. The model predicts the value of a state, the value of taking an action, and a probability distribution over possible moves, similar to the AlphaZero architecture but with an added action-value prediction. The authors claim Grandmaster-level play, citing as evidence a Lichess Blitz rating of 2895. However, the paper's novelty is questionable, since open-source projects like Leela Chess Zero (Lc0) have already substantially surpassed AlphaZero's performance; current Lc0 networks likely achieve Elo ratings comparable to or higher than DeepMind's model, even *without* a value head. Crucially, the paper's analysis is considered weak: the authors appear to suggest that their model *outperforms* its training data (Stockfish), and they rely on the opinions of human masters, far weaker players than Stockfish, to resolve the disagreement. The work largely ignores the significant progress made by the Lc0 community, raising concerns about its overall rigor and its contribution to the field.


## Original Article

Google DeepMind recently published Amortized Planning with Large-Scale Transformers: A Case Study on Chess, wherein they present a transformer-based model for playing chess, trained on the strongest chess engine, Stockfish (specifically Stockfish 16, which has since been surpassed by newer versions). This model takes a game-state as input and learns to output three separate quantities:

  • V(s), the value of the state s as determined by a 50ms Stockfish search
  • Q(s, a), the value of taking action a in state s, as determined by a 50ms Stockfish search
  • π(a | s), a probability distribution over actions in state s, attempting to match the choice of 50ms Stockfish

Once trained, this model can be used to play chess by taking, in a given state, the action with the highest predicted action-value.
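The greedy-play step above can be sketched as follows. This is a minimal illustration, not the paper's actual interface: `TinyModel` and its lookup table stand in for the transformer's action-value head, and the state/move names are made up.

```python
# Minimal sketch of greedy play from a learned action-value head.
# TinyModel is a hypothetical stand-in for the transformer; the table
# maps (state, action) pairs to a predicted win probability in [0, 1].

class TinyModel:
    def __init__(self, table):
        self.table = table

    def action_value(self, state, action):
        return self.table[(state, action)]

def pick_move(model, state, legal_moves):
    """Take the legal move with the highest predicted Q(s, a):
    one forward pass per move, no tree search."""
    return max(legal_moves, key=lambda a: model.action_value(state, a))

model = TinyModel({("s0", "e4"): 0.55, ("s0", "d4"): 0.54, ("s0", "a3"): 0.48})
assert pick_move(model, "s0", ["e4", "d4", "a3"]) == "e4"
```

The point of the design is that all the "planning" is amortised into the network's weights at training time; at play time the model only ranks the legal moves.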

Notably, this is extremely similar to the model used in AlphaZero, DeepMind's general game-playing algorithm, the design of which has since been replicated in strong open-source game-playing programs like KataGo and Leela Chess Zero. AZ-style networks predict policy and value only, so only the action-value output Q(s, a) is new.

Indeed, in the paper they compare the model's playing strength to AlphaZero, either using the AZ model's policy (taking the move with the highest probability) or using the model's value estimate, by explicitly doing a depth-one rollout of the legal moves in the position and taking the move that maximises the value of the resulting state (technically a search process, though the authors don't mark it as one):
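The depth-one rollout described above can be sketched like this. Everything here is an illustrative stand-in (toy states, a lookup-table value function); the sketch assumes the value of a child state is reported from the perspective of the player who just moved.

```python
# Toy stand-ins: states are strings, the value head is a lookup table.
children = {("s0", "e4"): "s1", ("s0", "d4"): "s2"}
values = {"s1": 0.60, "s2": 0.52}

def apply_move(state, move):
    return children[(state, move)]

def value(state):
    return values[state]

def pick_move_by_value(state, legal_moves):
    # Depth-one rollout: expand each legal move, score the resulting
    # position with the value head, take the maximiser. This evaluates
    # every child position, which is why it is technically a (one-ply)
    # search rather than a pure single forward pass.
    return max(legal_moves, key=lambda a: value(apply_move(state, a)))

assert pick_move_by_value("s0", ["e4", "d4"]) == "e4"
```

Contrast this with greedy action-value play: the action-value head folds the one-ply expansion into a single network output, whereas value-head play must visit each child explicitly.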

[Figure: searchless-tournament-scores]

The big claim the authors make here is that their model plays chess at the level of a human Grandmaster, which they demonstrate by the model reaching a Lichess Blitz rating of 2895. This is an impressive rating, and frankly it seems pretty hard to argue that it could be achieved by any human of less-than-Grandmaster strength. One potential caveat is that the model cannot make use of extra time to "think harder", so its Elo will decrease as time controls lengthen, meaning that even if it is "Grandmaster strength" in Blitz, it may be sub-Grandmaster at longer, "real" time controls.

The biggest problem with this whole paper, in my opinion, is that it may not be a new result at all! The open-source Leela Chess Zero project has made massive improvements over AlphaZero, and its networks' raw policies are far stronger than AZ's: a test of BT4, the currently-strongest Lc0 network, against T30 and T40, the Lc0 nets closest in strength to AlphaZero, yields +547 Elo over T40 and +628 Elo over T30. Numbers like these catapult Lc0 right up to the level of DeepMind's 136M-parameter Transformer (1620 + 600 = 2220), and this is policy Elo alone! The AZ nets gain a further +230 Elo from using the value head, so one could reasonably expect that BT4 with a value head would score something like 2400 in this paper's Tournament Elo.
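To make the Elo gaps above concrete, the standard logistic Elo model converts a rating difference into an expected score. This is a generic formula, not anything specific to the paper or to Lc0's testing methodology.

```python
def expected_score(elo_a, elo_b):
    """Expected score of player A against player B under the standard
    logistic Elo model (1 point per win, 0.5 per draw)."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

# A +547 Elo gap (BT4 over T40) means scoring roughly 96% of the points;
# +628 (BT4 over T30) is roughly 97%.
assert round(expected_score(547, 0), 2) == 0.96
assert round(expected_score(628, 0), 2) == 0.97
```

This is why a +500-600 Elo policy-only gap over AlphaZero-level nets is such a damaging omission: it is not a marginal improvement but a near-total dominance in head-to-head play.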

They later analyse some of the games that their model played. The writing here is strange: the authors appear ready to believe that the model they have trained to mimic Stockfish may have outperformed Stockfish... somehow? (A model outperforming its training data isn't actually as far-fetched as I make it sound, but the credence the authors express here is still entirely unwarranted.)

[Figure: searchless-hubris]

Particularly egregious is that they then elect to resolve this difference in opinion by appealing to human masters, who are hundreds of Elo points weaker than Stockfish!

In conclusion, this doesn't seem like a very serious paper, and it almost seems to make a point of ignoring literally the most significant piece of existing work in the area.

A comparison of "searchless" (one-ply value-head maximisation) puzzle-solving ability across AlphaZero, the paper's model, and Lc0:

[Figure: searchless-puzzle-scores]
