Simple tasks showing reasoning breakdown in state-of-the-art LLMs


Original link: https://arxiv.org/abs/2406.02061


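The riddle "Alice has 4 brothers and she also has 1 sister. How many sisters does Alice's brother have?" raises concerns about clarity and precision. Rather than focusing on Alice, the question singles out one particular brother, which introduces unnecessary ambiguity. The suggested revision is to ask "How many sisters do Alice's brothers have?" or "How many sisters does each of Alice's brothers have?" to avoid confusion. This revision keeps the ambiguity the riddle needs while being precise everywhere else, so the solver is not distracted unnecessarily. Large language models (LLMs) learn to reason from the context they are given, but their ability to handle ambiguous situations varies. Clarifying the riddle is intended to improve the accuracy of their answers.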
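The arithmetic behind the riddle can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `sisters_per_brother` and its parameters are hypothetical, and it assumes that all siblings belong to the same family and that Alice herself counts as a sister to each brother.

```python
def sisters_per_brother(alice_brothers: int, alice_sisters: int) -> int:
    """Return how many sisters each of Alice's brothers has.

    Assumption (not stated in the riddle itself): every sibling shares
    the same family, so each brother has the same set of sisters.
    """
    if alice_brothers < 1:
        raise ValueError("Alice needs at least one brother for the question to apply")
    # Each brother's sisters are Alice's sisters plus Alice herself.
    # The number of brothers does not affect the answer; it is a distractor.
    return alice_sisters + 1


# The riddle as quoted: 4 brothers and 1 sister.
answer = sisters_per_brother(alice_brothers=4, alice_sisters=1)
print(answer)
```

For the quoted riddle this yields 2: Alice's one sister plus Alice herself. The brother count is only checked for validity, which reflects why the question is a useful probe: the answer depends on noticing that Alice is also a sister.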
