[Submitted on 2 Mar 2025 (v1), last revised 5 Mar 2025 (this version, v3)]
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition, by Toby Simonds and one other author
Abstract: We introduce LADDER (Learning through Autonomous Difficulty-Driven Example Recursion), a framework that enables Large Language Models to autonomously improve their problem-solving capabilities through self-guided learning: the model recursively generates and solves progressively simpler variants of complex problems. Unlike prior approaches that require curated datasets or human feedback, LADDER leverages a model's own capabilities to generate easier question variants. We demonstrate LADDER's effectiveness on mathematical integration, improving Llama 3.2 3B's accuracy from 1% to 82% on undergraduate-level problems and enabling Qwen2.5 7B Deepseek-R1 Distilled to achieve 73% on the MIT Integration Bee qualifying examination. We also introduce TTRL (Test-Time Reinforcement Learning), in which we perform reinforcement learning on variants of test problems at inference time. TTRL enables Qwen2.5 7B Deepseek-R1 Distilled to achieve a state-of-the-art score of 90% on the MIT Integration Bee qualifying examination, surpassing OpenAI o1's performance. These results show how self-directed strategic learning can achieve significant capability improvements without relying on architectural scaling or human supervision.
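The core loop the abstract describes — recursively generate simpler variants, solve the easy ones first, and use verified solves as a reinforcement signal that bootstraps the model up to the original problem — can be sketched as follows. This is a toy illustration only: `ToyModel`, `generate_variants`, `verify`, and the integer stand-in for a "problem" are all hypothetical, not from the paper (which uses LLM-generated integral variants and a numerical verifier).

```python
# Hedged sketch of the LADDER loop; every name here is a toy stand-in,
# not the paper's actual implementation.

def generate_variants(problem, depth):
    """Recursively produce progressively simpler variants of `problem`.
    A 'problem' is just an integer difficulty level in this toy."""
    if depth == 0 or problem <= 1:
        return [problem]
    simpler = problem - 1  # the paper has the model itself generate easier variants
    return [problem] + generate_variants(simpler, depth - 1)

def verify(problem, answer):
    """Stand-in for the paper's numerical verifier (e.g. checking an integral)."""
    return answer == problem * 2  # toy ground truth

class ToyModel:
    """Stand-in for the LLM: solves problems up to its current skill level."""
    def __init__(self, skill=1):
        self.skill = skill

    def solve(self, problem):
        return problem * 2 if problem <= self.skill else None

    def reinforce(self, problem):
        # A verified solve yields reward and raises skill
        # (stand-in for an RL policy update such as GRPO/PPO).
        self.skill = max(self.skill, problem + 1)

def ladder_train(model, hard_problem, depth=5):
    """Train easiest-first so solved variants bootstrap harder ones."""
    for variant in sorted(generate_variants(hard_problem, depth)):
        answer = model.solve(variant)
        if answer is not None and verify(variant, answer):
            model.reinforce(variant)
    return model.solve(hard_problem) is not None

model = ToyModel(skill=1)
print(ladder_train(model, hard_problem=5))  # → True
```

The easy-to-hard curriculum is the point: the toy model cannot solve the hard problem directly, but each verified solve on a simpler variant raises its skill until the original problem is within reach. TTRL applies the same idea at inference time, running this loop on variants of each test problem before answering it.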
From: Akira Yoshiyama
[v1] Sun, 2 Mar 2025 05:16:43 UTC (286 KB)
[v2] Tue, 4 Mar 2025 14:30:32 UTC (203 KB)
[v3] Wed, 5 Mar 2025 11:50:24 UTC (203 KB)