This repository contains our submission to the MMU-RAG Competition: TTD-RAG, a deep research agent (this README was generated with Gemini 2.5). Our system is a faithful implementation of the framework proposed in the paper "Deep Researcher with Test-Time Diffusion (TTD-DR)".
It conceptualizes report generation as an iterative "denoising" process, starting with a preliminary draft and progressively refining it through cycles of targeted search, synthesis, and revision. This approach is designed to excel at complex, multi-hop reasoning tasks that require coherent, long-form answers.
- Test-Time Diffusion Framework: Models research report generation as an iterative process of refining a "noisy" draft with external information, ensuring coherence and reducing information loss.
- Report-Level Denoising with Retrieval: Uses an evolving draft to dynamically guide the search process, ensuring each retrieval step is targeted at filling specific knowledge gaps.
- Component-wise Self-Evolution: Enhances the quality of each step in the workflow (planning, synthesis) by generating diverse variants, critiquing them, and merging them into a superior output (see the sketch after this list).
- High-Performance Serving: Utilizes vLLM to serve both the generative (`Qwen/Qwen3-4B-Instruct-2507`) and reranking (`tomaarsen/Qwen3-Reranker-0.6B-seq-cls`) models for high throughput and low latency.
- Competition Compliant: Fully supports both the dynamic (streaming) and static evaluation endpoints required by the competition rules, validated with the provided `local_test.py` script.
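To make the self-evolution idea concrete, here is a minimal sketch assuming an OpenAI-compatible client pointed at the vLLM server described below. The prompts, the `llm` helper, and the port are illustrative, not the repo's actual code:

```python
# Sketch of component-wise self-evolution: sample diverse variants of one
# workflow step, critique each, then merge them into a superior output.
# Assumes the vLLM OpenAI-compatible server on its default port 8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "Qwen/Qwen3-4B-Instruct-2507"

def llm(prompt: str, temperature: float = 0.7) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return resp.choices[0].message.content

def self_evolve(task: str, n_variants: int = 3) -> str:
    # 1. Generate diverse candidates with a higher sampling temperature.
    variants = [llm(task, temperature=0.9) for _ in range(n_variants)]
    # 2. Critique each candidate so the merge step can weigh its strengths.
    critiques = [llm(f"Critique this output for the task.\n"
                     f"Task: {task}\nOutput: {v}") for v in variants]
    # 3. Merge the variants, guided by the critiques, into one improved output.
    numbered = "\n\n".join(f"Variant {i}:\n{v}\nCritique {i}:\n{c}"
                           for i, (v, c) in enumerate(zip(variants, critiques), 1))
    return llm(f"Task: {task}\n\n{numbered}\n\n"
               "Merge these variants into a single, superior output.",
               temperature=0.2)
```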
The agent operates in a structured, multi-stage process orchestrated by `src/pipeline.py`:
- Stage 1: Planning & Initial Drafting
  - An initial Research Plan is generated to outline the key areas of investigation.
  - A preliminary Noisy Draft is created from the LLM's internal knowledge, serving as the starting point for the diffusion process.
- Stage 2: Iterative Search & Denoising
  - The system enters a loop where, in each iteration:
    - A new search query is generated, informed by the current draft's deficiencies and the overall plan.
    - Documents are retrieved from the FineWeb Search API.
    - The retrieved documents are chunked and reranked with a specialized model to surface the most relevant information.
    - The top-ranked chunks are synthesized into a concise answer to the search query.
    - The draft is revised ("denoised") by integrating this new information.
- Stage 3: Final Report Generation
  - After the iterations complete, the agent synthesizes the final refined draft, the initial plan, and the full history of questions and answers into a single, comprehensive report (the whole flow is sketched below).
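Condensed into code, the three stages look roughly like the sketch below. This is an illustration, not the actual contents of `src/pipeline.py`: the prompts are invented, and retrieval is abstracted into a caller-supplied `search` function standing in for the FineWeb retrieval, chunking, and reranking steps.

```python
# Illustrative sketch of the three-stage TTD-RAG loop (not the repo's code).
from typing import Callable

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "Qwen/Qwen3-4B-Instruct-2507"

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def research(question: str, search: Callable[[str], str], n_iter: int = 3) -> str:
    # Stage 1: plan, then write a preliminary "noisy" draft from the
    # model's internal knowledge alone.
    plan = llm(f"Write a research plan for: {question}")
    draft = llm(f"Using this plan, draft a preliminary report on: {question}\n{plan}")

    qa_history = []
    for _ in range(n_iter):
        # Stage 2a: target the draft's biggest remaining gap with one query.
        query = llm("Given the plan and current draft, write one search query "
                    f"that fills the biggest gap.\nPlan:\n{plan}\nDraft:\n{draft}")
        # Stage 2b: retrieve + chunk + rerank (abstracted into `search`), then
        # synthesize a concise answer from the top-ranked chunks.
        answer = llm(f"Answer '{query}' using only these sources:\n{search(query)}")
        qa_history.append((query, answer))
        # Stage 2c: "denoise" the draft by integrating the new information.
        draft = llm(f"Revise the draft to integrate this finding.\n"
                    f"Draft:\n{draft}\nFinding ({query}):\n{answer}")

    # Stage 3: fuse the refined draft, the plan, and the full Q&A history.
    qa = "\n".join(f"Q: {q}\nA: {a}" for q, a in qa_history)
    return llm("Write the final comprehensive report.\n"
               f"Plan:\n{plan}\nDraft:\n{draft}\nFindings:\n{qa}")
```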
- Backend Framework: FastAPI
- LLM Serving: vLLM
- Generative LLM:
Qwen/Qwen3-4B-Instruct-2507 - Reranker Model:
tomaarsen/Qwen3-Reranker-0.6B-seq-cls - Retrieval Source: FineWeb Search API
- Containerization: Docker
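For the reranking step in Stage 2, the seq-cls reranker can be exercised locally as a cross-encoder. A minimal sketch using `sentence-transformers` follows; the query and chunks are made-up examples, and the deployed system serves this model through vLLM instead of loading it in-process:

```python
# Sketch of the chunk-reranking step: score (query, chunk) pairs with the
# sequence-classification reranker and keep the highest-scoring chunks.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("tomaarsen/Qwen3-Reranker-0.6B-seq-cls")

query = "What causes auroras?"
chunks = [
    "Auroras are caused by charged solar particles hitting the atmosphere.",
    "The aurora borealis is visible at high northern latitudes.",
    "Solar panels convert sunlight into electricity.",
]

# Higher score = more relevant to the query.
scores = reranker.predict([(query, chunk) for chunk in chunks])
ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
top_chunks = [chunk for _, chunk in ranked[:2]]
print(top_chunks)
```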
- Docker and Docker Compose
- An NVIDIA GPU with 24GB+ VRAM
- NVIDIA Container Toolkit
First, create a local environment file from the example template. This file will store your API keys.
Now, open `.env` and add your API keys for:

- `FINEWEB_API_KEY`
- `OPENROUTER_API_KEY` (used as a fallback for the generator)
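A quick way to confirm both keys are actually reaching the application environment is a snippet like this (illustrative, not part of the repo):

```python
# Hypothetical sanity check that both keys from .env are set in the
# environment (e.g., after Docker Compose loads the env file).
import os

for key in ("FINEWEB_API_KEY", "OPENROUTER_API_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")
```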
We recommend using Docker Compose, which handles building the image and running the services as defined in `compose.yml`.
```bash
docker compose up --build
```

This command will:

- Build the Docker image from the `Dockerfile`.
- Start the container.
- Execute the `start.sh` script, which launches the vLLM OpenAI-compatible server in the background to serve the Qwen models and then, after a brief pause to allow the models to load, starts the FastAPI application on port `5053`.
Your API is now running and accessible at http://localhost:5053.
You can verify that your service is compliant with the competition requirements using the provided `local_test.py` script.
```bash
uv sync
source .venv/bin/activate

# Test both the /run and /evaluate endpoints (full test)
python local_test.py --base-url http://localhost:5053

# Test only the dynamic /run endpoint
python local_test.py --base-url http://localhost:5053 --test-mode run

# Test only the static /evaluate endpoint
python local_test.py --base-url http://localhost:5053 --test-mode evaluate
```

A successful run confirms that both endpoints function correctly and that the `result.jsonl` file is generated as expected for the static evaluation.
- Health Check: `GET /health`
  - A simple endpoint to confirm the service is running. Returns `{"status": "ok"}`.
- Dynamic Evaluation: `POST /run`
  - Input: `{"question": "string"}`
  - Output: A Server-Sent Events (SSE) stream that provides real-time updates on the agent's progress, including intermediate steps, citations, and the final report.
- Static Evaluation: `POST /evaluate`
  - Input: `{"query": "string", "iid": "string"}`
  - Output: A single JSON response: `{"query_id": "string", "generated_response": "string"}`.
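For illustration, here is a minimal client sketch exercising all three endpoints, assuming the service is running locally on port 5053 (`requests` and the sample payload values are placeholders, not part of the repo or the competition data):

```python
# Minimal client sketch for the three endpoints above.
import requests

BASE = "http://localhost:5053"

# Health check: expects {"status": "ok"}.
print(requests.get(f"{BASE}/health").json())

# Dynamic evaluation: consume the SSE stream line by line.
with requests.post(f"{BASE}/run",
                   json={"question": "What causes auroras?"},
                   stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            print(line[len("data:"):].strip())

# Static evaluation: a single JSON response.
resp = requests.post(f"{BASE}/evaluate",
                     json={"query": "What causes auroras?", "iid": "example-1"})
print(resp.json())  # {"query_id": "...", "generated_response": "..."}
```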
The following AWS CLI commands are provided for pushing your final Docker image to the competition's ECR repository.
- Sign in to AWS ECR

  ```bash
  aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <your-aws-account-id>.dkr.ecr.us-east-1.amazonaws.com
  ```

- Build the Image (if not already built). Ensure you build for the correct platform.

  ```bash
  docker build --platform linux/amd64 -t ttt-dr:latest .
  ```

- Tag the Image for ECR

  ```bash
  docker tag ttt-dr:latest <your-aws-account-id>.dkr.ecr.us-east-1.amazonaws.com/neurips2025text/ttt-dr:latest
  ```

- Push the Image to ECR

  ```bash
  docker push <your-aws-account-id>.dkr.ecr.us-east-1.amazonaws.com/neurips2025text/ttt-dr:latest
  ```