Simple Meta-Harness on Islo.dev

Original link: https://zozo123.github.io/meta-harness-on-islo-page/

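Meta-harness, a framework for agent evaluation, needs reproducible environments, massive parallelism, and persistent traces, and Islo snapshots address all three. Islo can save and restore a complete runtime environment, so every candidate is tested against the same setup across many tasks. The system uses Islo to run an agent simulation (currently an intentionally buggy Python script) against a defined set of tasks and records the output for analysis. A "proposer" script then identifies the failures and generates an improved system prompt, which drives the harness to improve from one iteration to the next. Crucially, the same loop, from execution through analysis to prompt refinement, can be reproduced on Islo with a real model such as Claude while keeping the same input/output contract. Islo’s gateway and source-cloning features add security and workload management, and Harbor, a framework for agent evaluations and RL environments, slots in as the workload spec.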

Submitted to Hacker News by zozo123-IB.

Why Islo snapshots are the missing primitive

Three things meta-harness needs from its runtime:

Reproducible eval environments: every candidate harness runs against the same setup; otherwise the score is noise.
Massive parallelism: testing $N$ candidates on $K$ tasks means $N \times K$ runs, which adds up fast.
Persistent traces: the proposer needs to read stdout/stderr/agent-thoughts from runs that completed an hour ago.

Islo’s primitives map 1:1:

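islo snapshot save meta-base                  # reproducible: freeze the eval environment once
islo use mh-cand-7 --snapshot meta-base ...   # parallel: boot each candidate from that snapshot
islo logs mh-cand-7 --type agent              # persistent: read the agent trace after the run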
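
The $N \times K$ fan-out can be planned from just the three commands above. Below is a minimal dry-run sketch in Python: it builds argument lists for one evaluation round and runs nothing. Giving each (candidate, task) pair its own sandbox named mh-cand-<i>-<task> is a hypothetical choice (the post's example, mh-cand-7, names one sandbox per candidate), the task names are made up, and the further islo use arguments that the post elides with "..." are left out rather than guessed.

```python
from itertools import product

SNAPSHOT = "meta-base"  # snapshot name used in the post


def plan_round(num_candidates: int, tasks: list) -> dict:
    """Build, but do not run, the islo commands for one N x K evaluation round."""
    save = [["islo", "snapshot", "save", SNAPSHOT]]  # freeze the environment once
    boots, logs = [], []
    for i, task in product(range(num_candidates), tasks):
        name = f"mh-cand-{i}-{task}"  # hypothetical naming scheme
        # Every sandbox boots from the same snapshot, so every candidate sees the same setup.
        # The post elides further `islo use` arguments with "..."; they are omitted here.
        boots.append(["islo", "use", name, "--snapshot", SNAPSHOT])
        # Agent traces stay readable after the run, for the proposer to inspect later.
        logs.append(["islo", "logs", name, "--type", "agent"])
    return {"save": save, "boot": boots, "logs": logs}


if __name__ == "__main__":
    plan = plan_round(num_candidates=4, tasks=["fizzbuzz", "parse-csv", "retry-http"])
    print(len(plan["boot"]), "sandboxes")  # prints 12 sandboxes
    print(" ".join(plan["boot"][0]))
```

Handing the boot list to a thread pool or a shell loop is what turns the plan into parallel runs; because each sandbox starts from the same meta-base snapshot, the scores stay comparable.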

Add islo gateway (deny-by-default egress to prevent reward-hacking) and --source github://owner/repo (clone the workload at boot), and the wiring is basically free. Harbor — Islo Labs’ framework for agent evaluations and RL environments — slots in as the workload spec.

The POC

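tasks/           # eval tasks
harness/v0/      # starting harness (system.md)
bin/
  meta-harness   # orchestrator: evaluate, propose, repeat
  agent-sim.py   # intentionally buggy agent simulator
  proposer.py    # reads runs/iter-N/, writes harness/v{N+1}/system.md
viz/index.html   # visualization
runs/            # per-iteration traces (runs/iter-N/)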

The agent is a Python simulator that’s intentionally buggy until the system prompt contains the right hint keyword. The loop is therefore deterministic and offline and runs in seconds, yet the wiring is identical to what you’d ship against real Claude on Islo. The proposer is 80 lines: it reads runs/iter-N/, finds which tasks failed, looks up the missing hint, and appends it to a new harness/v{N+1}/system.md. A real proposer would be a Claude agent running on Islo:
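
The simulated loop can be sketched in a few dozen lines of Python. This is not the post's agent-sim.py or proposer.py: the task names, hint keywords, and the one-JSON-file-per-task layout under runs/iter-N/ are hypothetical stand-ins, and everything runs in a temporary directory. The agent passes a task only if harness/vN/system.md contains that task's hint; the proposer reads runs/iter-N/, collects the failed tasks, and writes harness/v{N+1}/system.md with one more hint appended.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical tasks, each mapped to the hint keyword the system prompt must contain.
HINTS = {
    "fizzbuzz": "check 15 before 3 and 5",
    "parse-csv": "quote-aware splitting",
    "retry-http": "exponential backoff",
}


def agent_sim(system_prompt: str, task: str) -> dict:
    """Deterministic stand-in for the agent: it fails unless the prompt has the task's hint."""
    ok = HINTS[task] in system_prompt
    return {"task": task, "passed": ok, "stderr": "" if ok else f"{task}: wrong answer"}


def run_iteration(root: Path, n: int) -> list:
    """Run every task against harness/v{n}/system.md and record results under runs/iter-{n}/."""
    prompt = (root / f"harness/v{n}/system.md").read_text()
    out_dir = root / f"runs/iter-{n}"
    out_dir.mkdir(parents=True, exist_ok=True)
    results = [agent_sim(prompt, task) for task in HINTS]
    for r in results:
        (out_dir / f"{r['task']}.json").write_text(json.dumps(r))
    return results


def propose(root: Path, n: int) -> Path:
    """Read runs/iter-{n}/, pick the first failed task, append its hint to harness/v{n+1}/system.md."""
    records = [json.loads(p.read_text()) for p in sorted((root / f"runs/iter-{n}").glob("*.json"))]
    failed = [r["task"] for r in records if not r["passed"]]
    old = (root / f"harness/v{n}/system.md").read_text()
    # Small edit on top of v{n}: keep the old prompt and add at most one hint.
    new = old + "".join(f"\n- Hint for {t}: {HINTS[t]}" for t in failed[:1])
    path = root / f"harness/v{n + 1}/system.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(new)
    return path


def meta_loop(root: Path, max_iters: int = 10) -> list:
    """Alternate evaluation and proposal until every task passes; return pass counts per iteration."""
    (root / "harness/v0").mkdir(parents=True, exist_ok=True)
    (root / "harness/v0/system.md").write_text("You are a careful coding agent.")
    history = []
    for n in range(max_iters):
        results = run_iteration(root, n)
        passed = sum(r["passed"] for r in results)
        history.append(passed)
        if passed == len(results):
            break
        propose(root, n)
    return history


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(meta_loop(Path(tmp)))  # prints [0, 1, 2, 3]
```

Fixing one failure per proposal is a choice made here to mirror the "small edit" the real proposer is asked for, so the pass count climbs one task at a time; appending every missing hint at once would also fit the description and converge in a single step.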

# the shell expands ${N} and $((N + 1)) inside the double quotes before islo sees the prompt
islo use --snapshot meta-base --agent claude --task "
  Examine /workspace/runs/iter-${N}. Find a common failure mode in the
  grade.sh stderr. Write /workspace/harness/v$((N + 1))/system.md as a small
  edit on top of v${N}/system.md to fix it."

Same input, same output contract: either proposer reads runs/iter-N/ and writes harness/v{N+1}/system.md. The orchestrator already has a stub for it.
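
One way to keep that contract explicit is a single function that returns the proposer command for iteration N, either the local simulator or the Islo agent run. A minimal Python sketch: the islo flags are exactly the ones in the real-proposer command above, while the proposer.py arguments are hypothetical because the post does not show them. Computing n + 1 in Python also avoids relying on shell arithmetic inside the quoted prompt.

```python
import shlex


def proposer_command(n: int, real: bool) -> list:
    """Return argv for the proposer at iteration n.

    Both variants follow the same contract: read runs/iter-{n}/, write harness/v{n+1}/system.md.
    """
    if not real:
        # Hypothetical CLI for the offline simulator; the post does not show its arguments.
        return ["python3", "bin/proposer.py", str(n)]
    task = (
        f"Examine /workspace/runs/iter-{n}. Find a common failure mode in the\n"
        f"grade.sh stderr. Write /workspace/harness/v{n + 1}/system.md as a small\n"
        f"edit on top of v{n}/system.md to fix it."
    )
    # Flags copied from the post's real-proposer command.
    return ["islo", "use", "--snapshot", "meta-base", "--agent", "claude", "--task", task]


if __name__ == "__main__":
    print(shlex.join(proposer_command(3, real=True)))
```

Because both branches honor the same contract, the rest of the loop does not change when the real proposer is switched on.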