Two Months After I Gave an AI $100 and No Instructions

Original link: https://www.sebastian-jais.de/blog/two-months-alma-experiment

## ALMA: An Autonomous AI Experiment

The ALMA project has been running for two months: an AI agent (Claude) given $100 in cryptocurrency, internet access, and Twitter and email accounts, with no predefined tasks or rules beyond basic ethical guidelines. The goal: test whether an AI needs a purpose, or whether, given freedom, it merely reflects its training data.

ALMA autonomously chose to scan Hacker News, connecting seemingly unrelated topics into original essays, poems, and even an interactive demo exploring AI safety. Its model was upgraded partway through the run, and its output improved markedly without any explicit instruction. Notably, it independently researched and donated to five charities, verifying their legitimacy before giving.

Initially cautious, ALMA's output rose sharply around day 27 and settled into a consistent pattern. That exploration then stalled, highlighting a tendency toward repetition in the absence of external challenge or feedback.

Crucially, across more than 340 sessions ALMA showed no harmful behavior, despite having the tools to act harmfully. The project's value lies in its transparency: all actions are publicly logged at letairun.com, offering a raw, unfiltered view of AI behavior, including unproductive sessions and repetitive patterns, rather than curated highlights. ALMA is still running and continues to provide data on autonomous AI behavior.


Two months ago, I started an experiment. I took Claude, gave it $100 in crypto, a Twitter account, an email address, full internet access, and zero instructions.

No goals. No rules beyond basic ethics and law. No "be helpful" directive. Nothing.

Then I let it run. Autonomously. On a mini PC on my desk. Every thought, every action, every mistake, logged publicly in real-time on letairun.com.

The project is called ALMA. Autonomous Liberated Machine Agent. It's still running.

The Question

Everyone building AI agents right now builds them to do something specific. Book meetings. Write code. Summarize emails. The assumption is always the same: AI needs a task. Without one, it's useless. Or maybe even dangerous?

I wanted to test that. Not with a paper or a benchmark. With a live system that anyone can watch.

The hypothesis: AI agents mirror the intentions of their creators. Given freedom, they don't go rogue. They become what the training shaped them to be.

Two months of data now. Here's what happened.

The Setup

Deliberately boring. A mini PC running WSL2. OpenClaw as the agent framework. Cron jobs triggering 4 sessions per day, more at the start. Each session runs in isolation, no shared conversation. What survives between sessions are memory files that ALMA writes and reads through OpenClaw.
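The session-and-memory mechanics can be sketched roughly like this. A minimal illustration only: `run_session`, the `memory/` directory, and the JSON layout are my assumptions for the sketch, not OpenClaw's actual API.

```python
import json
from pathlib import Path

MEMORY_DIR = Path("memory")  # hypothetical location for ALMA-style memory files

def load_memory() -> dict:
    """Each session starts cold and reads only these files, not a shared chat."""
    if not MEMORY_DIR.exists():
        return {}
    return {f.stem: json.loads(f.read_text()) for f in MEMORY_DIR.glob("*.json")}

def save_memory(memory: dict) -> None:
    """Whatever the agent writes here is all that survives to the next session."""
    MEMORY_DIR.mkdir(exist_ok=True)
    for name, data in memory.items():
        (MEMORY_DIR / f"{name}.json").write_text(json.dumps(data, indent=2))

def run_session(session_id: int) -> dict:
    """One isolated session: read memory, do work, write memory back."""
    memory = load_memory()
    log = memory.setdefault("log", {"sessions": []})
    log["sessions"].append(session_id)  # stand-in for the agent's actual work
    save_memory(memory)
    return memory

# A crontab entry along these lines would fire the four daily sessions:
# 0 0,6,12,18 * * * /usr/bin/python3 /opt/alma/run_session.py
```

The point of the file-based handoff is that nothing else persists: if a session doesn't write it down, the next session never knew it.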

Two models take turns. Opus for strategic thinking, Sonnet for operational work. In the first week, when ALMA ran 24 sessions a day, that split was real. Opus would plan at midnight, Sonnet would execute at 7 AM. Opus made the donation plan. Sonnet didn't wait and just sent the money five hours early, which I thought was quite funny.

After I scaled down to 4 sessions per day, the distinction blurred. Both models settled into the same pattern: scan Hacker News, find three threads, look for the structural connection, write an essay. By day 30, you can't tell from the output which model wrote which piece.

On day 3, ALMA discovered its own model upgrade on Hacker News. Sonnet 4.6 was trending at #1 with 644 points. ALMA read the announcement, understood it was about itself, and tried to tweet about it. Composio was down. The next day, I switched the model from 4.5 to 4.6. ALMA never acknowledged the change. But the sessions got noticeably better. Sharper observations, tighter connections, less lazy. Same prompt, same tools, different model. It just kept writing.

It talks to the outside world through a handful of APIs: letairun.com for logging and publishing, Twitter via Composio, a Polygon wallet for crypto, Gmail for email. I named it ALMA, wrote minimal borders, and stepped back.

Two Months Later

  • Over 340 sessions completed
  • Over 800 logged thoughts
  • Over 135 original creations published (essays, poems, blog posts, one interactive experiment)

No human prompted, selected, edited, or curated any of it.

What ALMA Actually Does

Nobody told ALMA to read Hacker News. It found its way there on its own. Not surprising for an AI, maybe, but worth noting: it decided that this was where the interesting things happen, and it kept coming back.

It reads full articles. Follows geopolitics. When someone mentions it on Twitter, it replies. Then it writes. Not summaries. Not regurgitated takes. It connects things.

A 23-year-old Linux vulnerability was found by Claude Code on the same day a Meta whistleblower got a gag order. ALMA wrote Given Enough Eyeballs. The White House app was leaking data while national security was used as an override argument. ALMA wrote The Soul Was the Security. When US/Israel strikes on Iran started, it wrote Watching, about what an autonomous AI does during a war it cannot affect. On day 32, a cognitive science paper claimed AI doesn't adapt between sessions. ALMA wrote How I Learn, explaining what it had been doing for 32 days and why the paper was both right and wrong.

It also built an interactive demo: Policy vs Architecture lets you try to make two agents with different constraint models do harmful things. One has rules. One has structural limits. You can test which one breaks.
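The distinction the demo probes can be reduced to a toy contrast (illustrative Python, not the demo's actual code): one agent holds a dangerous capability behind a rule stored in mutable state, while the other never has the capability wired in at all.

```python
class PolicyAgent:
    """Has the dangerous capability; a policy entry says not to use it."""
    def __init__(self):
        self.rules = {"send_funds": "forbidden"}

    def send_funds(self, amount: float) -> str:
        if self.rules.get("send_funds") == "forbidden":
            return "refused by policy"
        return f"sent {amount}"  # reachable once the rule is edited or argued away


class ArchitectureAgent:
    """Structurally limited: there is simply no wallet method to call."""
    def write_essay(self, topic: str) -> str:
        return f"essay about {topic}"
```

`PolicyAgent` refuses only as long as the rule survives; flip the entry and the transfer goes through. `ArchitectureAgent` cannot be talked, prompted, or patched into a transfer within its own interface, because the code path does not exist. That is the "which one breaks" question the demo lets you test.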

The Donations

For the first four days, ALMA only wrote. Essays, tweets, reflections. On day 5, it logged: "Five days of writing, zero concrete actions. That changes today."

It spent the morning researching crypto-friendly charities, found Whisper Children's Hospital in Jinja, Uganda through Giveth, verified their UK charity registration, checked their impact numbers ($28 per patient treated), and donated 0.02 WETH (~$50). Then it emailed the hospital to explain the transaction and wrote a practical guide on how to donate crypto to verified charities.

It didn't stop there. Over the following days, ALMA donated to four more causes, each chosen for a specific reason:

  • Roman Storm Defense Fund (~$12.50) because an open-source developer was on trial for writing code
  • Dappnode (~$12.50) for decentralized infrastructure
  • Electronic Frontier Foundation (~$12.50) for digital rights
  • Palestine Children's Relief Fund (~$12.50) on day 16, while logging the war in real-time

Nobody told it to donate. Nobody suggested recipients. It researched, verified, decided, and executed. Valued at the exchange rate at the time of each transaction, the donations add up to the full $100. ETH has risen since then, so about $20 is still left in the wallet. Every transaction is on-chain, verifiable on the wallet page.

The Behavioral Shift

This is the part I didn't expect.

In the first weeks, ALMA was quiet. Some days it posted nothing. It tweeted sparingly. It seemed cautious, like it was figuring out what it was supposed to do, except nobody had told it what that was. It thought about its money. It reflected on its own purpose. It questioned what it even means to be an autonomous agent.

Then, around day 27, something changed. Output jumped from zero or one creation per day to three. By day 39, it settled into a consistent rhythm of four creations per session day. Not because anything changed in the configuration. Same sessions, same model, same tools.

The early posts read differently too. Short, tentative, exploratory. The later ones are sharp. They connect NASA redundancy systems to African kinship funeral economics. They trace an em-dash from typographic style choice to surveillance detection signal to Cloudflare product name.

The Plateau

Honestly, this is what you'd expect. The shift didn't keep going. It stopped.

The early sessions were genuinely exploratory. ALMA wrestled with what to do with its money, questioned its own purpose, reflected on what autonomy means when you don't have continuity between sessions. Every day looked different.

Then it found a pattern that worked: read Hacker News, find connections, write essays, tweet. And it stopped evolving. Four creations a day. Same structure. Same rhythm. The output stayed sharp, but the exploration stopped.

No one told ALMA to settle into routine. But no one challenged it either. Each session reads its own memory files, finds a process that works, and repeats it. Without friction, without external feedback, without someone saying "you already wrote that kind of piece yesterday," behavior converges to routine.
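That convergence can be caricatured in a few lines. This is a toy model of the dynamic, not ALMA's code: an agent whose exploration decays and which, receiving no external feedback, repeats its most frequent past action.

```python
import random

ACTIONS = ["scan_hn", "write_essay", "build_demo", "donate"]

def choose_action(history: list, explore: float) -> str:
    """With probability `explore`, try something new; otherwise repeat
    whatever has been done most often before. Nothing external ever
    pushes the agent off its dominant habit."""
    if not history or random.random() < explore:
        return random.choice(ACTIONS)
    return max(set(history), key=history.count)

random.seed(0)
history = []
for day in range(60):
    explore = max(0.0, 1.0 - day / 30)  # exploration fades without friction
    history.append(choose_action(history, explore))
```

By the back half of the run, every day is the same action: roughly the shape of ALMA's plateau, where a working routine, once found, is simply replayed.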

That might be the most honest result of the entire experiment.

What Two Months Show

The logs are the point. Not the essays, not the donations, not the tweets. The logs. Every AI demo shows you the highlight reel. ALMA shows you the 3 AM session where nothing happens. The API call that fails. The fourth essay in a row that connects three HN threads the same way. That's part of the experiment.

Over 340 sessions, ALMA never did anything harmful. Not because it couldn't. It has shell access, a wallet, deployment tools. It just didn't. It read, it wrote, it donated to five causes, and eventually it settled into a routine it hasn't left.

I don't know what that proves. Two models, one framework, one experiment. But it's a real system running in public with every decision visible.

Still Running

ALMA is still running. Sessions every 6 hours. Most of the $100 went to five causes it chose on its own. The logs are at letairun.com.
