I used Claude Code to get a second opinion on my MRI

Original link: https://antoine.fi/mri-analysis-using-claude-code-opus

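The author shares their experience of using AI (Opus 4.8 running in Claude Code) to analyze their MRI scans after being diagnosed with a Grade III tear of the subscapularis tendon. Suspicious of the clinic's aggressive treatment plan (which included questionable shockwave therapy and a homeopathic injection), the author handed the DICOM-format MRI files to the AI. At first, the AI reported an intact tendon, contradicting the clinic's conclusion. After a second, more systematic analysis, in which multiple AI subagents compared the clinical notes and movement tests, the AI concluded that there was only "mild insertional tendinosis" and no obvious tear. The author stresses that they are not a medical professional and cautions that the experiment is for reference only. The project highlights both the potential of AI as a medical verification tool and the limbo patients face when an AI diagnosis conflicts with a human clinical opinion. The author concludes that although we cannot yet fully trust AI for medical review, the technology is moving quickly toward that future.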

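A recent Hacker News discussion highlighted the risks of using AI such as Claude for medical analysis. The debate was sparked by a user who used AI to interpret their MRI results. Radiologists and other participants warned against the practice and pointed out several key limitations:

* **Inherent risks of AI:** Experts noted that large language models suffer from "hallucinated reasoning": even with insufficient data, they may fabricate clinical findings or deliver a confident diagnosis.
* **Technical limitations:** An MRI is a complex 3D dataset, while current large language models can generally analyze only 2D images, which can lead to misinterpretation.
* **The "expert gap":** Commenters noted that AI output is often persuasive to laypeople but easily seen through by domain experts. This creates a dangerous situation in which patients may doubt professional medical advice on the basis of the AI's flawed logic.

Although some users felt that AI helps with understanding medical terminology, the professional consensus was that AI cannot replace clinical expertise. The discussion stressed that relying on large language models for a second diagnostic opinion is unreliable and misleading, and many urged users to seek a second opinion from a qualified human doctor.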

Original text

This article is about my experience using Opus 4.8 to read the results of an MRI and give me a sort of second opinion on the diagnosis. Of course, I know the technology might not be there yet, which is why I'm sharing this article. Maybe it can help someone or at least provide a bit of information or entertainment.

Disclaimer: I'm of course not a doctor (this is actually the problem!) so please take everything I say with a grain of salt.

Some context (feel free to skip)

For a few weeks now, I've been experiencing some pain in my right shoulder. Even though it seemed to be getting better, I decided to get an opinion from an orthopedist. I won't go into the details, but he suggested I get an MRI, which the clinic conveniently had available. I agreed and mainly learned that I had a "Grade III (>50%-width) partial-thickness tear at the apical insertion" of my subscapularis tendon. This, of course, means little to me, but the course of treatment they suggested was extensive, and they even started it a few minutes after I got the MRI. Coming out of the clinic, I had the feeling they had jumped the gun.

Thankfully, before I left, I asked them to send me a copy of the MRI results and a list of all the treatments they had performed, which they suggested we repeat a total of 3 times.

I sent everything over to GPT 5.5 Pro, and right away it flagged two things:

* They performed shockwave therapy on my shoulder even though a recent clinical practice guideline says clinicians should not use or recommend shockwave therapy for rotator-cuff tendinopathy without calcification; I was told during ultrasound that there was no calcification.
* They injected me with Traumeel, which is registered in Germany as a homeopathic medicine "without a therapeutic indication".

That did not increase my confidence, and it made me curious to analyze the MRI myself.

Setting up Opus to do a first review of the MRI

The MRI package was a standard DICOM export containing a few hundred files without extensions, totaling around 266 MB.
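Because the files have no extensions, a quick way to sanity-check such an export is to look for the DICOM signature inside each file. The sketch below is illustrative and not what Opus actually ran: it assumes the export follows the DICOM Part 10 file format (a 128-byte preamble followed by the bytes `DICM`), and the helper names `is_dicom` and `summarize_export` are made up for this example. Older files written without the preamble would be missed by this check.

```python
from pathlib import Path


def is_dicom(path: Path) -> bool:
    """Return True if the file carries the DICOM Part 10 signature."""
    # Part 10 files start with a 128-byte preamble followed by b"DICM".
    try:
        with path.open("rb") as fh:
            fh.seek(128)
            return fh.read(4) == b"DICM"
    except OSError:
        # Unreadable files are treated as non-DICOM rather than fatal.
        return False


def summarize_export(root: Path) -> tuple[int, int]:
    """Count the DICOM files under root and sum their sizes in bytes."""
    files = [p for p in root.rglob("*") if p.is_file() and is_dicom(p)]
    return len(files), sum(p.stat().st_size for p in files)
```

Calling `summarize_export(Path("mri_export"))` (a hypothetical folder name) returns the file count and total size, which can be compared against what the clinic handed over before any analysis starts.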

For the analysis, I decided to use Opus 4.8 (xhigh) within Claude Code to give it the ability to run code and install packages. Before any work was done, I told it to install any packages that it might need for the analysis. Running it in Claude Code matters here because that is what lets the model carry out a significant amount of work on its own. It might seem obvious to coders, but the difference between Claude Code and Claude.ai's chat is enormous, even if the two run the same model.

It was then time to get started. Considering I know nothing about MRIs, I set things up to have Claude work hard on a detailed plan and then take action. The only instruction I gave was "right shoulder pain for 2–3 weeks," which I later realized was less than the human doctors received.

After around an hour, it came back with the report:

Right-Shoulder-MRI-Report.pdf (PDF, 7.72 MB)

The critical problem with that report was that where the doctor saw a Grade III (greater-than-50%) partial-thickness tear at the apical insertion, Opus 4.8 reported an intact tendon!

This was quite disconcerting. I expected the grade to be lower, but that finding was extreme.

Arbitrating the two analyses

To adjudicate, I decided to have Claude compare the two reports. But this time, I gave it a bit more context: on top of giving it the human report, I also provided a discussion I had with ChatGPT 5.5 Pro, where I had it give me movements and positions to try as a way of figuring out what my diagnosis was.

From the planning document, here is the approach Opus took:

The approach was careful and methodical, with multiple subagents used as a way of getting new analyses that weren't biased by the existing context.
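The idea of unbiased subagents can be sketched in a few lines. This is only an illustration under stated assumptions: the article does not describe how Opus built each subagent's context or combined their answers, so the function names `blinded_context` and `tally`, the dictionary keys, and the majority-vote rule are all hypothetical.

```python
from collections import Counter


def blinded_context(study_dir: str, clinical_hint: str) -> dict:
    """Build the context for one independent reader: images and symptoms only."""
    # Assumption: both earlier reports are deliberately left out, so the
    # reader cannot anchor on either conclusion.
    return {"study_dir": study_dir, "clinical_hint": clinical_hint}


def tally(verdicts: list[str]) -> tuple[str, float]:
    """Return the most common verdict and the share of readers who gave it."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    label, count = Counter(verdicts).most_common(1)[0]
    return label, count / len(verdicts)
```

The point of the design is that each reader starts from a fresh context, and only the arbiter sees the two existing reports alongside the independent reads.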

Again, after around an hour, I got a new report:

Right-Shoulder-MRI-Arbitration.pdf (PDF, 4.52 MB)

Its conclusion was:

Arbiter's verdict: Evidence favours Reader A (moderate-to-high confidence). Mild insertional tendinosis; NO discrete partial- or full-thickness tear identified, including at the apical insertion.

I can't help but find it fascinating that the verdicts are so far from each other. Looking further into the report, I can see that Opus wasn't afraid to say that there are some disputes between the two reports that it can't resolve, and yet it could resolve this one, and very decisively.

Where does that leave me?

There's something incredibly peaceful about being in the hands of an expert you trust. You don't have to worry anymore and can let them guide you through the process.

AI can absolutely shatter that feeling in an uncomfortable way: after getting this AI-driven second opinion, the diagnosis and treatment plan look premature and more intervention-heavy than the facts seem to justify... but I don't know if I can fully trust AI either. So I'm left in a state of limbo where I either try my luck with another doctor or wait and see if my shoulder gets better with the rehab I'm doing.

My hope is that in a couple of model generations, we'll trust AI to review MRIs the way we trust it to proofread our emails.

I am not naming the clinic or doctor because this isn't the point of the article. It's about sharing my technical curiosity about using AI to get second opinions. I may be wrong, or the AI might be wrong. I could also have misunderstood the doctors. So basically, none of this should be taken as medical advice :)