* Denotes Equal Contribution
(a) Image reconstructions using the full NSD dataset (40 hours per subject). (b) Efficient transfer learning to new subjects with very little data: meaningful reconstructions are obtained with only 15 minutes of fMRI recordings. (Results on Subject 1.)
Abstract
Reconstructing images seen by people from their fMRI brain recordings provides a non-invasive window into the human brain. Despite recent progress enabled by diffusion models, current methods often lack faithfulness to the actual seen images. We present Brain-IT, a brain-inspired approach that addresses this challenge through a Brain Interaction Transformer (BIT), allowing effective interactions between clusters of functionally-similar brain-voxels. These functional clusters are shared by all subjects, serving as building blocks for integrating information both within and across brains. All model components are shared across all clusters and subjects, allowing efficient training with a limited amount of data. To guide the image reconstruction, BIT predicts two complementary localized patch-level image features: (i) high-level semantic features, which steer the diffusion model toward the correct semantic content of the image; and (ii) low-level structural features, which help to initialize the diffusion process with the correct coarse layout of the image. BIT's design enables direct flow of information from brain-voxel clusters to localized image features. Through these principles, our method achieves image reconstructions from fMRI that are faithful to the seen images and surpass current state-of-the-art approaches, both visually and by standard objective metrics. Moreover, with only 1 hour of fMRI data from a new subject, we achieve results comparable to those of current methods trained on the full 40-hour recordings.
Brain-IT Overview
Overview of Brain-IT pipeline showing the Brain Interaction Transformer (BIT) architecture with V2C mapping, semantic and low-level branches.
The Brain Interaction Transformer (BIT) transforms fMRI signals into semantic and VGG features using a shared Voxel-to-Cluster (V2C) mapping. Two branches are applied: (i) the Low-Level branch, which reconstructs a coarse image from the VGG features; this coarse image is used to initialize (ii) the Semantic branch, which uses the semantic features to guide the diffusion model. Each voxel of every subject is mapped to a functional cluster shared across subjects, enabling integration of information both within and across brains. The Brain-IT pipeline thus reconstructs images directly from fMRI activations: it first predicts meaningful image features with BIT, then refines them through a diffusion model guided by semantic conditioning and a Deep Image Prior (DIP) that ensures structural fidelity.
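To make the shared V2C idea concrete, here is a minimal numpy sketch of cluster aggregation: each subject's voxels carry a (subject-specific) assignment to functional clusters that are shared across subjects, and averaging voxel activations per cluster maps every brain into a common cluster space. The cluster assignments below are random stand-ins, not the real functional atlas, and `voxels_to_clusters` is a hypothetical helper name.

```python
import numpy as np

def voxels_to_clusters(activations, cluster_ids, n_clusters):
    """Average voxel activations within each shared functional cluster.

    activations : (n_voxels,) fMRI activations for one stimulus
    cluster_ids : (n_voxels,) cluster index per voxel (subject-specific map)
    returns     : (n_clusters,) one aggregated value per shared cluster
    """
    sums = np.zeros(n_clusters)
    np.add.at(sums, cluster_ids, activations)          # sum per cluster
    counts = np.bincount(cluster_ids, minlength=n_clusters)
    return sums / np.maximum(counts, 1)                # mean (safe for empty clusters)

rng = np.random.default_rng(0)
# Two subjects with different voxel counts, mapped onto the same 8 clusters.
ids_s1 = rng.integers(0, 8, size=5000)
ids_s2 = rng.integers(0, 8, size=6200)
act_s1 = rng.standard_normal(5000)
act_s2 = rng.standard_normal(6200)

tok_s1 = voxels_to_clusters(act_s1, ids_s1, 8)
tok_s2 = voxels_to_clusters(act_s2, ids_s2, 8)
assert tok_s1.shape == tok_s2.shape == (8,)  # same cluster space for both subjects
```

Because both subjects land in the same 8-dimensional cluster space, downstream model components can be shared across subjects, which is what enables the efficient transfer learning with little data.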
Brain-Interaction Transformer (BIT)
BIT architecture showing the Brain Tokenizer and Cross-Transformer modules for processing fMRI signals into image features.
The BIT model predicts image features from voxel activations (fMRI). The Brain Tokenizer maps the fMRI activations into Brain Tokens, each representing the aggregated information from all the voxels of a single cluster (one token per cluster). The Cross-Transformer module integrates information across the Brain Tokens to refine their representations, and employs query tokens to retrieve information from the Brain Tokens and transform it into image features, with each query token predicting a single output image feature.
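The query-token mechanism described above can be sketched as single-head cross-attention, where learned query tokens read from the Brain Tokens and each query yields one output image feature. This is an illustrative numpy sketch only; the toy dimensions, random weights, and function names are assumptions, not the actual BIT implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, brain_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: query tokens retrieve from Brain Tokens.

    queries      : (n_queries, d)   one learned query per predicted image feature
    brain_tokens : (n_clusters, d)  one token per functional cluster
    returns      : (n_queries, d)   one feature vector per query token
    """
    Q, K, V = queries @ Wq, brain_tokens @ Wk, brain_tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # (n_queries, n_clusters)
    return attn @ V

rng = np.random.default_rng(0)
d, n_clusters, n_queries = 16, 8, 4                  # toy sizes
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
queries = rng.standard_normal((n_queries, d))        # learned in the real model
brain_tokens = rng.standard_normal((n_clusters, d))  # from the Brain Tokenizer

feats = cross_attend(queries, brain_tokens, Wq, Wk, Wv)
assert feats.shape == (n_queries, d)                 # one image feature per query
```

Each row of the attention matrix tells which brain-voxel clusters a given image feature reads from, which is the "direct flow of information from brain-voxel clusters to localized image features" described in the abstract.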
Results
Qualitative Comparisons (40h)
Qualitative comparisons on Subject 1 with 40 hours of training data. Brain-IT is compared to three leading methods, yielding reconstructions that better preserve both semantic content and low-level visual properties.
Qualitative Comparisons - Limited Subject-Specific Data (1 hour)
Reconstructions with only 1 hour of subject-specific data. Brain-IT is compared against MindEye2 and MindTuner, demonstrating greater fidelity to the seen images.
Quantitative Metrics
Low- and high-level metrics comparing Brain-IT with other reconstruction methods. Results are averaged across Subjects 1, 2, 5, and 7 of NSD. Brain-IT outperforms all baselines on 7 of the 8 metrics.
Brain-IT demonstrates strong semantic fidelity and structural accuracy across multiple evaluation metrics. Importantly, with just 1 hour of data, Brain-IT is comparable to prior methods trained on the full 40 hours.
BibTeX
@misc{beliy2025brainitimagereconstructionfmri,
  title={Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer},
  author={Roman Beliy and Amit Zalcher and Jonathan Kogman and Navve Wasserman and Michal Irani},
  year={2025},
  eprint={2510.25976},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.25976}
}