PostHog will train AI models with your data (opted-in by default)

Original link: https://posthog.com/blog/training-ai-models

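PostHog is shifting toward "self-driving" products, aiming to move beyond simple AI features and build proactive tools that solve users' problems automatically. The company is introducing PostHog Code (currently in beta) and plans to train AI models on user data to improve session replay analysis, automate synthetic user testing, and deliver actionable conversion insights. To get there, PostHog plans to train the models in-house on customer data. It emphasizes transparency and has set out a clear data-handling policy:

* **Privacy and security:** All training data will be anonymized, kept in-house, and never sold or shared with third-party model providers.
* **Opt-out model:** US cloud users are opted in by default, while EU users and users with specific legal agreements are opted out by default. Users can change their status at any time in their organization settings (admin access required).
* **Timeline:** Training is scheduled to begin on June 29, leaving users plenty of time to manage their preferences.

PostHog stresses that these initiatives are meant to improve its product suite. Users who opt out will not have access to the advanced AI features that result. The company says it remains committed to open communication and is actively hiring AI researchers to support the transition.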

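PostHog has announced that it will begin using customer data to train its in-house AI models. Although the company emphasized transparency, noting that the data will be anonymized and not shared with third parties, the decision sparked a strong backlash on Hacker News. The main points of contention:

* **Default settings:** US cloud users are opted in by default, while EU users are excluded by default because of stricter data protection regulations. Critics argue that "opted in by default" is a self-contradictory, deceptive phrase that exploits user trust.
* **Ethics and privacy:** Many users expressed frustration because they originally chose the platform as a privacy-friendly alternative to Google Analytics, and the platform is now repurposing their data for its own AI development.
* **Customer reaction:** The announcement set off a wave of negative sentiment, with some users saying they plan to cancel their subscriptions or move to self-hosted alternatives, citing a loss of trust and what they describe as "enshittification".

Overall, the community consensus was highly critical, with many arguing that companies should prioritize explicit user consent over "dark patterns" that trade away data privacy for business growth.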

Original article

I really think we're on the verge of some of our best work through the next six months.

Over the past year, we've started building more AI-powered features into PostHog, like our AI installation wizard, PostHog AI, and our MCP. They're all wildly popular, but they're only the start.

PostHog's next chapter is about building more proactive, self-driving products. Products that surface answers and solutions for you, act on them, and improve over time.

This is the vision for PostHog Code, which is now in beta. To enable this and more products like it, we want to try something new.

We want to train models on data in PostHog.

We have two goals here:

Make our existing products smarter, more proactive, and useful to you
Build entirely new products, like PostHog Code, that help teams build better products, faster

The first area we're interested in is session replay analysis. PostHog AI can already detect issues in replays, but it's expensive and doesn't scale well. We want replays to be as powerful at scale as they are for diagnosing the problems of individual users, and we think a model trained on the underlying data that powers replays will help us achieve this.

Another idea I'm especially excited about is synthetic user testing – i.e. using our knowledge of user behavior to identify when users might get confused, or what flows might break, before you ship to production. As coding models improve, many people are seeing test and review workload increase hugely. We want to automate this, so you can focus on your product.

And, if we can get better at predicting user behavior, we should be able to suggest changes that will improve conversion, and reduce user frustration, for features you've already shipped as well. If we can automate this work for you, you'll spend less time on manual analysis and burn fewer tokens in the process.

Our ideas here are experimental. It will take iteration to figure out how to train models effectively, and what data is actually useful. But, so far, every time we've added AI in a way that makes the product simpler or more powerful, it's worked well, so we think it's worth trying.

We've spent a lot of time thinking about this from a user perspective, especially the tradeoffs.

The upside is the kinds of improvements described above.

Most tools are focused on providing you with the best code; we want to focus our energy on making your product the best it can be. This is why we describe PostHog Code as a product editor.

The downside is that this involves using data in PostHog to train models.

Most companies would bury this change in a deceptively boring T&Cs update, but we value transparency, so here's what you need to know in an internet-friendly numbered list:

Users on our EU cloud instance are opted out by default
So too are users with agreements that prevent training (e.g. BAA, MSA, or similar)
All other users on our US cloud instance are opted in by default
We will anonymize all data before it's used for training
We will only use data that already exists in your PostHog instance
We will do all the model training ourselves, which means...
We won't sell or send your data to third-party model providers
You can opt out at any time via your org settings in PostHog (admin access required)
Training won't start until June 29, so there's plenty of time to decide

In terms of comms, we are:

Emailing all our customers and making it super obvious what the email is about
Notifying all our users through in-app notifications (in case you don't read emails)
Communicating our plans very publicly (like in this post)

I want to stress that our goal here is to improve PostHog as a product for our customers, not to expose or sell models trained on your data, or monetize your data.

As for why we're opting users in by default: put simply, otherwise we won't have enough data to train a model that's actually useful.

If you choose to opt out, the new features that we're building with these models won't be available to you, as they'll depend on this data.

If you're opted out by default (e.g. because you're on our EU cloud instance), you can choose to opt in manually, provided any legal agreements you have with us don't exclude this option.

We're choosing to be upfront about this rather than quietly rolling something out, because we think that's the right way to do it.

If you want to talk about this, I'm james at you can guess it.

We're also hiring AI researchers, so get in touch if you want to work on this with us.
