GPT‑NL: a sovereign language model for the Netherlands

Original link: https://www.tno.nl/en/digital/artificial-intelligence/gpt-nl/

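GPT-NL is a publicly funded initiative to build an autonomous, responsible language model tailored to the Dutch language and context. The project has received €13.5 million in funding from the Dutch government and aims to reduce dependency on non-European providers so that technology and data remain under domestic control. The model rests on four core pillars:

* **Sovereignty:** developing local infrastructure in line with European law and societal goals.
* **Transparency:** taking an open-source approach, documenting all training choices, and maintaining clear oversight of model development.
* **Trustworthiness:** training from scratch to ensure data integrity, prioritising copyright compliance and strict anonymisation, and excluding harmful or confidential content.
* **Reciprocity:** establishing a fair value-sharing model in which data providers take part in governance and are compensated for their contributions.

Beyond these principles, GPT-NL emphasises sustainability by optimising compute to reduce energy and water consumption. By balancing innovation with public accountability, GPT-NL aims to provide a safe, reliable and fair AI foundation that strengthens the digital autonomy of the Netherlands.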

GPT‑NL values

We are building a responsible language model for the Dutch language and context: trustworthy, transparent, reciprocal and sovereign.

Sovereign: control over technology that matters

GPT‑NL is developed within the Netherlands and Europe. This gives us full control over the model, the data and the choices we make. We avoid dependency on non‑European providers and invest in a sustainable AI ecosystem aligned with our laws, values and societal goals.

Open and transparent: insight from source to model

GPT‑NL is built on transparency. We clearly document the choices we make during data collection and training, and how we address risks such as bias and ethical concerns. We publish the source code as open source and share detailed insights into the dataset. Model weights are made available under a controlled licence. This allows us to know who uses the model and to inform users about updates or changes, for example following a data opt‑out. In this way, we operate transparently without compromising security or regulatory compliance.

Trustworthy: protecting users and citizens

We train GPT‑NL entirely from scratch. This prevents unclear data provenance, copyright risks or potential personal data from being inherited from existing models.

To ensure a reliable foundation, our data collection meets strict criteria:

* Safeguarding intellectual property
* Removing and anonymising personal data before model training
* Excluding confidential information
* Excluding harmful content
* Avoiding duplication within the dataset
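
To make two of these criteria concrete, here is a minimal Python sketch of redacting personal data and removing duplicates from a small text corpus. It is an illustration only and not GPT‑NL's actual pipeline: the function names, the e-mail pattern and the `[EMAIL]` placeholder are all assumptions, and real anonymisation covers far more than e-mail addresses.

```python
import hashlib
import re

# Hypothetical pattern: matches simple e-mail addresses only.
# Real anonymisation would also need to handle names, phone numbers, IDs, etc.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str) -> str:
    """Replace anything that looks like an e-mail address with a placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants hash identically."""
    return " ".join(text.lower().split())


def deduplicate(docs: list[str]) -> list[str]:
    """Keep the first occurrence of each document, compared after normalisation."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(normalise(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def clean_corpus(docs: list[str]) -> list[str]:
    """Redact e-mail addresses first, then drop exact duplicates."""
    return deduplicate([redact_emails(d) for d in docs])
```

Redaction runs before deduplication in this sketch, so documents that differ only in the redacted address are treated as duplicates. Near-duplicate detection (for example MinHash) would be a separate step and is not shown here.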

Reciprocal: fair agreements on data and value

GPT‑NL deliberately works with a clean and lawful data supply chain. We collaborate closely with data providers and actively involve them in the development of the model.

Through the Content Board, these data providers and rights holders have a voice in the future of GPT‑NL. Part of the revenues flows back to the creators. This creates a fairer innovation model in which value is shared rather than extracted.

Using resources efficiently

AI development requires significant computing power and energy. That is why we actively focus on energy efficiency and responsible use of resources. Based on scientific research, we optimise both the size of the model and the training process, with explicit attention to energy and water consumption.

Publicly funded, publicly accountable

GPT‑NL is funded by the Netherlands Enterprise Agency (RVO) on behalf of the Ministry of Economic Affairs and Climate Policy. A total of €13.5 million has been allocated to the project. This public investment underlines the importance of an independent, trustworthy and future‑proof Dutch language model.

GPT‑NL shows that powerful AI and public values can go hand in hand. Together, we are building technology that makes the Netherlands stronger, more autonomous and fairer.