The missing catalogue: why finding books in translation is still so hard

Original link: https://blogs.lse.ac.uk/impactofsocialsciences/2026/04/13/the-missing-catalogue-why-finding-books-in-translation-is-still-so-hard/

## The invisible world of translated literature

Although translated works abound (Le Petit Prince alone exists in over 100 languages), discovering them is surprisingly difficult. This is not a shortage of content but a critical infrastructure problem: translation metadata is scattered across many unconnected databases, such as national libraries, commercial aggregators (ISBNdb), and collaborative projects (Wikidata). No single source offers a comprehensive overview, and even UNESCO's historical Index Translationum is badly out of date.

This fragmentation amounts to cultural erasure, especially for smaller languages, whose translations remain invisible without proper cataloguing. Projects such as Zenòdot demonstrate this by cross-referencing twenty-three sources; simply linking existing data raised the visibility of Catalan/Valencian. Conversely, languages such as Bengali remain under-represented because institutional data is not interconnected.

Zenòdot's findings show that nearly 90 per cent of ISBN-verified editions appear in only one source, underlining the unique information each source holds. Solving this requires collaboration between libraries, standardised metadata, and recognition of translation as essential bibliographic data. Ultimately, building a "global translation map" is not just about celebrating bibliodiversity; it is about ensuring that all literature can be found and counted.


Original text

Finding a work in translation is harder than you think. Discussing the creation of Zenòdot, a cross-referencing project for books in translation, Ausiàs Tsel outlines the challenges of creating a record of translated works across different catalogues and what is lost when these records do not exist.


Le Petit Prince is one of the most translated books ever published. It exists in well over a hundred languages. But if you tried to find all those translations in a single database, you would fail. The world’s largest commercial ISBN aggregator documents editions in roughly seventy languages. The rest are scattered across national library catalogues in Tokyo, Jerusalem and Oslo, across collaborative databases like Wikidata, and across historical archives that stopped being updated fifteen years ago.

No single source covers more than two-thirds of the picture. This is not a peculiarity of one famous children’s book. It is the normal state of affairs for translated literature worldwide.

Translation discovery is not a content problem. The translations exist. It is an infrastructure problem: the metadata that should connect them is fragmented, incomplete, and unevenly distributed across institutions that were never designed to talk to each other.

A fragmented landscape

There is no global catalogue of translations. The closest attempt was UNESCO’s Index Translationum, launched in 1932 and computerised in 1979. It accumulated over two million entries across some 800 languages. But contributions from national libraries slowed, and the database has not been meaningfully updated since the late 2000s. The longest-running international bibliography of translations is, in practice, a historical archive.

What remains is a patchwork. National libraries document what is published within their borders, not what is translated from their literatures into other languages. Commercial aggregators like ISBNdb hold vast numbers of records but with language metadata that is often missing, incorrect, or ambiguous. Wikidata contains translation data contributed by volunteers, but with significant gaps and biases toward well-resourced languages. Nine national library catalogues, from Norway to Taiwan, together contribute less than ten per cent of the editions tracked by Zenòdot, an independent cross-referencing project (full disclosure: I am its creator), while commercial aggregators contribute the largest share but still fall short of majority coverage. Each source holds pieces that no other source has. Nearly ninety per cent of the ISBN-verified editions in the system appear in only one of its twenty-three sources.
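
The "ISBN-verified" qualifier above presupposes a validation step. As an illustrative sketch (not Zenòdot's actual code), an ISBN-13 can be checked with the standard weighted check-digit rule from ISO 2108:

```python
def isbn13_is_valid(isbn: str) -> bool:
    """Check an ISBN-13's check digit (ISO 2108).

    Digits are weighted alternately 1 and 3; a well-formed ISBN-13's
    weighted sum is divisible by 10.
    """
    digits = [int(c) for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0

print(isbn13_is_valid("978-2-07-061275-8"))  # True: check digit is consistent
```

A step like this lets a cross-referencing system discard malformed identifiers before trying to match editions across catalogues.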

Every database holds something unique. None holds everything.

Visibility and bibliodiversity

If a translation is not in a database, it is functionally invisible. For smaller languages, this is not a technical inconvenience. It is a form of cultural erasure.

Consider what happens when catalogues are connected. Catalan/Valencian, a language spoken by over ten million people, was barely visible in global translation data until its editions were documented across multiple sources. It now ranks eighth by edition count in a twenty-three-source cross-referencing system—ahead of Chinese and Russian. Not because Catalan/Valencian has more translations than those languages worldwide, but because the sources that document its editions have been linked. The translations had always existed. The metadata had not.

Bengali, Thai and Urdu tell the opposite story: languages with substantial publishing industries that remain near the bottom of global edition counts—not because translations do not exist, but because the institutions that document them have not yet been connected.

The pattern is consistent: what we can see depends on which libraries have been asked, which databases have been queried, which metadata standards have been adopted. Absence from a catalogue is not evidence of absence from the world. It is a signal of unconnected infrastructure.

Cross-referencing as infrastructure

Zenòdot is an independent project that attempts to address this gap by cross-referencing twenty-three sources at present—sixteen national library catalogues, two commercial aggregators, Wikidata, UNESCO’s historical Index, and community contributions, with the system designed to integrate additional catalogues as institutional partnerships develop. It currently tracks over three million editions across hundreds of languages.
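
The core diagnostic behind the "only one source" figure can be sketched in a few lines. This is a minimal illustration with synthetic records and invented source names, not Zenòdot's actual schema: group edition records from several catalogues by ISBN, then count how many ISBNs are held by a single source.

```python
from collections import defaultdict

# Synthetic edition records: each is (ISBN, source catalogue).
# Both the ISBNs and the source names are made up for illustration.
records = [
    {"isbn": "9780000000002", "source": "ndl_japan"},
    {"isbn": "9780000000002", "source": "wikidata"},
    {"isbn": "9780000000019", "source": "isbndb"},
    {"isbn": "9780000000026", "source": "bnc_catalunya"},
]

# Group the sources that hold each ISBN.
sources_by_isbn = defaultdict(set)
for rec in records:
    sources_by_isbn[rec["isbn"]].add(rec["source"])

# How many ISBNs appear in only one catalogue?
single_source = sum(1 for s in sources_by_isbn.values() if len(s) == 1)
print(f"{single_source}/{len(sources_by_isbn)} ISBNs appear in only one source")
# → 2/3 ISBNs appear in only one source
```

Run at scale, a tally like this is what surfaces the near-ninety-per-cent single-source figure the article reports.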

The technical challenges are significant. Each source uses different identifiers. Author names appear in different scripts—the project maintains over 300,000 name aliases to match records across writing systems. Language metadata is inconsistent: what one catalogue labels “Portuguese” another splits into Brazilian and European Portuguese; what one calls “Chinese” another distinguishes as Mandarin, Cantonese, or Classical Chinese. ISBN coverage varies wildly: Japan’s National Diet Library contributes editions in one language but with granular depth that no global aggregator matches.
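
The language-labelling problem above lends itself to a small illustration. A hedged sketch, with an invented alias table (the project's real mapping is necessarily far larger), collapses free-text catalogue labels onto ISO 639 codes before records are compared:

```python
# Hypothetical alias table mapping free-text catalogue labels to
# ISO 639 codes; illustrative only, not Zenòdot's actual mapping.
LANGUAGE_ALIASES = {
    "portuguese": "por",
    "brazilian portuguese": "por",
    "european portuguese": "por",
    "chinese": "zho",
    "mandarin": "zho",
    "cantonese": "yue",            # ISO 639-3 code for Cantonese
    "classical chinese": "lzh",    # ISO 639-3 keeps Literary Chinese separate
    "catalan": "cat",
    "valencian": "cat",            # ISO 639 treats Valencian as Catalan
}

def normalise_language(label: str) -> str:
    """Map a catalogue's free-text language label to a code, or 'und'
    (ISO 639-2 'undetermined') when the label is unrecognised."""
    return LANGUAGE_ALIASES.get(label.strip().lower(), "und")

print(normalise_language("Brazilian Portuguese"))  # por
print(normalise_language("Valencian"))             # cat
```

Even this toy version forces the design decisions the article describes: whether to merge or split Portuguese variants, and where Cantonese and Classical Chinese sit relative to "Chinese".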

These are not problems that a single tool can solve. They are symptoms of an ecosystem where bibliographic institutions have operated in parallel rather than in dialogue. Cross-referencing is not an answer. It is a diagnostic: it reveals how much is lost when each catalogue works alone.

Towards a global translation map

The problem will not be resolved by any single platform. It requires collaborative effort between national libraries, metadata standards bodies, and open knowledge communities. It requires that translation—as a category of bibliographic data—be treated with the same seriousness as authorship, publication date, or subject classification.

Making translations visible is an act of cultural infrastructure. It determines which literatures count as existing, which readers can find what they need, and which languages register in the global record. In a world that regularly celebrates bibliodiversity, the least we can do is build the tools to measure it.

When a single cross-referencing project (twenty-three sources, sixteen national libraries) finds that nearly ninety per cent of ISBN-verified editions appear in only one source, the scale of disconnection becomes difficult to ignore.

The question is not whether translations exist. They do. The question is whether anyone can find them.



The content generated on this blog is for information purposes only. This article gives the views and opinions of the authors and does not reflect the views and opinions of the Impact of Social Science blog (the blog), nor of the London School of Economics and Political Science. Please review our comments policy if you have any concerns about posting a comment below.

Image credit: ToninT on Shutterstock.

