Ucs-Detect

Original link: https://ucs-detect.readthedocs.io/intro.html

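## ucs-detect: Unicode terminal compatibility testing

`ucs-detect` is a Python tool that automatically evaluates a terminal emulator's Unicode support, in particular for wide characters, emoji sequences (ZWJ and VS-16), and zero-width or combining characters across many languages. It works by querying the terminal's cursor position and comparing the result against the specification of the `wcwidth` library.

Installation is simple: `pip install -U ucs-detect`. A command such as `ucs-detect --save-yaml=data/my-terminal.yaml` runs a detailed test and saves the results as a YAML file. Extensive test results for more than 20 terminals are publicly available ([https://ucs-detect.readthedocs.io/results.html](https://ucs-detect.readthedocs.io/results.html) and [https://www.jeffquast.com/post/state-of-terminal-emulation-2025/](https://www.jeffquast.com/post/state-of-terminal-emulation-2025/)).

The tool addresses inconsistent Unicode implementations between terminals and libraries, and uses the Universal Declaration of Human Rights (UDHR) dataset for multilingual testing. Users can contribute updates by pull request and can debug interactively with the `--stop-at-error` flag. `ucs-detect` helps identify discrepancies in Unicode rendering, supporting better compatibility and accurate display of text across languages.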


Original text

Without any arguments, ucs-detect automatically tests the Unicode version and support level of a terminal emulator for Wide characters, Emoji Zero Width Joiner (ZWJ) sequences, Emoji Variation Selector-16 (VS-16) sequences, and Zero-Width or combining characters by supported language. A brief report is then printed to stdout.


Installation & Usage

To install or upgrade:

$ pip install -U ucs-detect

To use, run it without arguments:

$ ucs-detect

To run a detailed test and store a yaml report to disk:

$ ucs-detect --save-yaml=data/my-terminal.yaml --limit-codepoints=5000 --limit-words=5000 --limit-errors=500

Test Results

More than twenty modern terminals for Windows, Linux, and Mac were tested. Their results have been collected into this repository, and a detailed summary is published at https://ucs-detect.readthedocs.io/results.html

An article describing the development of ucs-detect and summarizing the results for its 1.0.4 release (November 2023) is published at https://www.jeffquast.com/post/ucs-detect-test-results/

A follow-up article from November 2025, discussing the results of another round of testing for the release of ucs-detect 1.0.8, including DEC Private Modes support, is published at https://www.jeffquast.com/post/state-of-terminal-emulation-2025/

Individual yaml data file reports for these terminals may also be inspected in the repository's data folder, https://github.com/jquast/ucs-detect/tree/master/data

Please note that results will be shared with Terminal Emulator projects and this information may become out of date as they improve their support for Unicode. Please do not expect the maintainers of ucs-detect to update these data files. If you wish for this report to be corrected for any given Terminal, please feel free to submit a pull request with an update to the yaml data files.

Problem

Many East Asian languages contain Wide (W) or Fullwidth (F) characters, meaning that each character occupies 2 cells instead of 1. Further, many languages contain special combining characters that are “zero width”, meaning they do not occupy any cells, only modifying the previous one as a “combining” character. Finally, there are “Zero Width Joiner” and “Variation Selector-16” characters that are used in sequence for Emoji characters.
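The character classes described above can be inspected with Python's standard unicodedata module. The sketch below is a rough approximation for illustration only: the helper name approx_cell_width is hypothetical, and its rules are a simplification, not the tables or algorithm of the wcwidth library.

```python
import unicodedata

def approx_cell_width(ch: str) -> int:
    """Rough cell-width estimate for one codepoint (illustrative only)."""
    # Wide (W) and Fullwidth (F) characters occupy two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Nonspacing marks (Mn), enclosing marks (Me), and format characters
    # (Cf, which includes the Zero Width Joiner) are treated as zero width.
    # Assumption: this shortcut ignores the exceptions a real table handles.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    return 1

samples = {
    "LATIN SMALL LETTER A": "a",
    "KATAKANA LETTER KO (Wide)": "\u30b3",
    "FULLWIDTH LATIN CAPITAL LETTER A": "\uff21",
    "COMBINING ACUTE ACCENT": "\u0301",
    "ZERO WIDTH JOINER": "\u200d",
    "VARIATION SELECTOR-16": "\ufe0f",
}
for label, ch in samples.items():
    print(f"U+{ord(ch):04X} {label}: {approx_cell_width(ch)}")
```

A per-codepoint estimate like this cannot describe emoji sequences, where ZWJ and VS-16 change the width of the sequence as a whole, which is part of why terminals and libraries disagree.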

A terminal application that displays these characters may have trouble determining how they will be displayed to the end-user. This problem happens often, because the Unicode Consortium releases new versions of the Unicode Standard periodically, but the source code of libraries and applications is not updated at the same time, or at all!

Finally, a terminal emulator may have varying levels of support. For example, at the time of this writing, Microsoft’s Terminal.exe supports up to Unicode 15.0 for Wide characters, is missing support for 27 characters of Unicode 13.0, has no support for Emoji ZWJ, fully supports all VS-16 sequences, but fails to correctly categorize many Zero-Width characters for 88 or more of the world’s languages.

Solution

The most important factor is to determine whether the Terminal Emulator complies with the Specification published by the Python wcwidth library.

This program, ucs-detect, automatically detects the version and feature level of Unicode that the connecting Terminal supports for WIDE, ZERO, ZWJ, and VS-16 characters.

How it works

The solution in this program is the use of the Query Cursor Position terminal sequence, which asks, “where is the cursor?”. This is a hidden sequence that a Terminal Emulator automatically responds to.

By using this sequence together with the data tables of the wcwidth library, we can test for compliance with the Python wcwidth library Specification.

The use of Query Cursor Position is inspired by the resize(1) program distributed with X11, which determines the terminal size over transports that are not capable of communicating by signal or forwarding by environment value, such as over a serial line. resize(1) simply moves to (999, 999) then asks, “where is my cursor?” and the response is understood to be the terminal size.
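The measurement idea can be sketched in a few lines of Python: query the cursor, print the text, query again, and take the column difference. This is a minimal sketch under stated assumptions, not ucs-detect's implementation. The function names are hypothetical, the terminal half is POSIX only and needs a real tty, and the text is assumed not to wrap to the next line. The failure report shown later in this document performs the same measurement with the blessed library's get_location().

```python
import re
import sys

# Device Status Report 6 ("report cursor position"). Terminals answer with
# ESC [ row ; column R, using 1-based coordinates.
QUERY_CURSOR_POSITION = "\x1b[6n"
_REPORT = re.compile(r"\x1b\[(\d+);(\d+)R")

def parse_cursor_report(response: str) -> tuple[int, int]:
    """Return (row, column) from a cursor position report."""
    match = _REPORT.search(response)
    if match is None:
        raise ValueError(f"not a cursor position report: {response!r}")
    return int(match.group(1)), int(match.group(2))

def query_cursor(stream_in=sys.stdin, stream_out=sys.stdout) -> tuple[int, int]:
    """Ask the terminal where the cursor is (POSIX only, needs a tty)."""
    import termios
    import tty

    fd = stream_in.fileno()
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)  # unbuffered input, and the reply is not echoed
        stream_out.write(QUERY_CURSOR_POSITION)
        stream_out.flush()
        reply = ""
        while not reply.endswith("R"):
            reply += stream_in.read(1)
        return parse_cursor_report(reply)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)

def measure_width(text: str) -> int:
    """Cells the cursor advanced while printing text (assumes no wrap)."""
    _, before = query_cursor()
    sys.stdout.write(text)
    sys.stdout.flush()
    _, after = query_cursor()
    return after - before
```

Calling measure_width("\u30b3") in an interactive terminal would be expected to return 2 where the terminal treats the character as Wide. The parsing step is kept separate so it can be checked without a terminal attached.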

Updating Results

Use the re-run.py script to update the results for a new version of a terminal that is already tracked, for example:

$ python re-run.py data/contour.yaml

This will re-execute ucs-detect with the same parameters as the previous test. The new results will overwrite the existing ones.

Or, to submit results for a new terminal not previously tracked:

$ ucs-detect --save-yaml=data/jeffs-own-terminal.yaml --limit-codepoints=5000 --limit-words=5000 --limit-errors=1000

Optionally, you may reduce the test size for slow terminals, such as those using libvte, which require more than 5 hours to complete.

Create a draft pull request to this repository with your change. A GitHub commit status should be reported by readthedocs.org, and clicking “Details” should show the HTML preview.

Problem Analysis

Use the CLI argument --stop-at-error= to interactively investigate discrepancies as they are detected. For example:

$ ucs-detect --stop-at-error 'Hindi'

This presents output when an error occurs during Hindi language testing:

ucs-detect: testing language support: Hindi
मानव

Failure in language 'Hindi':
+----------------------------+
|            मानव             |
+----------------------------+

measured by terminal: 4
measured by wcwidth:  3

printf '\xe0\xa4\xae\xe0\xa4\xbe\xe0\xa4\xa8\xe0\xa4\xb5\n'
from blessed import Terminal
term = Terminal()
y1, x1 = term.get_location(); print('मानव', end='', flush=True); y2, x2 = term.get_location()
assert x2 - x1 == 3
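To see what the failing word is made of, the bytes from the printf line can be decoded and each codepoint looked up with the standard unicodedata module. This is only an inspection aid, not part of ucs-detect.

```python
import unicodedata

# Bytes copied from the printf line of the failure report above.
raw = b"\xe0\xa4\xae\xe0\xa4\xbe\xe0\xa4\xa8\xe0\xa4\xb5"
word = raw.decode("utf-8")

# Print the codepoint, general category, and Unicode name of each character.
for ch in word:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")

categories = [unicodedata.category(ch) for ch in word]
```

The word consists of four codepoints, and the second, U+093E DEVANAGARI VOWEL SIGN AA, has category Mc (Spacing Mark). One plausible reading of the report, which the source does not state outright, is that the terminal advanced one cell per codepoint (4), while the wcwidth figure of 3 is consistent with counting the vowel sign as zero width.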

The Universal Declaration of Human Rights (UDHR) dataset contains translations into 500+ languages, providing a practical multilingual test corpus for evaluating terminal emulator support of zero-width characters (category Mn - Nonspacing Mark), combining characters (category Mc - Spacing Mark), and language-specific scripts. The 532 UDHR text files in ucs_detect/udhr/ are sourced from https://github.com/eric-muller/udhr/

Although there is no fully complete test suite of all zero-width and combining characters across all possible Unicode codepoints, the UDHR provides practical coverage of the vast majority of the world’s languages. By exhaustive interactive testing of this dataset (testing hundreds of languages with real-world text), the quality of language testing results serves as a suitable indicator of a terminal’s quality of support for combining marks across diverse scripts, complex grapheme clusters, and script-specific rendering requirements.
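As an illustration of why real-world text makes a useful corpus, the sketch below pulls out the words of a text sample that contain Mn or Mc characters, which are the words most likely to expose width disagreements. The helper words_with_marks is hypothetical and is not how ucs-detect selects its test words.

```python
import unicodedata

def words_with_marks(text: str) -> list[str]:
    """Distinct whitespace-separated words containing Mn or Mc marks,
    in first-seen order."""
    seen = {}
    for word in text.split():
        if any(unicodedata.category(ch) in ("Mn", "Mc") for ch in word):
            seen.setdefault(word, None)  # dict preserves insertion order
    return list(seen)

# A mixed sample: plain English, a Hindi word with a spacing mark (Mc),
# and a decomposed "cafe" + combining acute accent (Mn).
sample = "All human beings \u092e\u093e\u0928\u0935 cafe\u0301 \u092e\u093e\u0928\u0935"
print(words_with_marks(sample))
```

Applied to one of the UDHR text files, a helper of this kind would yield a list of candidate words whose printed width could then be measured in the terminal.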
