Unicode Fonts and Tools for X11

Original link: https://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html

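This project provides an extended version of the classic X11 bitmap fonts that conforms to the Unicode standard (ISO 10646-1). The updated “-misc-fixed-*” fonts now cover many character sets, including a comprehensive European subset, Greek, Cyrillic, the International Phonetic Alphabet (IPA), mathematical symbols, and Braille. The project also adds several new fonts, including oblique variants and dedicated double-width fonts for Japanese and Korean. In addition, the standard Adobe and B&H pixel fonts have been revised to add the ISO 10646-1 encoding and to fix bugs.

The fonts are designed for applications that support Unicode/UTF-8. The packages include a conversion script, `ucs2any.pl`, which maps the fonts to older encodings for compatibility with legacy software. Although the fonts cover most writing systems, they deliberately omit the Indic scripts and Syriac, because the X11 bitmap font system lacks the character-to-glyph mapping infrastructure that such scripts, like Arabic, require.

The fonts have shipped with XFree86 releases and have been offered to X.Org. The “-misc-fixed-*” fonts are marked as public domain, and the collection is maintained to implement Unicode 3.2 and to receive later bug fixes. The complete packages and documentation are available from the download links below.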

Hacker News comment by jech: That was a long time ago. Traditionally, characters under Unix were encoded in a locale-dependent way: ISO 8859-1 in Western Europe, ISO 8859-2 in Eastern Europe, EUC-JP in Japan, and so on. In the 1990s, a group led by Markus Kuhn and Bruno Haible pushed hard for XFree86 (the predecessor of X.Org) to move to locale-independent UTF-8. The link points to Markus Kuhn's web page, which appears to describe the UTF-8 software available around 1998.

Original text
Unicode fonts and tools for X11

The classic X Window System bitmap fonts are now available in an ISO 10646-1/Unicode extension.

[Screenshot: UTF-8 xterm using 6x13.bdf]

We have extended all the “-misc-fixed-*” fonts:

  5x7     -Misc-Fixed-Medium-R-Normal--7-70-75-75-C-50-ISO10646-1
  5x8     -Misc-Fixed-Medium-R-Normal--8-80-75-75-C-50-ISO10646-1
  6x9     -Misc-Fixed-Medium-R-Normal--9-90-75-75-C-60-ISO10646-1
  6x10    -Misc-Fixed-Medium-R-Normal--10-100-75-75-C-60-ISO10646-1
  6x12    -Misc-Fixed-Medium-R-Semicondensed--12-110-75-75-C-60-ISO10646-1
  6x13    -Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1
  6x13B   -Misc-Fixed-Bold-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1
  7x13    -Misc-Fixed-Medium-R-Normal--13-120-75-75-C-70-ISO10646-1
  7x13B   -Misc-Fixed-Bold-R-Normal--13-120-75-75-C-70-ISO10646-1
  7x14    -Misc-Fixed-Medium-R-Normal--14-130-75-75-C-70-ISO10646-1
  7x14B   -Misc-Fixed-Bold-R-Normal--14-130-75-75-C-70-ISO10646-1
  8x13    -Misc-Fixed-Medium-R-Normal--13-120-75-75-C-80-ISO10646-1
  8x13B   -Misc-Fixed-Bold-R-Normal--13-120-75-75-C-80-ISO10646-1
  9x15    -Misc-Fixed-Medium-R-Normal--15-140-75-75-C-90-ISO10646-1
  9x15B   -Misc-Fixed-Bold-R-Normal--15-140-75-75-C-90-ISO10646-1
  10x20   -Misc-Fixed-Medium-R-Normal--20-200-75-75-C-100-ISO10646-1
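
Each long name in the list above is an X Logical Font Description (XLFD): fourteen fields separated by hyphens, ending in the CHARSET_REGISTRY and CHARSET_ENCODING pair `ISO10646-1`. As a minimal sketch (the helper name `parse_xlfd` is hypothetical and not part of the font package), the fields can be split out in Python like this:

```python
# Field names in the order they appear in an XLFD font name.
XLFD_FIELDS = (
    "FOUNDRY", "FAMILY_NAME", "WEIGHT_NAME", "SLANT", "SETWIDTH_NAME",
    "ADD_STYLE_NAME", "PIXEL_SIZE", "POINT_SIZE", "RESOLUTION_X",
    "RESOLUTION_Y", "SPACING", "AVERAGE_WIDTH", "CHARSET_REGISTRY",
    "CHARSET_ENCODING",
)


def parse_xlfd(name: str) -> dict:
    """Split a full XLFD font name into a {field name: value} dict."""
    if not name.startswith("-"):
        raise ValueError("an XLFD name starts with '-'")
    parts = name[1:].split("-")
    if len(parts) != len(XLFD_FIELDS):
        raise ValueError(f"expected {len(XLFD_FIELDS)} fields, got {len(parts)}")
    return dict(zip(XLFD_FIELDS, parts))


font = parse_xlfd(
    "-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1"
)
# POINT_SIZE is in tenths of a point, AVERAGE_WIDTH in tenths of a pixel.
print(font["PIXEL_SIZE"], int(font["POINT_SIZE"]) / 10, int(font["AVERAGE_WIDTH"]) / 10)
print(font["CHARSET_REGISTRY"], font["CHARSET_ENCODING"])
```

The sketch only handles fully specified names; wildcard patterns such as `-misc-fixed-*` are not covered.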

Coverage

These fonts now contain all characters found in the following character sets:

The 6x13, 8x13, 9x15, 9x18, and 10x20 fonts cover, in addition, a much larger repertoire that includes the comprehensive CEN MES-3A European Unicode 3.2 Subset, the International Phonetic Alphabet, Armenian, Georgian, Thai, Yiddish, all Latin, Greek, and Cyrillic characters, all mathematical symbols (including the entire TeX repertoire), APL, Braille, Runes, and much more. 9x15 and 10x20 also cover Ethiopian.

Newly added fonts

The following new “-misc-fixed-*” fonts were added:

  6x13O   -Misc-Fixed-Medium-O-SemiCondensed--13-120-75-75-C-60-ISO10646-1
  7x13O   -Misc-Fixed-Medium-O-Normal--13-120-75-75-C-70-ISO10646-1
  8x13O   -Misc-Fixed-Medium-O-Normal--13-120-75-75-C-80-ISO10646-1
  9x18    -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1
  9x18B   -Misc-Fixed-Bold-R-Normal--18-120-100-100-C-90-ISO10646-1
  12x13ja -Misc-Fixed-Medium-R-Normal-ja-13-120-75-75-C-120-ISO10646-1
  18x18ja -Misc-Fixed-Medium-R-Normal-ja-18-120-100-100-C-180-ISO10646-1
  18x18ko -Misc-Fixed-Medium-R-Normal-ko-18-120-100-100-C-180-ISO10646-1

6x13O, 7x13O, and 8x13O are oblique/italic versions of 6x13, 7x13, and 8x13. 9x18 is an improved version of 9x15 that has more space above and below the base characters to increase readability and to allow overstriking combining characters to work properly. 18x18ja and 18x18ko provide Japanese and Korean doublewidth ideograms for 9x18. 12x13ja provides Japanese doublewidth ideograms for 6x13.
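
To illustrate how a single-width font such as 9x18 and a double-width companion such as 18x18ja fit into one character grid, here is a rough Python sketch. It assumes that characters with East Asian Width "W" or "F" occupy two cells and that combining characters occupy none; this is a simplification for illustration, not the rule any particular terminal emulator implements, and the function names are made up.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Number of terminal cells one character occupies (simplified)."""
    if unicodedata.combining(ch):
        return 0  # overstrikes the preceding base character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # drawn from the double-width font
    return 1      # drawn from the single-width font


def pixel_width(text: str, cell_px: int = 9) -> int:
    """Width in pixels of a string on a grid of cell_px-wide cells."""
    return sum(cell_width(ch) for ch in text) * cell_px


print(pixel_width("abc"))        # three single-width cells
print(pixel_width("日本語"))      # three double-width cells
print(pixel_width("e\u0301"))    # base letter plus combining acute accent
```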

Adobe BDF fonts

I have also created revised ISO10646-1 versions of all the Adobe and B&H pixel fonts that come with X11R6.4. They contained about 30 additional PostScript characters (roughly the CP1252 repertoire) that were present in the old ISO8859-1 BDF files, but were not encoded and therefore not accessible to X clients. The revised ISO10646-1 versions contain not only these but also many more automatically generated accented Latin characters (e.g., all characters from ISO 8859 parts 1–4, 9–10, 13–15), and they also fix a few long-standing bugs with the old fonts (missing NBSP, exchanged multiplication/division sign, etc.).

Status

The fonts are now complete and currently implement version 3.2 of the Unicode standard (ISO 10646-1/Amd.1:2002). I will maintain them to fix bugs and to satisfy any newly reported user requirements. Note that the new fonts fix a problem with the Latin-1 quotation mark and accents.

Download

The fonts are freely available with installation instructions and example UTF-8 text files.

The “-misc-fixed-*” font package:
https://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz
CJK ideographic wide character supplement (unpack into the same subdirectory as the above):
https://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts-asian.tar.gz
The Adobe and B&H font package:
https://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts-75dpi100dpi.tar.gz

There is also a change log file for the “-misc-fixed-*” fonts.

Other character sets

The font packages include the ucs2any.pl Perl script, which converts ISO 10646-1 fonts into any other encoding for which there is a Unicode mapping table available. This way, you can quickly generate ISO 8859-* versions from the above fonts automatically, for the benefit of older software that cannot yet handle ISO 10646-1 fonts directly.
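
The idea behind such a conversion can be sketched in a few lines of Python. This is not the logic of `ucs2any.pl` itself: it assumes a toy font represented as a dict from Unicode code point to glyph data, and it uses Python's built-in codecs in place of a Unicode mapping table file. The function names are hypothetical.

```python
def build_mapping(codec: str) -> dict:
    """Map each byte value of an 8-bit encoding to its Unicode code point."""
    mapping = {}
    for byte in range(256):
        try:
            mapping[byte] = ord(bytes([byte]).decode(codec))
        except UnicodeDecodeError:
            pass  # this byte is unassigned in the target encoding
    return mapping


def reencode(glyphs: dict, codec: str) -> dict:
    """Renumber glyphs keyed by Unicode code point to the codec's byte values.

    Glyphs that the target encoding cannot represent are dropped.
    """
    return {
        byte: glyphs[cp]
        for byte, cp in build_mapping(codec).items()
        if cp in glyphs
    }


# Toy "font": the strings stand in for real bitmap data.
unicode_font = {
    0x0041: "bitmap of A",
    0x0141: "bitmap of L with stroke",
    0x20AC: "bitmap of euro sign",
}

latin2_font = reencode(unicode_font, "iso8859_2")
print({hex(k): v for k, v in latin2_font.items()})
```

In this toy run the euro sign disappears from the ISO 8859-2 result because that encoding has no position for it, while it would survive a conversion to ISO 8859-15.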

Distribution

I periodically contribute a recent snapshot of all of the above fonts to XFree86 and they have been shipping as part of the XFree86 releases since XFree86 4.1. I have also made them available to X.Org for inclusion in one of the next official X11 distributions as a replacement for the current ISO 8859-1 BDF fonts (hopefully they will be in X11R6.7).

Who created the original -misc-fixed-* ASCII fonts or the later ISO 8859-1 extension is not documented. They most likely came from either MIT Project Athena or its industrial partner DEC in the 1980s. Most of these fonts contained the following property in the header of the BDF file:

  COPYRIGHT "Public domain font.  Share and enjoy."

The contributors of the *-ISO10646-1 extensions agreed to keep it that way, i.e., we haven't changed any of these copyright strings. It is very unlikely that such low-res pixel fonts could even be copyrighted, as they are clearly pictures and not computer programs, and as there are only a limited number of ways to draw such glyphs in a recognisable way. (Some countries explicitly do not offer any copyright protection for typefaces; see, e.g., 37 C.F.R. § 202.1(e) in the United States. In some others, any such protection would expire after 25 years or less.)

Related information and links

  • Read the UTF-8 and Unicode FAQ for Unix/Linux for detailed general information on how to use Unicode and its ASCII-compatible UTF-8 encoding under Unix, Linux, X11, etc.
  • To use these ISO10646-1 fonts, you will need applications that support ISO10646-1 fonts (hardly any software released before ~2001 does). These are not simply 8-bit replacement fonts but usually need to be used together with UTF-8 support in an application. For instance, if you want to use these fonts with xterm, you need to use an xterm version that can handle ISO10646-1 fonts (e.g., the one in XFree86 4.x).
  • The “-misc-fixed-*” fonts were created and extended using Mark Leisher’s xmbdfed font editor, which later evolved into the gbdfed font editor (using GTK+ instead of Motif). You can use the latter to view and modify these fonts.
  • Unicode X11 font names end with -ISO10646-1. This is the value in the official X registry (git) for the X Logical Font Descriptor (XLFD) fields CHARSET_REGISTRY and CHARSET_ENCODING for all Unicode and ISO 10646-1 16-bit fonts. There is no registered XLFD scheme yet for ISO 10646 characters outside the BMP, though some proposals have been discussed.
  • Unicode and ISO 10646 merged CJK ideograph repertoires from several groups of national source standards. In order to indicate that an ISO10646-1 font with ideographic characters was designed following the glyph style from one particular group of national source standards, the ADD_STYLE_NAME XLFD field can be used to indicate the corresponding language or region. Examples of such ADD_STYLE_NAME values are:
    ADD_STYLE_NAME  IRG Source  Countries                    Standards
    zh              G           China, Hong Kong, Singapore  GB2312, GB12345, GB7589, GB7590, GB8565, GB16500
    zh_TW           T           Taiwan                       CNS 11643
    ja              J           Japan                        JIS X 0208, JIS X 0212
    ko              K           Korea                        KS C 5601, KS C 5657, PKS C 5700
    vi              V           Vietnam                      TCVN 5773, TCVN 6056

    ISO 639 language and ISO 3166 country/region codes should be preferred in ADD_STYLE_NAME. This way, if an application knows a style preference from, for example, the RFC 1766 code “xx-yy” in a language tag, or from the locale name “xx_YY.ZZZ”, it can search for a suitable font by looking for ADD_STYLE_NAME values in the following order of preference: “xx_YY”, “xx”, “”, “*”.

  • The specification of the BDF X11 pixel font format is available as Technical Note 5005 from Adobe or in a simplified form as part of the X11 documentation.
  • The fonts use the Adobe standard glyph names for Unicode fonts.
  • There is also a note on why the apostrophe and grave accent look different in the new fonts.
  • A bug in old GTK+ libraries is triggered by the presence of ISO10646-1 Helvetica fonts.
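
The ADD_STYLE_NAME search order described in the list above (“xx_YY”, “xx”, “”, “*”) can be sketched as follows. This is an illustrative Python sketch, not code from any X11 library: both function names are hypothetical, only POSIX-style locale names are handled, and “*” is interpreted here as "accept any available style".

```python
from typing import List, Optional


def add_style_preferences(locale_name: str) -> List[str]:
    """Turn a locale name such as 'zh_TW.UTF-8' into the search order
    "xx_YY", "xx", "", "*" for the ADD_STYLE_NAME field."""
    # Drop the codeset (".UTF-8") and any modifier ("@euro").
    lang_region = locale_name.split(".", 1)[0].split("@", 1)[0]
    language = lang_region.split("_", 1)[0]
    prefs: List[str] = []
    for candidate in (lang_region, language, "", "*"):
        if candidate not in prefs:  # avoid duplicates when there is no region
            prefs.append(candidate)
    return prefs


def pick_style(available: List[str], locale_name: str) -> Optional[str]:
    """Choose the best ADD_STYLE_NAME among the styles that are installed."""
    for pref in add_style_preferences(locale_name):
        if pref == "*":
            return available[0] if available else None
        if pref in available:
            return pref
    return None


print(add_style_preferences("zh_TW.UTF-8"))
print(pick_style(["ja", "ko"], "ko_KR.UTF-8"))
```

A real implementation would also need to compare names case-insensitively and accept RFC 1766 tags of the form “xx-yy”; both are left out to keep the sketch short.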

Other information relevant to Unicode font projects

Why are there no Indic or Syriac glyphs in the ucs-fonts package?

In European and East Asian scripts, each Unicode character can be represented by a single graphical shape (“glyph”). The X11 font system is entirely built around the idea that there is a one-to-one relationship between characters and glyphs, which works fine for Latin, Greek, Cyrillic, Hebrew, Han, Hiragana, Katakana, Hangul, etc. However, things are far more complicated for handwritten cursive scripts such as Arabic, Syriac, and the various Indic scripts (Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, etc.). For these scripts, the sequence of values (“characters”) encoded in a Unicode string (which usually corresponds to the sequence of keystrokes during entry and the sequence of phonemes when speaking) first has to be converted into a sequence of graphical symbols (“glyphs”) as they are found in a font, before a string can be displayed. In a given Latin font style, the same graphical glyph of a font will always be used to represent a character on a screen. In an Arabic or Indic font, the shape of the glyph depends not only on the character that it represents, but also on its neighbouring characters. Sometimes, different glyphs have to be used depending on the character appearing at the beginning, middle, or end of a word, and often certain entire sequences of characters have to be represented by a special ligature glyph. A very simple form of that is used in Latin fine typography in the form of the “fi” and “fl” ligatures, but in Indic scripts, the situation is far more extreme, and the number of glyphs is often several times the number of characters. For details and examples, read Chapter 9 and Chapter 8 as well as the relevant code charts of The Unicode Standard.

The Unicode standard does contain encoding ranges for a simple scheme of Arabic glyphs, the “Arabic Presentation Forms”. This was possible because for Arabic there is a reasonably good consensus among font designers on how many glyphs are actually necessary for proper rendering of Arabic text, even though some argue that for really high-quality typesetting the Unicode collection of Arabic presentation forms is not sufficient. For Indic scripts, on the other hand, there seems to be no consensus among font designers as to which glyphs are actually necessary, since this can vary significantly across different font styles. Therefore, an Indic font is always a proprietary non-standardized collection of glyphs together with a mapping table that defines how sequences of standard Unicode characters have to be transformed into sequences of non-standard Indic glyphs from this particular font before the text can be displayed.
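
The one-character-to-several-glyphs relationship can be seen directly in the Unicode character database. The following Python sketch uses the standard `unicodedata` module to list the positional glyphs that the Arabic Presentation Forms-B block (U+FE70 to U+FEFF) provides for a single Arabic letter. It only looks up the forms and does no shaping; the function name is hypothetical.

```python
import unicodedata


def presentation_forms(base: str) -> dict:
    """Return {form name: presentation-form character} for one Arabic letter,
    based on the compatibility decompositions in U+FE70..U+FEFF."""
    forms = {}
    for cp in range(0xFE70, 0xFF00):
        fields = unicodedata.decomposition(chr(cp)).split()
        # A positional form decomposes to a tag plus one base character,
        # e.g. ['<initial>', '0628']. Unassigned code points and ligatures
        # (which decompose to two characters) are skipped.
        if len(fields) == 2 and int(fields[1], 16) == ord(base):
            forms[fields[0].strip("<>")] = chr(cp)
    return forms


# U+0628 ARABIC LETTER BEH has four positional glyphs.
for form, glyph in presentation_forms("\u0628").items():
    print(f"{form:9} U+{ord(glyph):04X} {unicodedata.name(glyph)}")
```

A renderer for Arabic has to choose among these glyphs according to the neighbouring characters, which is exactly the step that the one-to-one X11 font model has no place for.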

The OpenType font format developed by Microsoft and Adobe is an outline font format that does include such character/glyph mapping tables. The BDF format used by X11 pixel fonts does not have any standardized way of including a character/glyph mapping table, and neither do current BDF editors such as xmbdfed, nor X servers. The Pango rendering library developed for the GNOME project can make use of BDF glyph fonts, but it requires the corresponding character/glyph mapping table in a separate client-side file. The X11 standards currently provide no support for transmitting such mapping tables over the X11 protocol. Roman Czyborra’s GNU Unifont does contain a naive representation of the Indic glyphs shown in the Unicode Standard code charts, but that is of no use in practice for displaying Indic strings properly.

Summary: X11 was never designed for Arabic, Syriac, or Indic; special libraries such as Pango have to be used for these scripts. If you want to help get Indic supported under X11, you have to extend the X11 standards to fix this problem and provide a font mechanism that understands that some scripts need to map characters into glyphs. The solution is unfortunately not as easy as just drawing a few glyphs with a font editor; otherwise, we would already have added the Indic scripts long ago to the ucs-fonts package.

Markus Kuhn
