元代码：Python机器可读注释的新标准

元代码：Python机器可读注释的新标准
Metacode: The new standard for machine-readable comments for Python

原始链接: https://github.com/pomponchik/metacode

## Metacode：Python动作注释的标准许多Python工具（linter、覆盖率工具、格式化工具等）使用特殊注释来发挥作用，但目前缺乏统一的标准。这个库，**Metacode**，旨在通过提供一种标准格式和解析器来解决这个问题，用于这些“动作注释”。 Metacode定义了带有`key`（工具名称）、`action`和可选`arguments`的注释。其语法是Python的一个子集，即使不直接使用该库，也易于实现。注释可以使用`#`链接，以便在同一行包含多个工具特定的指令。解析器返回包含key、action和arguments（字符串、数字、布尔值或启用时为AST对象）的`ParsedComment`对象。它提供大小写不敏感的key匹配和按key过滤的灵活性。一个互补的`insert()`函数允许创建格式化的注释。虽然该库提供了一个Python解析器，但核心语言被设计成可以通过定义的EBNF语法在其他语言中实现，从而促进更广泛的采用和Python生态系统的一致性。

## Metacode：Python 注释的标准一个新的库 Metacode 旨在标准化 Python 生态系统中的机器可读注释。目前，诸如代码检查工具和测试覆盖率系统之类的工具依赖于特殊注释，但缺乏一致的格式。这导致不同的工具采用多种且常常效率低下的解析方法——从简单的字符串处理到完整的 Python 解析器。 Metacode 提出了一种简单直观的标准来定义这些“动作注释”，并为其他开发者提供了一个现成的解析器以供集成。这旨在简化代码分析并提高 Python 开发工具之间的兼容性。一位用户询问是否可以提交 Python 增强提案 (PEP) 以正式认可该标准。该项目在 GitHub 上可用：[https://github.com/pomponchik](https://github.com/pomponchik)。

原文

ⓘ

Many source code analysis tools use comments in a special format to mark it up. This is an important part of the Python ecosystem, but there is still no single standard around it. This library offers such a standard.

In the Python ecosystem, there are many tools dealing with source code: linters, test coverage collection systems, and many others. Many of them use special comments, and as a rule, the style of these comments is very similar. Here are some examples:

Ruff, Vulture —> # noqa, # noqa: E741, F841.
Black and Ruff —> # fmt: on, # fmt: off.
Mypy —> # type: ignore, type: ignore[error-code].
Coverage —> # pragma: no cover, # pragma: no branch.
Isort —> # isort: skip, # isort: off.
Bandit —> # nosec.

But you know what? There is no single standard for such comments.

The internal implementation of reading such comments is also different. Someone uses regular expressions, someone uses even more primitive string processing tools, and someone uses full-fledged parsers, including the Python parser or even written from scratch.

As a result, as a user, you need to remember the rules by which comments are written for each specific tool. And at the same time, you can't be sure that things like double comments (when you want to leave 2 comments for different tools in one line of code) will work in principle. And as the creator of such tools, you are faced with a seemingly simple task — just to read a comment — and find out for yourself that it suddenly turns out to be quite difficult, and there are many possible mistakes.

This is exactly the problem that this library solves. It describes a simple and intuitive standard for action comments, and also offers a ready-made parser that creators of other tools can use. The standard offered by this library is based entirely on a subset of the Python syntax and can be easily reimplemented even if you do not want to use this library directly.

So, this library offers a language for action comments. Its syntax is a subset of Python syntax, but without Python semantics, as full-fledged execution does not occur. The purpose of the language is simply to provide the developer with the content of the comment in a convenient way, if it is written in a compatible format. If the comment format is not compatible with the parser, it is ignored.

From the point of view of the language, any meaningful comment can consist of 3 elements:

Key. This is usually the name of the specific tool for which this comment is intended, but in some cases it may be something else. This can be any string allowed as an identifier in Python.
Action. The short name of the action that you want to link to this line. Also, only the allowed Python identifier.
List of arguments. These are often some kind of identifiers of specific linting rules or other arguments associated with this action. The list of possible data types described below.

Consider a comment designed to ignore a specific mypy rule:

# type: ignore[error-code]
└-key-┘└action┴-arguments┘

↑ The key here is the word type, that is, what you see before the colon. The action is the ignore word, that is, what comes before the square brackets, but after the colon. Finally, the list of arguments is what is in square brackets, in this case, there is only one argument in it: error-code.

Simplified writing is also possible, without a list of arguments:

# type: ignore
└-key-┘└action┘

↑ In this case, the parser assumes that there is an argument list, but it is empty.

The number of arguments in the list is unlimited, they can be separated by commas. Here are the valid data types for arguments:

Valid Python identifiers. They are interpreted as strings.
Two valid Python identifiers, separated by the - symbol, like this: error-code. There can also be any number of spaces between them, they will be ignored. Interpreted as a single string.
String literals.
Numeric literals (int, float, complex).
Boolean literals (True and False).
None.
... (ellipsis).
Any other Python-compatible code. This is disabled by default, but you can force the mode of reading such code and get descriptions for any inserts of such code in the form of AST objects, after which you can somehow process it yourself.

The syntax of all these data types is completely similar to the Python original (except that you can't use multi-line writing options). Over time, it is possible to extend the possible syntax of metacode, but this template will always be supported.

There can be several comments in the metacode format. In this case, they should be interspersed with the # symbol, as if each subsequent comment is a comment on the previous one. You can also add regular text comments, they will just be ignored by the parser if they are not in metacode format:

# type: ignore # <- This is a comment for mypy! # fmt: off # <- And this is a comment for Ruff!

If you scroll through this text above to the examples of action comments from various tools, you may notice that the syntax of most of them (but not all) is it can be described using metacode, and if not, it can be easily adapted to metacode. Read on to learn how to use a ready-made parser in practice.

Install it:

You can also quickly try out this and other packages without having to install using instld.

The parser offered by this library is just one function that is imported like this:

from metacode import parse

To use it, you need to extract the text of the comment in some third-party way (preferably, but not necessarily, without the # symbol at the beginning) and pass it, and the expected key must also be passed as the second argument. As a result, you will receive a list of the contents of all the comments that were parsed:

print(parse('type: ignore[error-code]', 'type'))
#> [ParsedComment(key='type', command='ignore', arguments=['error-code'])]
print(parse('type: ignore[error-code] # type: not_ignore[another-error]', 'type'))
#> [ParsedComment(key='type', command='ignore', arguments=['error-code']), ParsedComment(key='type', command='not_ignore', arguments=['another-error'])]

As you can see, the parse() function returns a list of ParsedComment objects. Here are the fields of this type's objects and their expected types:

key: str 
command: str
arguments: List[Optional[Union[str, int, float, complex, bool, EllipsisType, AST]]]

↑ Please note that you are transmitting a key, which means that the result is returned filtered by this key. This way you can read only those comments that relate to your tool, ignoring the rest.

By default, an argument in a comment must be of one of the strictly allowed types. However, you can enable reading of arbitrary other types, in which case they will be transmitted in the AST node format. To do this, pass allow_ast=True:

print(parse('key: action[a + b]', 'key', allow_ast=True))
#> [ParsedComment(key='key', command='action', arguments=[<ast.BinOp object at 0x102e44eb0>])]

↑ If you do not pass allow_ast=True, a metacode.errors.UnknownArgumentTypeError exception will be raised. When processing an argument, you can also raise this exception for an AST node of a format that your tool does not expect.

⚠️ Be careful when writing code that analyzes the AST. Different versions of the Python interpreter can generate different AST based on the same code, so don't forget to test your code (for example, using matrix or tox) well. Otherwise, it is better to use standard metacode argument types.

You can allow your users to write keys in any case. To do this, pass ignore_case=True:

print(parse('KEY: action', 'key', ignore_case=True))
#> [ParsedComment(key='KEY', command='action', arguments=[])]

You can also easily add support for several different keys. To do this, pass a list of keys instead of one key:

print(parse('key: action # other_key: other_action', ['key', 'other_key']))
#> [ParsedComment(key='key', command='action', arguments=[]), ParsedComment(key='other_key', command='other_action', arguments=[])]

Well, now we can read the comments. But what if we want to record? There is another function for this: insert():

from metacode import insert, ParsedComment

You send the comment you want to insert there, as well as the current comment (empty if there is no comment, or starting with # if there is), and you get a ready-made new comment text:

print(insert(ParsedComment(key='key', command='command', arguments=['lol', 'lol-kek']), ''))
# key: command[lol, 'lol-kek']
print(insert(ParsedComment(key='key', command='command', arguments=['lol', 'lol-kek']), '# some existing text'))
# key: command[lol, 'lol-kek'] # some existing text

As you can see, our comment is inserted before the existing comment. However, you can do the opposite:

print(insert(ParsedComment(key='key', command='command', arguments=['lol', 'lol-kek']), '# some existing text', at_end=True))
# some existing text # key: command[lol, 'lol-kek']

⚠️ Be careful: AST nodes can be read, but cannot be written.

What about other languages?

If you are writing your Python-related tool not in Python, as is currently fashionable, but in some other language, such as Rust, you may want to adhere to the metacode standard for machine-readable comments, however, you cannot directly use the ready-made parser described above. What to do?

The proposed metacode language is a syntactic subset of Python. The original metacode parser allows you to read arbitrary arguments written in Python as AST nodes. The rules for such parsing are determined by the specific version of the interpreter that metacode runs under, and they cannot be strictly standardized, since Python syntax is gradually evolving in an unpredictable direction. However, you can use a "safe" subset of the valid syntax by implementing your parser based on this EBNF grammar:

line ::= element { "#" element }
element ::= statement | ignored_content
statement ::= key ":" action [ "[" arguments "]" ]
ignored_content ::= ? any sequence of characters excluding "#" ?

key ::= identifier
action ::= identifier { "-" identifier }
arguments ::= argument { "," argument }

argument ::= hyphenated_identifier 
           | identifier 
           | string_literal 
           | complex_literal 
           | number_literal 
           | "True" | "False" | "None" | "..."

hyphenated_identifier ::= identifier "-" identifier
identifier ::= ? python-style identifier ?
string_literal ::= ? python-style string ?
number_literal ::= ? python-style number ?
complex_literal ::= ? python-style complex number ?

If you suddenly implement your ready-made open-source parser of this grammar in a language other than Python, please let me know. This information can be added to this text.

元代码：Python机器可读注释的新标准 Metacode: The new standard for machine-readable comments for Python

What about other languages?

元代码：Python机器可读注释的新标准
Metacode: The new standard for machine-readable comments for Python