Show HN: Jq-Like Tool for Markdown

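`mdq` is a command-line tool for querying Markdown documents and extracting specific elements from them, much as `jq` does for JSON. It simplifies tasks such as validating checklists in pull requests or pulling data out of tables, replacing complex regular expressions with a more intuitive filter syntax that mirrors Markdown itself. You can select sections, lists, tasks (completed or uncompleted), links, images, code blocks, and even specific table rows and columns, all based on their content. `mdq` supports case-insensitive string matching, regular expressions, and quoted strings. It can be used to check whether a particular element exists (for example, confirming that a user searched existing issues) or to extract data (for example, a ticket URL from a PR description). Install it with `cargo install` or download a prebuilt binary, then chain filters together with pipes. For example, you can extract the links to a particular URL within a particular section, or filter a table down to the rows or columns that match a pattern.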

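There have been a few times when I wanted to select some text out of a Markdown doc. For example, a GitHub CI check to ensure that PRs / issues / etc. are formatted correctly. This can be done to some extent with regex, but those expressions are brittle and hard to read or edit later. mdq uses a familiar pipe syntax to navigate Markdown in a structured way. It's in 0.x because I don't want to fully commit to the syntax being stable, in case real-world testing shows that it needs tweaking. But I think the project is in a good spot overall, and I'm interested in feedback!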

(Comments) 2024-09-04

Original text

mdq aims to do for Markdown what jq does for JSON: provide an easy way to zero in on specific parts of a document.

For example, GitHub PRs are Markdown documents, and some organizations have specific templates with checklists for all reviewers to complete. Enforcing these often requires ugly regexes that are a pain to write and worse to debug. Instead, you can (for example) ask mdq for all uncompleted tasks with the `- [ ]` selector.
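A minimal sketch of that query, using the task selector with its text omitted (which means "any"). The file name `pr_body.md` and its contents are hypothetical, and the call is guarded so the snippet does nothing if mdq is not installed:

```shell
# Hypothetical PR body saved to a file; the contents are made up for illustration.
cat > pr_body.md <<'EOF'
## Reviewer checklist

- [x] Tests added
- [ ] Docs updated
- [ ] Changelog entry written
EOF

# '- [ ]' selects uncompleted tasks; with no text after it, any task text matches.
# Guarded so the snippet is harmless on machines without mdq installed.
if command -v mdq >/dev/null 2>&1; then
  mdq '- [ ]' < pr_body.md
fi
```

Adding `-q` would turn the same filter into a pass/fail check on whether any uncompleted task remains.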

mdq is available under the Apache 2.0 or MIT licenses, at your option. I am open to other permissive licenses, if you have one you prefer.

To install mdq, any of these will work:

cargo install --git https://github.com/yshavit/mdq

Download binaries from the latest release (or any other release, of course).
You can also grab the binaries from the latest build-release workflow run. You must be logged into GitHub to do that (this is GitHub's limitation, not mine). You'll have to chmod +x them before you can run them.

Security concerns

The release and latest-workflow binaries are built on GitHub's servers, so if you trust my code (and dependencies), and you trust GitHub, you can trust the binaries. See https://github.com/yshavit/mdq/wiki/Release-binaries for information on how to verify them.

Simple example to select sections containing "usage":

cat example.md | mdq '# usage'

Use pipe (|) to chain filters together. For example, to select sections containing "usage", and within those find all unordered list items:

cat example.md | mdq '# usage | -'
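To try both filters on a concrete input, here is a small sketch. The file `chain_demo.md` and its contents are hypothetical, and the mdq calls are guarded so the snippet does nothing if mdq is not installed:

```shell
# Hypothetical document with two sections, each holding an unordered list.
cat > chain_demo.md <<'EOF'
# Usage

- run mdq with a filter
- pipe Markdown in on stdin

# License

- MIT
EOF

if command -v mdq >/dev/null 2>&1; then
  # Sections whose title contains "usage".
  mdq '# usage' < chain_demo.md
  # The same sections, narrowed to the unordered list items inside them.
  mdq '# usage | -' < chain_demo.md
fi
```

The second filter should leave out the "MIT" item, since that list sits under a section whose title does not contain "usage".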

The filter syntax is designed to mirror Markdown syntax. You can select...

Element	Syntax
Sections	`# title text`
Lists	`- unordered list item text`
"	`1. ordered list item text`
"	`- [ ] uncompleted task`
"	`- [x] completed task`
"	`- [?] any task`
Links	`[display text](url)`
Images	`![alt text](url)`
Block quotes	`> block quote text`
Code blocks	`` ```language <code block text> ``
Raw HTML	`</> html_tag`
Plain paragraphs	`P: paragraph text`
Tables	`:-: header text :-: row text`

(Table selection differs from other selections in that you can select only certain headers and rows, such that the resulting element is of a different shape than the original. See the example below, or the wiki for more detail.)

In any of the above, the text may be:

an unquoted string that starts with a letter; this is case-insensitive
a "quoted string" (either single or double quotes); this is case-sensitive
a string (quoted or unquoted) anchored by ^ or $ (for start and end of string, respectively)
a /regex/
omitted or *, to mean "any"
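As a rough illustration of these forms, the sketch below applies one section filter per form to the same input. The file `matchers_demo.md` and its contents are hypothetical, the filters are written from the descriptions above rather than checked against the user manual, and no output is shown because it has not been verified:

```shell
# Hypothetical document with three sections, made up for illustration.
cat > matchers_demo.md <<'EOF'
# Usage

Run the tool.

# Advanced usage

Chain filters.

# Installation

Use cargo.
EOF

# One filter per matcher form, each applied to section titles:
#   '# usage'            unquoted string (case-insensitive)
#   '# "Usage"'          quoted string (case-sensitive)
#   '# ^usage'           anchored to the start of the title
#   '# usage$'           anchored to the end of the title
#   '# /usage|install/'  regex
#   '# *'                any section
if command -v mdq >/dev/null 2>&1; then
  for filter in '# usage' '# "Usage"' '# ^usage' '# usage$' '# /usage|install/' '# *'; do
    printf '== %s ==\n' "$filter"
    mdq "$filter" < matchers_demo.md
  done
fi
```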

See the tutorial for a bit more detail, and the user manual for the full picture.

Ensuring that people have searched existing issues before submitting a bug report

Many projects have bug report templates that ask the submitter to attest that they've checked existing issues for possible duplicates. In mdq, you can do:

if echo "$ISSUE_TEXT" | mdq -q '- [x] I have searched for existing issues' ; then
  ...

(The -q option is like grep's: it doesn't output anything to stdout, but exits 0 if any items were found, or non-0 otherwise.)

This will match:

- [x] I have searched for existing issues

... but will fail if the checkbox is unchecked:

- [ ] I have searched for existing issues
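The check can be fleshed out into a complete script step. This is a minimal sketch: the helper name `has_searched_issues` and the sample issue body are hypothetical, and it relies only on the `-q` exit-status behavior described above:

```shell
# Hypothetical helper: succeeds when the issue body contains the checked attestation.
# $1 is the issue body as Markdown text.
has_searched_issues() {
  printf '%s\n' "$1" | mdq -q '- [x] I have searched for existing issues'
}

# Made-up issue body for illustration; in CI this would come from the event payload.
ISSUE_TEXT='- [x] I have searched for existing issues'

# Guarded so the snippet is harmless on machines without mdq installed.
if command -v mdq >/dev/null 2>&1; then
  if has_searched_issues "$ISSUE_TEXT"; then
    echo "attestation found"
  else
    echo "please confirm that you searched existing issues" >&2
  fi
fi
```

Wrapping the filter in a function keeps the filter string in one place if several workflow steps need the same check.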

Extracting a referenced ticket

Some organizations use GitHub Actions to update their ticket tracker, if a PR mentions a ticket. You can use mdq to extract the link from Markdown as JSON, and then use jq to get the URL:

TICKET_URL="$(echo "$PR_TEXT" \
  | mdq --output json '# Ticket | [](^https://tickets.example.com/[A-Z]+-\d+$)' \
  | jq -r '.items[].link.url')"

This will match Markdown like the following, provided it sits in a section whose title contains "Ticket":

https://tickets.example.com/PROJ-1234

Whittling down a big table

Let's say you have a table whose columns reference people in an on-call schedule, and whose rows correspond to weeks in YYYY-MM-DD format:

|  On-Call   | Alice | Bob | Sam | Pat |
|:----------:|:-----:|:---:|:---:|:---:|
| 2024-01-08 |   x   |     |     |     |
| 2024-01-15 |       |     |  x  |     |
| 2024-01-22 |       |  x  |     |     |

To find out when Alice is on call:

cat oncall.md | mdq ':-: /On-Call|Alice/:-: *'

|  On-Call   | Alice |
|:----------:|:-----:|
| 2024-01-08 |   x   |
| 2024-01-15 |       |
| 2024-01-22 |       |

Or, to find out who's on call for the week of Jan 15:

cat oncall.md | mdq ':-: * :-: 2024-01-15'

|  On-Call   | Alice | Bob | Sam | Pat |
|:----------:|:-----:|:---:|:---:|----:|
| 2024-01-15 |       |     |  x  |     |
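The two selections can presumably be combined in one filter, since the table selector takes both a header matcher and a row matcher. The sketch below is inferred from the `:-: header text :-: row text` form and is not an example from the project docs. The file `oncall_demo.md` and the assignments in it are assumptions for illustration:

```shell
# Illustrative on-call table; the file name and the assignments are assumptions.
cat > oncall_demo.md <<'EOF'
|  On-Call   | Alice | Bob | Sam | Pat |
|:----------:|:-----:|:---:|:---:|:---:|
| 2024-01-08 |   x   |     |     |     |
| 2024-01-15 |       |     |  x  |     |
| 2024-01-22 |       |  x  |     |     |
EOF

if command -v mdq >/dev/null 2>&1; then
  # Keep only the On-Call and Alice columns, and only the 2024-01-15 row.
  mdq ':-: /On-Call|Alice/ :-: 2024-01-15' < oncall_demo.md
fi
```

If the combined form behaves as expected, the result answers "is Alice on call the week of Jan 15?" in a single query.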