Show HN: Jq-Like Tool for Markdown

Original link: https://github.com/yshavit/mdq

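mdq is a command-line tool for querying Markdown documents and extracting specific elements from them, much as jq does for JSON. It simplifies tasks such as validating checklists in pull requests or pulling data out of tables, replacing complex regular expressions with a more intuitive filter syntax that mirrors Markdown itself. You can select sections, lists, tasks (completed or uncompleted), links, images, code blocks, and even specific table rows and columns, all based on their content. mdq supports case-insensitive string matching, regular expressions, and quoted strings. It can be used to check whether a specific element exists (for example, confirming that a user searched existing issues) or to extract data (for example, a ticket URL from a PR description). Install it with cargo install or download a prebuilt binary, then chain filters together with pipes. For example, you can extract links with a particular URL from a specific section, or filter a table down to the rows or columns that match certain patterns.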

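There have been a few times I wished I could select some text out of a Markdown doc. For example, GitHub CI checks that make sure PRs / issues / etc. are properly formatted. This can be done to some extent with regexes, but those expressions are brittle and hard to read or edit later. mdq uses a familiar pipe syntax to navigate Markdown in a structured way. It's in 0.x because I don't want to fully commit to syntax stability, in case real-world testing shows that the syntax needs tweaking. But I think the project is in a good spot overall, and I'm interested in feedback!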

Original text


mdq aims to do for Markdown what jq does for JSON: provide an easy way to zero in on specific parts of a document.

For example, GitHub PRs are Markdown documents, and some organizations have specific templates with checklists for all reviewers to complete. Enforcing these often requires ugly regexes that are a pain to write and worse to debug. Instead, you can (for example) ask mdq for all uncompleted tasks:
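Going by the filter syntax described further down (`- [ ]` selects uncompleted tasks, and omitting the text means "any"), such a query could look like the sketch below. The function name `list_open_tasks` and the file name `pr_body.md` are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: print every unchecked task found in the Markdown file $1.
# The '- [ ]' filter with no text after it is assumed to match any uncompleted task.
list_open_tasks() {
  mdq '- [ ]' < "$1"
}

# Example call (pr_body.md is an assumed file name):
#   list_open_tasks pr_body.md
```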

mdq is available under the Apache 2.0 or MIT licenses, at your option. I am open to other permissive licenses, if you have one you prefer.

Any of these will work:

  1. cargo install --git https://github.com/yshavit/mdq
  2. Download binaries from the latest release (or any other release, of course).
  3. You can also grab the binaries from the latest build-release workflow run. You must be logged into GitHub to do that (this is GitHub's limitation, not mine). You'll have to chmod +x them before you can run them.
Security concerns: The release and latest-workflow binaries are built on GitHub's servers, so if you trust my code (and dependencies), and you trust GitHub, you can trust the binaries. See https://github.com/yshavit/mdq/wiki/Release-binaries for information on how to verify them.

Simple example to select sections containing "usage":

cat example.md | mdq '# usage'

Use pipe (|) to chain filters together. For example, to select sections containing "usage", and within those find all unordered list items:

cat example.md | mdq '# usage | -'

The filter syntax is designed to mirror Markdown syntax. You can select...

| Element          | Syntax                        |
|------------------|-------------------------------|
| Sections         | # title text                  |
| Lists            | - unordered list item text    |
| "                | 1. ordered list item text     |
| "                | - [ ] uncompleted task        |
| "                | - [x] completed task          |
| "                | - [?] any task                |
| Links            | [display text](url)           |
| Images           | ![alt text](url)              |
| Block quotes     | > block quote text            |
| Code blocks      | ```language <code block text> |
| Raw HTML         | </> html_tag                  |
| Plain paragraphs | P: paragraph text             |
| Tables           | :-: header text :-: row text  |

(Table selection differs from the other selections in that you can select only certain headers and rows, so the resulting element may have a different shape than the original. See the example below, or the wiki for more detail.)

In any of the above, the text may be:

  • an unquoted string that starts with a letter; this is case-insensitive
  • a "quoted string" (either single or double quotes); this is case-sensitive
  • a string (quoted or unquoted) anchored by ^ or $ (for start and end of string, respectively)
  • a /regex/
  • omitted or *, to mean "any"
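To make the matcher forms above concrete, here is a small sketch that applies each of them to section titles. The wrapper `sections` is a hypothetical helper, not part of mdq, and `example.md` is the same placeholder file used earlier; the filters themselves only combine the `#` selector with the matcher kinds listed above.

```shell
# Hypothetical helper: print the sections of file $1 whose title matches
# the text matcher $2. It simply prepends the '#' section selector.
sections() {
  mdq "# $2" < "$1"
}

# Example calls, one per matcher form:
#   sections example.md 'usage'            # unquoted: case-insensitive
#   sections example.md '"Usage"'          # quoted: case-sensitive
#   sections example.md '^usage'           # anchored to the start of the title
#   sections example.md 'usage$'           # anchored to the end of the title
#   sections example.md '/usage|install/'  # regex
#   sections example.md '*'                # any section
```

Single-quoting the matcher on the command line keeps the shell from expanding `*` or `$` before mdq sees them.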

See the tutorial for a bit more detail, and user manual for the full picture.

Ensuring that people have searched existing issues before submitting a bug report

Many projects have bug report templates that ask the submitter to attest that they've checked existing issues for possible duplicates. In mdq, you can do:

if echo "$ISSUE_TEXT" | mdq -q '- [x] I have searched for existing issues' ; then
  ...

(The -q option is like grep's: it doesn't output anything to stdout, but exits 0 if any items were found, or non-0 otherwise.)
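As a rough sketch of how that exit status might be used in a CI step, the check can be wrapped in a function with both branches spelled out. The function names and the messages are assumptions made for illustration; only the `-q` flag and the filter come from the example above.

```shell
# Returns 0 when the issue body passed as $1 contains the checked attestation box.
has_searched_issues() {
  printf '%s\n' "$1" | mdq -q '- [x] I have searched for existing issues'
}

# Hypothetical CI step: print "ok" on success, otherwise complain on stderr
# and return non-zero so the calling job fails.
check_issue() {
  if has_searched_issues "$1"; then
    echo "ok"
  else
    echo "Please search existing issues before filing a report." >&2
    return 1
  fi
}

# Example call (ISSUE_TEXT is assumed to hold the issue body):
#   check_issue "$ISSUE_TEXT"
```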

This will match:

- [x] I have searched for existing issues

... but will fail if the checkbox is unchecked:

- [ ] I have searched for existing issues

Extracting a referenced ticket

Some organizations use GitHub Actions to update their ticket tracker, if a PR mentions a ticket. You can use mdq to extract the link from Markdown as JSON, and then use jq to get the URL:

TICKET_URL="$(echo "$PR_TEXT" |
  mdq --output json '# Ticket | [](^https://tickets.example.com/[A-Z]+-\d+$)' |
  jq -r '.items[].link.url')"

This will match Markdown like the following link, provided it sits in a section whose title contains "Ticket":

https://tickets.example.com/PROJ-1234
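A PR may not mention a ticket at all, in which case the extraction yields an empty string. The sketch below wraps the same pipeline in a function and treats an empty result as a failure. The function names are hypothetical, and the behavior relies on the assumption that jq prints nothing when no link is found.

```shell
# Extract the ticket URL from the PR body passed as $1 (same pipeline as above).
extract_ticket_url() {
  printf '%s\n' "$1" |
    mdq --output json '# Ticket | [](^https://tickets.example.com/[A-Z]+-\d+$)' |
    jq -r '.items[].link.url'
}

# Hypothetical guard: print the URL, or fail with a message when none is found.
require_ticket() {
  ticket_url="$(extract_ticket_url "$1")"
  if [ -z "$ticket_url" ]; then
    echo "No ticket link found under a Ticket section." >&2
    return 1
  fi
  printf '%s\n' "$ticket_url"
}

# Example call (PR_TEXT is assumed to hold the PR description):
#   TICKET_URL="$(require_ticket "$PR_TEXT")" || exit 1
```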

Whittling down a big table

Let's say you have a table whose columns reference people in an on-call schedule and whose rows correspond to weeks in YYYY-MM-DD format:

On-Call Alice Bob Sam Pat
2024-01-08 x
2024-01-15 x
2024-01-22 x

To find out when Alice is on call:

cat oncall.md | mdq ':-: /On-Call|Alice/:-: *'
|  On-Call   | Alice |
|:----------:|:-----:|
| 2024-01-08 |   x   |
| 2024-01-15 |       |
| 2024-01-22 |       |

Or, to find out who's on call for the week of Jan 15:

cat oncall.md | mdq ':-: * :-: 2024-01-15'
|  On-Call   | Alice | Bob | Sam | Pat |
|:----------:|:-----:|:---:|:---:|----:|
| 2024-01-15 |       |     |  x  |     |
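If these lookups are run often, the two filters can be parameterized. The following is only a sketch: the function names are hypothetical, and the person's name is spliced directly into the regex, so it assumes names contain no regex metacharacters.

```shell
# Hypothetical helper: show the schedule row for the week $2 in file $1.
oncall_week() {
  mdq ":-: * :-: $2" < "$1"
}

# Hypothetical helper: show only the On-Call column and the column for person $2.
# The name is inserted into the regex as-is (assumed to be plain letters).
oncall_person() {
  mdq ":-: /On-Call|$2/:-: *" < "$1"
}

# Example calls:
#   oncall_week oncall.md 2024-01-15
#   oncall_person oncall.md Alice
```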