Fast trigram-based code search

Original link: https://github.com/sourcegraph/zoekt

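## Zoekt: fast source code search

Zoekt is a powerful text search engine built for source code, forked in 2017 from Google's original implementation. It excels at fast substring and regular-expression matching across one or many repositories, and it supports a query language with boolean logic. Zoekt ranks results using code-specific signals such as symbol matches. Its design, based on trigram indexing and syntactic parsing, lets it work across a variety of programming languages.

There are two main ways to use Zoekt: command-line tools for local indexing and searching, or an index server and web server setup for larger-scale indexing of remote repositories (for example, a GitHub organization), accessed through a web UI or API. Installation typically requires the Go toolchain, plus optional Universal ctags to improve ranking. Detailed documentation and configuration examples cover features such as automatic repository syncing, streaming results, and alternative scoring methods.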

Original text
"Zoekt, en gij zult spinazie eten" - Jan Eertink

("seek, and ye shall eat spinach" - My primary school teacher)

Zoekt is a text search engine intended for use with source code. (Pronunciation: roughly as you would pronounce "zooked" in English)

Note: This has been the maintained source for Zoekt since 2017, when it was forked from the original repository github.com/google/zoekt.

Zoekt supports fast substring and regexp matching on source code, with a rich query language that includes boolean operators (and, or, not). It can search individual repositories, and search across many repositories in a large codebase. Zoekt ranks search results using a combination of code-related signals like whether the match is on a symbol. Because of its general design based on trigram indexing and syntactic parsing, it works well for a variety of programming languages.

There are two main ways to use the project:

  • Through individual commands, to index repositories and perform searches through Zoekt's query language
  • Or, through the indexserver and webserver, which support syncing repositories from a code host and searching them through a web UI or API

For more details on Zoekt's design, see the docs directory.

go get github.com/sourcegraph/zoekt/

Note: It is also recommended to install Universal ctags, as symbol information is a key signal in ranking search results. See ctags.md for more information.

Zoekt supports indexing and searching repositories on the command line. This is most helpful for simple local usage, or for testing and development.

Indexing a local git repo

go install github.com/sourcegraph/zoekt/cmd/zoekt-git-index
$GOPATH/bin/zoekt-git-index -index ~/.zoekt /path/to/repo

Indexing a local directory (not git-specific)

go install github.com/sourcegraph/zoekt/cmd/zoekt-index
$GOPATH/bin/zoekt-index -index ~/.zoekt /path/to/repo

Searching

go install github.com/sourcegraph/zoekt/cmd/zoekt
$GOPATH/bin/zoekt 'hello'
$GOPATH/bin/zoekt 'hello file:README'

Zoekt also contains an index server and web server to support larger-scale indexing and searching of remote repositories. The index server can be configured to periodically fetch and reindex repositories from a code host. The webserver can be configured to serve search results through a web UI or API.

Indexing a GitHub organization

go install github.com/sourcegraph/zoekt/cmd/zoekt-indexserver

echo YOUR_GITHUB_TOKEN_HERE > token.txt
echo '[{"GitHubOrg": "apache", "CredentialPath": "token.txt"}]' > config.json

$GOPATH/bin/zoekt-indexserver -mirror_config config.json -data_dir ~/.zoekt/ 

This will fetch all repos under 'github.com/apache', then index the repositories. The indexserver takes care of periodically fetching and indexing new data, and cleaning up logfiles. See config.go for more details on this configuration.
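When mirroring more than one organization, the configuration file can be generated rather than written by hand. The sketch below produces the same JSON shape as the `echo` command above. Only the `GitHubOrg` and `CredentialPath` fields come from this README; the `mirrorEntry` type and `mirrorConfig` function are hypothetical helpers, and config.go remains the authority on which other fields exist.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mirrorEntry holds the two fields shown in the README example.
// It is a hypothetical local type, not Zoekt's own config struct.
type mirrorEntry struct {
	GitHubOrg      string
	CredentialPath string
}

// mirrorConfig builds one config entry per organization,
// all sharing the same credential file.
func mirrorConfig(orgs []string, credentialPath string) ([]byte, error) {
	entries := make([]mirrorEntry, 0, len(orgs))
	for _, org := range orgs {
		entries = append(entries, mirrorEntry{GitHubOrg: org, CredentialPath: credentialPath})
	}
	return json.Marshal(entries)
}

func main() {
	out, err := mirrorConfig([]string{"apache"}, "token.txt")
	if err != nil {
		panic(err)
	}
	// Redirect stdout to config.json to use the result with -mirror_config.
	fmt.Println(string(out))
}
```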

go install github.com/sourcegraph/zoekt/cmd/zoekt-webserver
$GOPATH/bin/zoekt-webserver -index ~/.zoekt/

This will start a web server with a simple search UI at http://localhost:6070. See the query syntax docs for more details on the query language.

If you start the web server with -rpc, it exposes a simple JSON search API at http://localhost:6070/api/search.

The JSON API supports advanced features including:

  • Streaming search results (using the FlushWallTime option)
  • Alternative BM25 scoring (using the UseBM25Scoring option)
  • Context lines around matches (using the NumContextLines option)
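As a rough sketch of how a client might build a request for this endpoint, the Go program below assembles a JSON body carrying a query and two of the options listed above. The top-level field names `Q` and `Opts` are assumptions about the request shape and should be checked against the JSON API documentation; the `searchRequest` and `searchOptions` types and the `buildSearchBody` function are hypothetical. The program only prints the body and does not contact a server.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchOptions covers two of the option names mentioned in the README.
// It is a hypothetical local type, not Zoekt's own options struct.
type searchOptions struct {
	NumContextLines int  `json:",omitempty"`
	UseBM25Scoring  bool `json:",omitempty"`
}

// searchRequest assumes the endpoint accepts a query string under "Q"
// and options under "Opts"; verify both names before relying on them.
type searchRequest struct {
	Q    string
	Opts *searchOptions `json:",omitempty"`
}

// buildSearchBody serializes a query and optional settings to JSON.
func buildSearchBody(query string, opts *searchOptions) ([]byte, error) {
	return json.Marshal(searchRequest{Q: query, Opts: opts})
}

func main() {
	body, err := buildSearchBody("hello file:README", &searchOptions{
		NumContextLines: 2,
		UseBM25Scoring:  true,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The resulting body could then be sent with an HTTP POST to http://localhost:6070/api/search on a web server started with -rpc.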

Finally, the web server exposes a gRPC API that supports structured query objects and advanced search options.

Thanks to Han-Wen Nienhuys for creating Zoekt. Thanks to Alexander Neubeck for coming up with this idea, and helping Han-Wen Nienhuys flesh it out.
