Greppability is an underrated code metric

Original link: https://morizbuesing.com/blog/greppability-code-metric/

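Maintaining an unfamiliar codebase usually means searching for specific identifiers such as function names, variable names, and table names. Splitting identifiers up or constructing them dynamically makes that information hard to find. Instead, use consistent identifier names across the whole project and avoid dynamic construction. Flat folder and object structures are also easier to navigate than deeply nested ones. At application boundaries, return objects directly instead of renaming their fields. For translations, organize the keys in a flat hierarchy rather than a nested one. Finally, giving components flat, fully namespaced names makes them findable with a single search.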

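Finding code symbols is genuinely non-trivial, especially in large codebases. IDEs and editors such as WebStorm, Visual Studio, and Sublime Text offer excellent support for navigating to symbols (functions, classes, and constants) and for viewing their documentation, usages, and source. Still, it is not unusual to run into a codebase that lacks proper documentation, is poorly structured, or requires working across many files at once, and that is where fast, powerful text search pays off. When grepping a large codebase, regular expressions help filter out noise. Tools like ripgrep support case-insensitive, multi-file search with highlighted matches based on user-defined criteria, which gives a quick way to locate relevant symbols or code snippets. IDEs certainly offer many conveniences beyond raw grep, but having a reliable, flexible search tool like ripgrep at hand can greatly improve developer productivity.

How programming-language style guides affect greppability remains largely a matter of personal preference. I prefer style guides that promote consistency and clarity, since they reduce the cognitive overhead of reviewing or working with other people's code. Prioritizing greppability ensures that code can be searched effectively, whether on its own or as part of a larger ecosystem.

Finally, no single tool or approach is a silver bullet for every situation. Developers have to balance readability, maintainability, and performance, with the goal of an environment that encourages collaboration and lets newcomers quickly grasp the overall intent of a complex codebase.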

Original text

When I’m working on maintaining an unfamiliar codebase, I will spend a lot of time grepping the code base for strings. Even in projects exclusively written by myself, I have to search a lot: function names, error messages, class names, that kind of thing. If I can’t find what I’m looking for, it’ll be frustrating in the best case, or in the worst case lead to dangerous situations where I’ll assume a thing is not needed anymore, since I can’t find any references to it in the code base. From these situations, I’ve derived some rules you can apply to keep your code base greppable:

Don’t split up identifiers

It turns out that splitting up or dynamically constructing identifiers is a bad idea.

Suppose you have two database tables, shipping_addresses and billing_addresses. It might seem like a perfectly good solution to construct the table name dynamically from the address type.

const getTableName = (addressType: 'shipping' | 'billing') => {
    return `${addressType}_addresses`
}

Though it looks nice and DRY, it’s not great for maintenance: someone will inevitably search the code base for the table name shipping_addresses and miss this occurrence.

Refactored for greppability:

const getTableName = (addressType: 'shipping' | 'billing') => {
    if (addressType === 'shipping') {
        return 'shipping_addresses'
    }
    if (addressType === 'billing') {
        return 'billing_addresses'
    }
    throw new TypeError('addressType must be billing or shipping')
}

The same goes for column names, object fields, and, god forbid, method/function names (it’s easy to construct method names dynamically in JavaScript).
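To make the method-name case concrete, here is a minimal sketch. The handlers object and both dispatch functions are hypothetical names invented for illustration; they are not from any real code base. The first dispatcher assembles the method name at runtime, so a search for handleShippingAddress finds the definition but not the call site. The second spells every name out.

```typescript
// Hypothetical handler object, for illustration only.
const handlers = {
    handleShippingAddress: () => 'shipping handled',
    handleBillingAddress: () => 'billing handled',
}

type AddressType = 'shipping' | 'billing'

// Hard to grep: the method name is assembled at runtime, so searching for
// "handleShippingAddress" will not lead you to this call site.
const dispatchDynamic = (addressType: AddressType): string => {
    const capitalized = addressType.charAt(0).toUpperCase() + addressType.slice(1)
    const methodName = `handle${capitalized}Address` as keyof typeof handlers
    return handlers[methodName]()
}

// Greppable: every method name appears verbatim where it is called.
const dispatchGreppable = (addressType: AddressType): string => {
    switch (addressType) {
        case 'shipping':
            return handlers.handleShippingAddress()
        case 'billing':
            return handlers.handleBillingAddress()
    }
}
```

Both functions behave the same for valid input; the difference only shows up when someone tries to find every caller of a handler.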

Use the same names for things across the stack

Don’t rename fields at application boundaries to match naming schemes. An obvious example is importing Postgres-style snake_case identifiers into JavaScript and then converting them to camelCase. This makes things harder to find: you now have to grep for two strings instead of one in order to find all occurrences!

const getAddress = async (id: string) => {
    const address = await getAddressById(id)
    return {
        streetName: address.street_name,
        zipCode: address.zip_code,
    }
}

You’re better off biting the bullet and returning the object directly:

const getAddress = async (id: string) => {
    return await getAddressById(id)
}
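Returning the row directly does not mean giving up types. As a rough sketch, the snake_case names can simply be carried into the type definition, so a search for street_name turns up the column, the type, and every consumer. The Address type, the in-memory rows map standing in for the database, and formatAddress are all assumptions made for this example.

```typescript
// Hypothetical row type: field names match the database columns exactly.
type Address = {
    street_name: string
    zip_code: string
}

// Stand-in for a real database table, for illustration only.
const rows = new Map<string, Address>([
    ['1', { street_name: 'Main Street', zip_code: '12345' }],
])

// Stand-in for a real query function.
const getAddressById = async (id: string): Promise<Address> => {
    const row = rows.get(id)
    if (row === undefined) {
        throw new Error(`no address with id ${id}`)
    }
    return row
}

// No renaming at the boundary: the row is returned as-is.
const getAddress = async (id: string): Promise<Address> => {
    return await getAddressById(id)
}

// Consumers use the same snake_case names as the database.
const formatAddress = (address: Address): string => {
    return `${address.street_name}, ${address.zip_code}`
}
```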

Flat is better than nested

Taking inspiration from the Zen of Python, when dealing with namespaces, flattening your folders/object structures is mostly better than nesting.

For example, if you have two choices for setting up your translation files:

{
    "auth": {
        "login": {
            "title": "Login",
            "emailLabel": "Email",
            "passwordLabel": "Password"
        },
        "register": {
            "title": "Register",
            "emailLabel": "Email",
            "passwordLabel": "Password"
        }
    }
}

and

{
    "auth.login.title": "Login",
    "auth.login.emailLabel": "Email",
    "auth.login.passwordLabel": "Password",
    "auth.register.title": "Register",
    "auth.register.emailLabel": "Email",
    "auth.register.passwordLabel": "Password"
}

take the second option! You will be able to easily find your keys now, which you are probably referring to as something like t('auth.login.title').
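If a project already has nested translation files, the flat form can be generated rather than retyped. The following is one possible sketch of such a one-off conversion; flattenTranslations and the NestedTranslations type are hypothetical helpers written for this example, not part of any i18n library.

```typescript
// Hypothetical shape of a nested translation file.
type NestedTranslations = { [key: string]: string | NestedTranslations }

// Recursively joins nested keys with dots, so
// { auth: { login: { title: 'Login' } } } becomes { 'auth.login.title': 'Login' }.
const flattenTranslations = (
    nested: NestedTranslations,
    prefix = '',
): Record<string, string> => {
    const flat: Record<string, string> = {}
    for (const [key, value] of Object.entries(nested)) {
        const fullKey = prefix === '' ? key : `${prefix}.${key}`
        if (typeof value === 'string') {
            flat[fullKey] = value
        } else {
            Object.assign(flat, flattenTranslations(value, fullKey))
        }
    }
    return flat
}

const nested: NestedTranslations = {
    auth: {
        login: { title: 'Login', emailLabel: 'Email' },
        register: { title: 'Register', emailLabel: 'Email' },
    },
}

const flat = flattenTranslations(nested)
```

After the conversion, each key in the file is the exact string that appears in calls like t('auth.login.title'), so a single grep finds both the definition and its usages.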

Or consider React component structure: a component structure like

./components/AttributeFilterCombobox.tsx
./components/AttributeFilterDialog.tsx
./components/AttributeFilterRating.tsx
./components/AttributeFilterSelect.tsx

is preferable to

./components/attribute/filter/Combobox.tsx
./components/attribute/filter/Dialog.tsx
./components/attribute/filter/Rating.tsx
./components/attribute/filter/Select.tsx

from a greppability perspective, since you’ll be able to grep for the whole namespaced component AttributeFilterCombobox just from the usage, as opposed to just Dialog, which you might have multiple of across your application.