(comments)

Original link: https://news.ycombinator.com/item?id=41430772

Finding code symbols is indeed no small task, especially in a large codebase. As noted, IDEs such as WebStorm, Visual Studio and Sublime Text provide excellent support for navigating to symbols (functions, classes and constants) and viewing their documentation, usages and source code. However, it is not uncommon to encounter codebases that lack proper documentation, are poorly structured, or require working across many files at once, and that is where an efficient, powerful search proves valuable. When grepping through a large codebase, regular expressions are very helpful for filtering out noise. Tools like ripgrep allow case-insensitive, multi-file searches with highlighted matches based on user-defined criteria, offering a fast and effective way to locate relevant symbols or code snippets. IDEs undoubtedly provide many conveniences beyond raw grep functionality, but having a reliable, flexible search tool such as ripgrep at your disposal can greatly improve a developer's efficiency.

As for programming-language style guides and their impact on greppability, that remains largely a matter of personal preference. I personally like style guides that promote consistency and clarity, reducing cognitive overhead when reviewing or working with other people's code. By prioritizing greppability, you ensure the code can be searched effectively, whether in isolation or as part of a larger ecosystem.

Finally, it must be acknowledged that no single tool or approach is a silver bullet for every situation. Developers must strike a balance between readability, maintainability and performance, aiming to foster an environment that encourages collaboration and allows newcomers to quickly grasp the overall intent of a complex codebase.
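As a concrete illustration of the kind of search described above (the symbol name and paths here are made up for the example), a case-insensitive, multi-file ripgrep invocation looks roughly like this:

    rg -i 'getaddressbyid' src/          # recursive, case-insensitive; matches are highlighted on a terminal
    rg -i -n --type ts 'getaddressbyid'  # same search restricted to TypeScript files, with line numbers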



Grepping for symbols like function names and class names feels so anemic compared to using a tool that has a syntactic understanding of the code. Just "go to definition" and "find usages" alone reduce the need for text search enormously.

For the past decade-plus I have mostly only searched for user facing strings. Those have the advantage of being longer, so are more easily searched.

Honestly, posts like this sound like the author needs to invest some time in learning about better tools for his language. A good IDE alone will save you so much time.



Scenarios where an IDE with full syntactic understanding is better:

- It's your day to day project and you expect to be working in it for a long time.

Scenarios where grepping is more useful:

- Your language has #ifdef or equivalent syntax which does conditional compilation making syntactic tools incomplete.

- You just opened the project for the first time.

- It's in a language you don't daily drive (you write backend but have to delve into frontend code, it's a 3rd party library, it's configuration files, random json/xml files or data)

- You're editing or searching through documentation.

- You haven't even downloaded the project and are checking things out in github (or some similar site for your project).

- You're providing remote assistance to someone and you are not at your main development machine.

- You're remoting via SSH and have access to code there (say it's a python server).

Yes, an IDE will save you time daily driving. But there's no reason to sabotage all the other usecases.



Further important (to me) scenarios that also argue for greppability:

- greppability does not preclude IDE or language server tooling; there are often special cases where only certain e.g. context-dependent usages matter, and sometimes grep is the easiest way to find those.

- projects that include multiple languages, such as for instance the fairly common setup of HTML, JS, CSS, SQL, and some server-side language.

- performance in scenarios with huge amounts of code, or where you're searching very often (e.g. in each git commit for some amount of history; see the sketch after this list)

- ease of use across repositories (e.g. a client app, a spec, and a server app in separate repos).
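As a small sketch of the git-history case mentioned in the list above (the pattern and path are hypothetical), plain text search handles it without any index:

    # every commit whose diff adds or removes the pattern
    git log -S 'legacy_config_key' --oneline
    # or grep one historical revision directly
    git grep -n 'legacy_config_key' HEAD~100 -- src/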

I treat greppability as an almost universal default. I'd much rather have code in a "weird" naming style in some language but have consistent identifiers across languages, than have normal-style-guide default identifiers in each language, but differing identifiers across languages. If code "looks weird", if anything that's often actually a _benefit_ in such cases, not a downside - most serialization libraries I use for this kind of stuff tend to do a lot of automagic mapping that can break in ways that are sometimes hard to detect at compile time if somebody renames something, or sometimes even just for a casing change or type change. Having a hint as to this fragility immediately visible at a glance, even in dynamically typed languages, is sometimes a nice side-effect. Very speculatively, I wouldn't be surprised if AI coding tools can deal with consistent names better than context-dependent ones too; greppability is likely not just about the literal grep tool.

And the best part is that there's almost no downside; it's not like you need to pick either a language server, IDE or grep - just use whatever is most convenient for each task.



Grep is also useful when IDE indexing isn't feasible for the entire project. At past employers I worked in monorepos where the sheer size of the index caused multiple seconds of delay in intellisense and UI stuttering; our devex team's preferred approach was to better integrate our IDE experience with the build system such that only symbols in scope of the module you were working on would be loaded. This was usually fine, and it works especially well for product teams, but it's a headache when you're doing cross-cutting work (e.g. for infrastructure projects/overhauls).

We also had a livegrep instance that we could use to grep any corporate repo, regardless of where it was hosted. That was extremely useful for investigating failures in build scripts that spanned multiple repositories (e.g. building a Go sidecar that relies on a service config in the Java monorepo).



As someone who runs into that daily, I'm surprised I never heard of this before.

I seem to have found the 64-bit mode under "Tools > Options" then "Text Editor > C/C++ > IntelliSense". The top option is [] Enable 64-bit IntelliSense.

But I can't seem to find the RAM limit you mentioned and searching for it just keeps bringing up stuff related to VSCode. Do you know where it is off the top of your head, or a page that might describe it?



The RAM limit is the 32-bit IntelliSense process: 2^32 bytes is 4 GiB.

Edit: I take that back, this was a first-principles comment. There's a setting 'C_Cpp: Intelli Sense Memory Limit' (space included).



Thanks for that, though searching Google for it only led me to VSCode's IntelliSense settings. Searching for the "Intelli Sense Memory Limit" setting in Visual Studio didn't lead me right to the result, but it did give me a whole settings page that "matched". The setting in Visual Studio is "IntelliSense Process Memory Limit", which is under "Text Editor > C/C++ > Advanced", then under the "IntelliSense" header towards the bottom of the section.



> It's your day to day project and you expect to be working in it for a long time.

I don't think we need to restrict the benefits quite that much—if it's a project that isn't my day-to-day but is in a language I already have set up in my IDE, I'd much prefer to open it up in my IDE and use jump to definition and friends than to try to grep and hope that the developers made it grepable.

Going further, I'd equally rather have plugins ready to go for every language my company works in and use them for exploring a foreign codebase. The navigation tools all work more or less the same, so it's not like I need to invest effort learning a new tool in order to benefit from navigation.

> Yes, an IDE will save you time daily driving. But there's no reason to sabotage all the other usecases.

Certainly don't sabotage, but some of these suggestions are bad for other reasons that aren't about grep.

For example: breaking the naming conventions of your language in order to avoid remapping is questionable at best. Operating like that binds your business logic way too tightly to the database representation, and while "just return the db object" sounds like a good optimization in theory, I've never not regretted having frontend code that assumes it's operating directly on database objects.



> if it's a project that isn't my day-to-day but is in a language I already have set up in my IDE, I'd much prefer to open it up in my IDE and use jump to definition and friends than to try to grep and hope that the developers made it grepable.

It's funny, because my preference and actual use is the exact opposite: for a project that isn't my day-to-day, I'm much more likely to try to grep through it rather than open it in an IDE.



> if it's a project that isn't my day-to-day

Another overlooked advantage of greppability is to be able to fuzzy the search, or discover related code that wasn't directly linked to what you were looking for.

For instance, if you were hunting for the method updating a `foo_bar` instance, grepping for it will also give you instances of `generic_foo_bar` and `shim_foo_bar`. That can be noise, but it can also be stuff you wouldn't have seen otherwise and that saves your bacon. If you're not familiar with a project I think it's quite an advantage.
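For example (hypothetical names), a plain substring search surfaces the related identifiers, and ripgrep's word flag reins it back in when that becomes noise:

    rg -n 'foo_bar'      # also matches generic_foo_bar and shim_foo_bar
    rg -n -w 'foo_bar'   # -w limits matches to the whole word foo_bar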

> hope that the developers made it grepable

hopefully it's enforced at an organization level.



> - Your language has #ifdef or equivalent syntax which does conditional compilation making syntactic tools incomplete.

LSP-based tools are generally fine with this (as long as compile_commands.json or an equivalent is available). A purely syntactic understanding is an incomplete solution; I suspect GP meant LSP.

Many of those other caveats are non-issues once LSPs are widespread. Even Github has lsp-like go-to-def/go-to-ref, though it's not perfect.



I used to pipe things through black for that. (a script that imported black, not just black on the command line.)

I also had `j2p` and `p2j` that would convert between python (formatted via black) and json (formatted via jq), and the `j2p_clip`/`p2j_clip` versions that would pipe from clipboard and back into clipboards.

It's worth taking the time to build a few simple scripts for things you do a lot. I used to open up the repl and import json to convert between json and python dicts multiple times a day, so spending a few minutes throwing together a simple script to do it was well worth the effort.
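A minimal sketch of what such helpers could look like (assuming python3 with the black package, plus jq and xclip are installed; the function names simply mirror the ones mentioned above):

    # j2p: JSON on stdin -> black-formatted Python literal on stdout
    j2p() {
      python3 -c 'import sys, json, black; print(black.format_str(repr(json.load(sys.stdin)), mode=black.Mode()))'
    }
    # p2j: Python literal on stdin -> jq-formatted JSON on stdout
    p2j() {
      python3 -c 'import sys, ast, json; print(json.dumps(ast.literal_eval(sys.stdin.read())))' | jq .
    }
    # clipboard variants, assuming an X11 clipboard via xclip
    j2p_clip() { xclip -o -selection clipboard | j2p | xclip -i -selection clipboard; }
    p2j_clip() { xclip -o -selection clipboard | p2j | xclip -i -selection clipboard; }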



part of what i ended up with was this:
    {'country': ['25', '32', '6', '37', '72', '22', '17', '39', '14', '10',
                 '35', '43', '56', '36', '110', '11', '26', '12', '4', '5'],
     'timeZone': '8', 'dateFrom': '2024-05-01', 'dateTo': '2024-05-30',

black is the opposite extreme from what i wanted; https://black.readthedocs.io/en/stable/the_black_code_style/... explains:

> If a data structure literal (tuple, list, set, dict) or a line of “from” imports cannot fit in the allotted length, it’s always split into one element per line.

i'm not interested in minimizing diffs. i'm interested in being able to see all the fields of one record on one screen—moreover, i'd like to be able to see more than one record at a time so i can compare what's the same and what's different

black seems to be designed for the kind of person who always eats at mcdonald's when they travel because they value predictability over quality



My understanding of black is that it solves bikeshedding by making everyone a little unhappy.

For aligned column readability and other scenarios, # fmt: off and # fmt: on become crucial. The problem is that like # type: ignore, those start spreading if you're not careful.



My only complaint with black is that it only splits long definitions into one element per line if they exceed the length limit. That’s probably configurable, now that I write it down.

Other than that, I actually quite like its formatting choices.



i kind of get the vibe from the black documentation that it's written by the kind of person who thinks we're bad people for wanting that, and perhaps that everyone should wear the same uniform because vanity is sinful and aesthetics are frivolous



I honestly suspect that the amount of time spent dealing with the issues monorepos cause is net-larger than the gains most get from what a monorepo offers. It's just harder to measure because it tends to degrade slowly, happen to things you didn't realize you were relying on (until you need them), and without clear ways to point fingers at the cause.

Plus it means your engs don't learn how to deal with open source code concerns, e.g. libraries, forking, dependency management. Which gradually screws over the whole ecosystem.

If you're willing to put Google-scale effort into building your tooling, sure. Every problem is solvable. Only Google does that though, everyone else is getting by with a tiny fraction of the resources and doesn't already have a solid foundation to reduce those maintenance costs.



Sure. But those are far from the only massive codebases out there, and many of the biggest are monorepos because sorta by definition they are the size of multiple projects.



clangd works fine for me with the linux kernel. For best results build the kernel with clang by setting LLVM=1 and KERNEL_LLVM=1 in the build environment and run ./scripts/clang-tools/gen_compile_commands.py after building.
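Spelled out as commands, a rough sketch of that workflow (using exactly the variables named in the comment):

    export LLVM=1 KERNEL_LLVM=1
    make defconfig && make -j"$(nproc)"
    ./scripts/clang-tools/gen_compile_commands.py   # writes compile_commands.json for clangd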



>It's your day to day project and you expect to be working in it for a long time.

Bold of everyone here to assume that everyone has a day-to-day project. If you're a consultant, or you're otherwise switching projects on a month-to-month basis, greppability is probably the most important metric, second only to UT coverage.



They said the scenario in which that would be useful was IF: "It's your day to day project and you expect to be working in it for a long time". The implication being that if neither of those hold then skip to the next section.

I don't think anyone is assuming anything here. I've contracted for most of my career and this didn't seem like an outlandish statement.

Also, if you're working in a project for a month, odds are you could set up an IDE in the first few hours. Not sure how any of this rises to the level of being "bold".



> Your language has #ifdef or equivalent syntax which does conditional compilation making syntactic tools incomplete.

Your other points make sense, but in this case, at least for C/C++, you can generate a compile_commands.json that will let clangd interpret your code accurately.

If building with make just do `bear -- make` instead of `make`. If building with cmake pass `-DCMAKE_EXPORT_COMPILE_COMMANDS=1`.
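In command form, a short sketch of the two approaches just mentioned:

    bear -- make                                             # wrap a make-based build to record the compilation database
    cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -S . -B build    # or have CMake emit compile_commands.json itself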



The macros I see in the real world seem to usually work fine. I’m sure it’s not perfect and you can construct a macro that would confuse it, but it’s a lot better than not having a compilation db at all.



> - Your language has #ifdef or equivalent syntax which does conditional compilation making syntactic tools incomplete.

You need a better IDE.

> - You just opened the project for the first time.

Go grab a coffee

> - It's in a language you don't daily drive

Jetbrains all products pack, baby.

> - You haven't even downloaded the project and are checking things out in github (or some similar site for your project).

On GitHub, press `.` to open it in a web-based vscode. Download it & open it in your IDE while you are doing this.

> - You're remoting via SSH and have access to code there (say it's a python server).

Don't do this. Check the git hash that was deployed and checkout the code locally.



- you just switched branch/rebased and the index is not up to date.

- the project is large enough that the IDE can't cope.

- you want to also match comments, commented out code or in-project documentation

- you want fuzzy search and match similarly named functions

I use clangd integration in my IDE all the time, but often brute force is the right solution.



I abandoned VSCode and went back to vim + ctags + ripgrep after a year with the most popular IDE. I miss some features but it didn’t give me a 10x or even 1.5x improvement in my own work along any dimension.

I attribute that mostly to my several decades of experience with vi(m) and command line tools, not to anything inherently bad about VSCode.

What counts as “better” tools has a lot of subjectivity and circumstances implied. No one set of tools works for everyone. I very often have to work over ssh on servers that don’t allow installing anything, much less Node and npm for VSCode, so I invest my time in the tools that always work everywhere, for the work I do.

The main project I’ve worked on for the last few years has a little less than 500,000 lines of code. VSCode’s LSP takes a few seconds fairly often to maintain the LSP indexes. Running ctags over the same code takes about a second and I can control when that happens. vim has no delays at all, and ripgrep can search all of the files in a second or two.
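For reference, the ctags + ripgrep loop described above amounts to something like this (the symbol name is hypothetical; assumes universal-ctags and ripgrep are installed):

    ctags -R .              # rebuild the tags file for the whole tree (about a second on the codebase described above)
    # jump to the tag from vim with :tag some_function, and grep for everything else:
    rg -n 'some_function'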



I have similar feelings... I still use IntelliJ IDEA for JVM languages, but for C, Rust, Go, Python, etc., I've been using vim for years (decades?), and that's just how I prefer to write code in those languages. I do have LSP plugins installed in vim for the languages I work in, and do have a key sequence mapped for jump-to-definition... but I still find myself (rip)grepping through the source at least as often as I j-t-d, maybe more often.



Did you consider Neovim? You get the benefit of vim while also being able to mix in as much LSP tooling as you like. The tradeoff is that it takes some time to set up, although that is getting easier.

That won’t make LSP go any faster though. There’s still something interesting in the fact that a ripgrep of every line in the codebase can still be faster than a dedicated tool.



Considered it and have tried repeatedly to get it to work with mixed success. As you wrote, it takes "some time" to set up. In my case it would only offer marginal improvements over plain vim, since I'm not that interested in the LSP integration (and vim has that too, through a plugin).

In the environments I often work in I can't install anything or run processes like node. I ssh into a server and have to use whatever came with the Linux distro, which means sticking with the tools I will find everywhere. I can't copy the code from the server either. If I get lucky they used version control. I know not everyone works with those constraints. I specialize in working on abandoned and legacy code.



Yes, OK. And legacy code might be a good example of where grep works well, if it's fair to assume a greater propensity for things like preprocessors, older languages and custom builds that may not play as well with semantic-level tools, let alone be written with modern tooling in mind.



Lol, I'm not working with COBOL or Fortran. Legacy code in my world means the original developers have left, not that it dates from the 1970s. Mostly I work with PHP, shell scripts, various flavors of SQL, Python, sometimes Rails or other stuff. All things modern LSPs can handle.



can you not upload executables over ssh, say for policy reasons or disk-space reasons? how about shell scripts?

i mean, i doubt i'm going to come up with some brilliant breakthrough that makes your life easier that you've somehow overlooked, but i'd like to understand what kinds of constraints people like you often confront

i'm just glad you don't have to use teamviewer



I don't have to use TeamViewer, though I very occasionally have to use Windows RDP.

You can transfer any kind of file over ssh. scp, sftp, rsync will all copy binaries. Mainly the issues come down to policy and billable time. Many of my customers simply don't allow installing anything on their servers without a tedious approval process. Even if I can install things I might spin my wheels trying to get it to work in an environment I don't have root privileges on, with no one willing to help, and I can't bill for that time. I don't work for free to get an editor installed. I use the tools I know I can find on any Linux/BSD server.

With some customers I have root privileges and manage the server for them. With others their IT dept has rules I have to follow (I freelance) if I want to keep a good relationship. Since I juggle multiple customers and environments I find it simpler not having to manage different editors and environments, so I mostly stick with the defaults. I do have a .profile and .vimrc I copy around if allowed to, that's about it.

I can't lose time/money and possibly goodwill whining about not having everything just-so for me. I recently worked on a server over ssh that didn't have tmux installed. Fortunately it did have screen, and I can use that too, no big deal. I spent less than 60 seconds figuring that out and getting to work rather than wasting hours of non-billable time annoying someone about how I needed tmux installed.



i see, thanks!

wrt rdp, i feel like rdp is actually better than vnc or x11-over-ssh, but for cases where regular ssh works, i'd rather use ssh

i wasn't thinking in terms of installing tmux, more like a self-contained binary that doesn't require any kind of 'installation'



I used the word "install" but the usual rule says I can't install, upload, or execute any non-approved software. Usually that just gets stated as a policy, but I have seen Linux home directories on noexec partitions -- government agencies and big corporations can get very strict about that. So copying a self-contained binary up and running it would violate the policy.

I pretty much live in ssh. Remote Desktop means a lot of clicking and watching a GUI visibly repaint. Not efficient. Every so often I have customers using applications that only run on Windows, no API, no command line, so they will enable RDP to that, usually through a VPN.



my cousin wrote a vt52 emulator in bash, and i was looking at a macro assembler written in bash the other day: https://github.com/jhswartz/mle-amd64/blob/master/amd64. i haven't seen a cscope written in bash, but you probably remember how the first versions of ctags were written in sh (or csh?) and ed. so there's not much limit to how far shell functions can go in augmenting your programming environment

if awk, python, or perl is accepted, the possibilities expand further



Sure, but this is taking things to a bit of an absurd extreme. If I worked in a restrictive environment where I couldn't install my own tools, I don't think I would be in a position to burn a ton of my employer's time building sophisticated development tools in bash.

(One-off small scripts for things, sure. But I'm not going to implement something like ctags or cscope or a LSP server in bash.)



certainly it's absurd! nobody would deny that. on the other hand, the problem to solve is also an absurd problem

and i wasn't suggesting trying to bill for doing it, but rather, if you were frequently in this situation, it might be reasonable to spend non-billable time between clients doing it



I guess I don’t see the problem as absurd. As a freelancer I need to focus on the problems the customer will pay for. I don’t write code for free or in my spare time anymore, I used to years ago. I feel comfortable working with the constraints imposed, I think of that as a valuable skill, not a handicap.



I looked at Helix but since I dream in vim motions at this point (vi user since it came out) I'd have to see a 10x improvement to switch. VSCode didn't give me a 10X improvement, I doubt Helix would.



Helix certainly won't give you a 10x improvement. It tends to convert a lot of people moving "up" from VS Code, and a decent chunk of neovim users moving "down", though certainly fewer.

Advantages of Helix are pretty straightforward:

1. Very little configuration bullshit to deal with. There's not even a plugin system yet! You just paste your favorite config file and language/LSP config file and you're good to go. For anything else, submit a pull request.

2. Built in LSP support for basically anything an LSP exists for.

3. There's a bit of a new generation command line IDE forming itself around zellij (tmux that doesn't suck) + helix + yazi (basically nnn or mc on crack, highly recommended).

That whole zellij+helix+yazi environment is frankly a joy to work in, and might be the 2-3x improvement over neovim that makes the switch worth it.



Like I wrote, I looked at Helix. Seems cool but not enough for me to switch. And I would have to install it on the machines I work on, which very often I can't do because of company policies, or can't waste the non-billable time on.

I only recently moved from screen to tmux, and I still have to fall back to screen sometimes because tmux doesn't come with every Linux distro. I expect I will retire before I think tmux (or screen, for that matter) "sucks" to the point I would look at something else. And again I very often can't install things on customer servers anyway.



Tmux does suck pretty bad though?

It conflicts with the clipboard and a bunch of hotkeys, and configuring it never works because they make breaking changes to how the config file works every 6 months or so.

These days I only use it to launch a long-running job over ssh, detach the session it's on, and leave.



That’s more or less what I use it for — keeping sessions alive. I don’t use 90% of the features. vim does splits, and there’s ctrl-Z to background it and get a shell.

I know I could get more out of tmux but haven’t really needed to. I use it with the default config. I have learned from experience that the less I try to customize my environment the less non-billable time I waste trying to get that working and maintaining it.



VSCode is not an IDE, it's an extensible text editor. IDEs are integrated (it's in the name) and get developed as a whole. I'm 99% certain that if you were forced to spend a couple of months in a real IDE (like IDEA or Rider), you would not want to go back to vim, or any other text editor. Speaking as a long time user of both.



I get your point, but VSCode does far more than text editing. The line between an advanced editor and an IDE gets blurry. If you look at the Wikipedia page about IDEs[1] you see that VSCode ticks off more boxes than not. It has integration with source code control, refactoring, a debugger, etc. With the right combination of extensions it gets really close to an IDE as strictly defined. These days advanced text editor vs. "real" IDE seems more like a distinction without much of a difference.

You may feel 99% certain, but you got it wrong. I have quite a bit of experience with IDEs, you shouldn't assume I use vim out of ignorance. I have worked as a programmer for 40+ years, with development tools (integrated or not) that I have forgotten the names of. That includes "real" IDEs like Visual Studio, Metrowerks CodeWarrior, Symantec Think C, MPW, Oracle SQL Developer, Turbo Pascal, XCode, etc. and so on. When I started programming every mainframe and minicomputer came with an IDE for the platform. Unix came along with the tools broken out after I had worked for several years. In high school I learned programming on an HP-2000 BASIC minicomputer -- an IDE.

So I have spent more than "a couple of months in real IDEs" and I still use vim day to day. If I went back to C++ or C# for Windows I would use Visual Studio, but I don't do that anymore. For the kind of work I do now vim + ctags + ripgrep (and awk, sed, bash, etc.) get my work done. At my very first real job I used PWB/Unix[2] -- PWB means Programmer's Work Bench -- an IDE of sorts. I still use the same tools (on Linux) because they work and I can always count on finding a large subset of them on any server I have to work with.

I don't dislike or mean to crap on IDEs. I have used my share of IDEs and would again if the work called for that. I get what I need from the tools I've chosen, other people make different choices, no perfect language, editor, IDE, what have you exists.

[1] https://en.wikipedia.org/wiki/Integrated_development_environ...

[2] https://en.wikipedia.org/wiki/PWB/UNIX



Me too -- in Portland, Cleveland HS, mid-70s.

The HP 2000 [1] had a timeshared BASIC system that the school district made available to schools, over ASR-33 teletypes with dial-up modems. The BASIC system could edit, run (translate to byte code and execute), manage files. No version control or debuggers back then. The HP 2000 had another layer of the operating system accessible to administrators (the A000 account if I remember right) but it was the same timeshared BASIC system with some additional commands for managing user accounts and files.

No one familiar with modern IDEs would recognize the HP 2000 BASIC system as an IDE, but it was self-contained and fully integrated around writing BASIC programs. HP also offered FORTRAN for it but not under the timeshared BASIC system. A friend wrote an assembler (in BASIC!) and taking advantage of a glitch in the bytecode interpreter we could load and run programs written in assembly language.

After high school I got a job as night computer operator with the Multnomah County ESD (school district) so I had admin access to the HP 2000, and their two HP 3000 systems, and an IBM computer they used for crunching class registrations. Good times.

Someone had an emulator online for a while, accessible over telnet, but I can't find it now.

[1] https://en.wikipedia.org/wiki/HP_Time-Shared_BASIC



i think it's very reasonable to describe time-shared basic systems like that as ides. the paradigmatic example of an 'ide' is probably turbo pascal 1.0, and of the features that separated turbo pascal from 'unintegrated' editor/compiler/assembler/linker/debugger setups, i think the dartmouth timesharing system falls mostly on the 'ide' side of the line. you could stop your program at any point and inspect its variables, change them, evaluate expressions, change the source code, and continue execution. runtime errors would also pop you into the interactive basic prompt where you could do all those things. i believe the hp 2000 timesharing basic had all these features, too



At the time, in the context of other software development environments (like submitting decks of punch cards) the HP 2000 timeshared BASIC environment would count as an IDE. Compared to Turbo Pascal or any modern IDE it falls short.

HP TSB did not have a REPL. If your program crashed or stopped you could not examine variables from the terminal. You could not peek or poke memory locations as you could with microcomputer BASICs (which didn't support multiple users, so didn't have the security concern). You had to insert PRINT statements to debug the code. TSB BASIC didn't have compile/link steps, it tokenized the code as you entered the lines, and the interpreter amounted to a big switch statement on the tokens. P. J. Brown's book Writing Interactive Compilers and Interpreters (1981) describes how TSB works. Eventually I got the source code to TSB (written in assembler) and figured it out for myself.

Other BASIC implementations that popped up around the same time had richer feature sets. In my senior year at high school I got (unauthorized) access to a couple of Unix systems in Portland, ordered the Bell Labs Technical Journal issues that described Unix and C, and taught myself from those. I didn't get paid to work on a Unix system until several years later (detours into RSTS-11, TOPS-20, VMS, Microdata, Pr1me, others) but I caught the Unix and C bugs young and I still work with those tools every day.

Some programmer friends and more than a few colleagues over the years have made fun of my continued use of what they call obsolete and arcane tools. I don't mind, I have never felt like I did less or sloppier work than my co-workers, and my freelance customers don't care what I use as long as I can solve their problems. Most of the work in programming doesn't happen at the keyboard anyway. I do pay attention and experiment with all kinds of tools but I usually end up going back to the Unix tools I have long familiarity with. That said I did spend many years in Visual Studio, MPW, and CodeWarrior writing C and C++ code, and I do think those tools (well, maybe not MPW) offered a lot of benefits over coding with vim and grep, for the projects I did back then.

Maybe ironically I use an iPad Pro, I don't have a desktop or laptop anymore. So I have the most modern hardware and a touch-based (soon) AI-assisted operating system that runs a terminal emulator during my work time.



I think you're arguing semantics here in a way that's not particularly productive. VSCode can be set up in a way that is nearly as featureful as an IDE like IntelliJ IDEA or Eclipse, and the default configuration and OOB experience pushes you hard in that direction. VSCode is designed for software development, not as a general text editor; I would never open up VSCode to edit a configuration file or type up a text file of notes, for example.

Something like vim is designed as a general text-editing tool. Sure, you can load it up with plugins and scripts that give you a bunch of features you'd find in an IDE, but the experience is not the same, and the "integrated" bit of "IDE" is still just not there.

(And I say this as someone who does most of his coding in vim, with LSP plugins installed, only reaching for a "proper" IDE for Java and Scala.)

One metric I would use: if I can sit down at a random co-worker's desk and feel more or less at home in their editor of choice, then it's probably an IDE that has reasonable defaults and is geared for software development. IDEA and VSCode would qualify... vim would certainly not.



A good IDE can be so much better iff it understands the code. However, this requires the IDE to be able to understand the project structure, dependencies etc., which can take considerable effort. In a codebase with many projects employing several different languages, it becomes hard to reach and maintain the state where the IDE understands everything.



And an IDE would also fail to find references for most of the cases described in the article: name composition/manipulation, naming consistency across language barriers, and flat namespaces in serialization. And file/folder naming seems to be irrelevant to the smart-IDE argument. "Naming things is hard"



And especially in large monorepos anything that understands the code can become quite sluggish. While ripgrep remains fast.

A kind of in-between I've found for some search and replace action is comby (https://comby.dev/). Having a matching braces feature is a godsend for doing some kind of replacements properly.



I think the author's first sentence counters your comment. What you described works best in a familiar codebase where the organizing principles have been maintained well and are familiar to the reader, and the tools are just an extension of those organizing principles. Even then, a deviation from those rules might produce gaps in understanding of what the codebase does.

And grep cuts right through that in a pretty universal way. What the post describes are just ways to not work against grep to optimize for something ephemeral.



Go to definition and find usages only work one symbol at a time. I use both, but I still use global find/replace for groups of symbols sharing the same concept.

For example if I want to rename all “Dog” (DogModel, DogView, DogController) symbols to “Wolf”, find/replace is much better at that because it will tell me about symbols I had forgotten about.



There's no reason they have to work one symbol at a time - that's just a missing feature in your language server implementation.

Some language servers support modifying the symbols in contexts like docstrings as well.



I’ve never seen an LSP server that lets you rename “Dog” to “Wolf” where your actual class names are “Dog[A-Za-z]*”?

Do you have an example?



Neither have I; and no, I don't - I misinterpreted what you said.

But I don't see why LSP servers shouldn't support this, still. I'm not sure if the LSP specification allows for this as of current, though.



I would actually love a regexp search-and-replace assisted by either TreeSitter or LSP.

Something that lets me say that I want to replace “Dog\(.*\)” with “Wolf\1”, but where each substitution is performed only within single “symbols” as identified by TS or LSP.
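For contrast, the purely textual version of that rename (no TreeSitter/LSP awareness, so it will also rewrite matches inside strings and comments) is a one-liner; the class names are the hypothetical ones from the example above:

    # rename Dog, DogModel, DogView, ... to Wolf* across the tree, in place (GNU sed)
    rg -l 'Dog\w*' | xargs sed -i -E 's/Dog(\w*)/Wolf\1/g'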



I am familiar with the situation you describe, and it's a good point.

However, it does suggest that there is an opportunity for factoring "Dog" out in the code, at least by name spacing (e.g. Dog.Model).



That gets to the core of the issue doesn’t it? There are two cultures: Do you prefer to refactor DogView into Dog.View, or do you prefer to refactor Dog.View into DogView.

Personally I value uniqueness/canonicalness over conciseness. I would rather have DogView because then there is one name for the symbol regardless of where I am in the codebase. If the same symbol is used with differently qualified names it is confusing - I want the least qualified name to be more descriptive than “View”.

The other culture is to lean heavily on namespaces and to not worry about uniqueness. In this case you have View and Dog.View that may be used interchangeably in different files. This is the dominant culture in Java and C#.



Not everything you need to look for is a language identifier. I often grep for configuration option names in the code to see what an option actually does. Sometimes it is easy to grep, sometimes there are too many matches, and sometimes the name cannot be found because it is composed in the code from separate parts that are individually unsearchable (too many matches). It's not hard to make config options greppable, but some coders just don't care about this property.



strongly disagree here. This works if:

- your IDE/language server is performant

- all the tools are fully set up

- you know how to query the specific semantic entity you're looking for (remembering shortcuts)

- you are only interested in a single specific semantic entity; mixing entities is rarely supported

I don't map out projects in terms of semantics, I map out projects in files and code - that makes querying intuitive, and I can easily compose queries that match the specificity of what I care about (e.g. I might want to find a `Server` and show classes, interfaces and abstract classes alike).

For the specific toolchain I'm using - typescript - the symbol search is also unusable once it hits a certain project size, it's just way too slow for it to be part of my core workflow



Unfortunately in larger codebases or dynamic languages these tools are just not good enough today. At least not those I and my employers have tried.

They're either incomplete (you don't get ALL references or you get false references) or way too slow (>10 seconds when rg takes 1-2).

Recommendations are most welcome.



Only thing I can recommend is using C# (obviously not always possible). Never had an issue with these functions in Visual Studio proper no matter how big the project.



Even with IDEs, I find that I grep through source trees fairly often.

Sometimes it's because I don't completely trust the IDE to find everything I'm interested in (justifiably; sometimes it doesn't). Sometimes it's because I'm not looking to dive into the code and do serious work on it; I'm just doing a quick drive-by check/lookup for something. Sometimes it's because I'm ssh'd into another machine and I don't have the ability to easily open the sources in an IDE.



with all due respect, it sounds like you have the privilege of working in some relatively tidy codebases (and I'm jealous!)

with a legacy codebase, or a fork of a dependency that had to be patched which uses an incompatible buildsystem, or any C/C++/obj-c/etc that heavily uses the preprocessor or nonstandard build practices, or codebases that mix lots of different languages over awkward FFI boundaries and so on and so forth -- there are so many situations where sometimes an IDE just can't get you 100% of the way there and you have to revert to grepping to do any real work

that being said, I don't fully support the idea of handcuffing your code in the name of greppability, but I think dismissing it as a metric under the premise that IDEs make grepping "obsolete" is a little bit hasty



> with all due respect, it sounds like you have the privilege of working in some relatively tidy codebases (and I'm jealous!)

I wish, but no. I've found people will make a mess of everything. Which is why I don't trust solutions that rely on humans having more discipline, like what this article advocates.

In any situation where grep is your last saviour, you cannot rely on the greppability of the code. You'll have to check and double check everything, and still accept the risk of errors.



I've come to really like language servers for big personal and work projects where I already have my tools configured and tuned for efficiently working with it.

But being able to grep is really nice when trying to figure something out about a source tree that I don't yet have set up to compile, nor am I a developer of. I.e., I've downloaded the source for a tool I've been using pre-built binaries of and am now trying to trace why I might be getting a particular error.



Sure, if you have the luxury of having a functional IDE for all of your code.

You can't imagine how much faster I was than everybody else at answering questions about a large codebase just because I knew how to use ripgrep (on Windows). "Knowing how to grep" is a superpower.



A bit on the other side of the argument, I use grep plus find plus some shell work to do source code analysis for security reviews. grep doesn't really understand the syntax of languages, and that is mostly OK.

I've used this technique on auditing many code bases including the C family, perl, Visual Basic, C# and SQL.

With this sort of tool, I don't need to look for language-particular parsers--so long as the source is in a text file, this works well.



posts like this sound like the author routinely solves harder problems than you are, because the solutions you suggest don't work in the cases the post is about. we've had 'go to definition' since 01978 and 'find usages' since 01980, and you should definitely use them for the cases where they work



From the article,

- dynamically built identifiers is 100% correct, never do this. Breaks both text search and symbol search, results in complete garbage code. I had to deal with bugs in early versions of docker-compose because of this.

- same name for things across the stack? Shouldn't matter, just use find usages on `getAddressById`. Also easy way to bait yourself because database fields aren't 1:1 with front-end fields in anything but the simplest of CRUD webshit.

- translation example: the fundamental problem is using strings as keys when they should be symbols. Flat vs nested is irrelevant here because you should be using neither.

- react component example: As I mentioned in another comment, trivially managed with Find Usages.

Nothing in here strikes me as "routinely solves harder problems," it's just standard web dev.



yes, i agree that standard web dev is full of these problems, which can't be solved with go-to-definition and find-usages. it's a mess. i wasn't claiming that these messy, hard problems where grep is more helpful than etags are exotic; they are in fact very common. they are harder than the problems lucumo is evidently accustomed to dealing with because they don't have correct, complete solutions, so we have to make do with heuristics

advice to the effect of 'you should not make a mess' is obviously correct but also, in many situations, unhelpful. sometimes i'm not smart enough to figure out how to solve a problem without making a mess, and sometimes i inherit other people's messes. in those situations that advice decays into 'you should not try to solve hard problems'



> they are harder than the problems lucumo is evidently accustomed to dealing with because they don't have correct, complete solutions, so we have to make do with heuristics

Funny.

But since you asked. The hardest problems I've solved haven't been technical problems for years. Not that I stopped solving technical problems, or that I started solving only the easier problems. I just learned to solve people problems more.

People problems are much harder than technical problems.

The author showed a simple people problem: someone who needs to know about better tooling. If we were working together, showing them some tricks wouldn't take much time and would improve their productivity.

An example of a harder problem is when someone tries to play aggressive little word games with you. For example, trying to put you down by loudly making assumptions about your career and skills. One way to deal with that is to just laugh it off. Maybe even make a self-deprecating joke. And then continuing as if nothing happened.

But that assumes you want or have to continue working productively with them. If you don't, it can be quite enjoyable to just laugh in their face. After all, it's never the sharpest tool in the shed, or the brightest light that does that. In fact, it's usually the least useful person around, who is just trying to hide that fact. Of course, once you realize that, it becomes hard to laugh, because it's no longer funny. Just sad and pitiful.



> look! i already told you! i deal with the god damned customers so the engineers don't have to! i have people skills! i am good at dealing with people! can't you understand that? what the hell is wrong with you people?

(office space, https://www.youtube.com/watch?v=hNuu9CpdjIo)

look, lucumo, i'm sure you have excellent people skills. which is why you're writing five-paragraph power-trip-fantasy comments on hn about laughing in people's faces as you demonstrate your undeniable dominance over them, then take pity on them. but i'm not sure those comments really represent a contribution to the conversation about code greppability; they're just ego defense. you probably should not have posted them



(edited to remove things that could be interpreted as a personal attack)

an obvious thing about both people problems and technical problems is that they both cover a full spectrum of difficulty from trivial to impossible. a trivial people problem is buying a soft drink at a convenience store, if you have the money and speak the language. a trivial technical problem is tying your shoes. an impossible people problem is ending poverty. an impossible technical problem might be finding a polynomial-time decision procedure for an np-complete problem, or perhaps building a perpetual-motion machine, or a black-hole generator. both kinds of problems have every degree of difficulty in between, too. stable blue leds seemed like an impossible technical problem until shuji nakamura figured out how to make them. conquering asia seemed like an impossible people problem until genghis khan did it

even within the ambit of modifying a software system, figuring out what parts of the code are affected by a possible change, there are trivial technical problems and problems that far exceed current human capacities. nobody knows how to write a bug-free web browser or how to maintain the linux kernel without introducing new bugs

given this obvious fact, what are we to make of someone saying, 'people problems are much harder than technical problems'? obviously it isn't the case that all people problems are much harder than all technical problems, given that some people problems are easy, and some technical problems are impossible. and if we interpret it as meaning that some people problems are much harder than some technical problems, it's a trivial tautology which would be just as true if we reversed the terms to say '[some] technical problems are much harder than [some] people problems'

the most plausible interpretation is that it means that the people problems the speaker is most familiar with are much harder than the technical problems the speaker is most familiar with, and they are carelessly extrapolating from their limited experience of those problems to the entire class. it's not a statement about the world; it's a statement about the author and the environment they're familiar with

we can immediately deduce from this that you are not andrew wiles, who spent six years working alone on a technical problem which had eluded the world's leading mathematicians for some 350 years, for the solution of which he was appointed a knight commander of the order of the british empire and awarded the abel prize, along with a long list of other prizes. you give the appearance of being so unfamiliar with such difficult technical problems that you cannot imagine that they even exist, though surely with a little thought you can see that they do. in any case, for a very long time, you have not been working on any technical problems that seem impossible to you. i believe you that it's not that you started solving only the easier problems; all the problems you ever solved were the easier problems

which is to say, you aren't accustomed to dealing with difficult technical problems

perhaps we can also infer that you have been faced with many very difficult people problems—perhaps you are a politician or a clinical psychologist in a mental institution, or you have family members with serious mental illness

the reason i think this is worth discussing in some depth is that you aren't the first person i've seen expressing the transparently nonsensical sentiment that 'people problems are much harder than technical problems'. i've seen it over and over again for decades, but i've never seen a clear and convincing explanation of why it's nonsense



Interface-heavy languages break IDEs. In .NET at least, "go to definition" jumps you to the interface definition which you probably aren't interested in (vs. the specific implementation you are trying to dig into). Also with .NET specifically XAML breaks IDE traceability as well.



It seems like the law of diminishing returns; while I'm sure in a few cases this characteristic of a code writing style is extremely useful, it cuts into other things such as readability and conciseness. Fewer lines can mean fewer bugs, within reason; if you aren't in Lisp and are using more than 3 levels of parentheses, you might want to split it up, because the compiler/JIT/interpreter is going to anyway.



IDEs are cool and all, but there is no way I'm gonna let VSCode index my 80GB yocto tmp directory. Ctags can crunch the whole thing in a few minutes, and so can grep.

Plus there are cases where grep is really what you need, for example after updating a particular command line tool whose output changed, I was able to find all scripts which grepped the output of the tool in a way that was broken.
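A sketch of that second use case (the tool and file-type names are made up): after the tool's output format changed, something like this lists the scripts that parse its output:

    rg -l --type sh 'mytool\b.*\|\s*(grep|awk|sed)'   # shell scripts that pipe mytool's output through grep/awk/sed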



Honestly, in my 18 years of software development, I haven't "grepped" code once.

I only use grep to filter the output of CLI tools.

For code, I use my IDE or repository features.



On the flipside, IDEs can turn you into a lazy, inefficient programmer by doing all the hand-holding for you.

If your feelings are anemic when tasked with doing a grep, it's because you have lost a very valuable skill by delegating it to a computer. There are some things the IDE is never going to be able to find - lest it becomes the development environment - so keeping your grep fu sharpened is wise beyond the decades.

(Disclaimer: 40 years of software development, and vim+cscope+grep/silversearcher are all I really need, next to my compiler..)



> lazy... programmers

Since when was that a bad thing? Since time immemorial, it has been hailed as a universal good for programmers to be lazy. I'm pretty sure Larry Wall has lots of jokes about this on Usenet.

Also, I can clearly remember switching from vim/emacs to Microsoft Visual Studio (please, don't throw your tomatoes just yet!). I was blown away by IntelliSense. Suddenly, I was focusing more on writing business logic, and less time searching for APIs.



This is the wrong type of lazy.

Command line tools like grep are force multipliers for programmers. GUIs come with the risk of never learning how to leverage this power. In the end, that often leads to more manual work.

And today, bash is a lingua franca that you can bring with you almost everywhere. Even Windows "speaks" bash these days, with WSL.

In itself, there's nothing wrong with using the built-in features of a GUI. Right-clicking a method (or using a keyboard shortcut) to find the definition in a given code base IS nice for that particular operation.

But by knowing grep/awk/find/git command line and so on, combined with bash scripting and advanced regular expressions, you open up a new world of possibilities.

All those things CAN be done using Python/C#/Java or whatever your language is. But a 1-liner in bash can be 10-100 lines of C#.
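A small example of the kind of one-liner meant here (the file layout and error-code format are invented): counting how often each error code appears across a pile of logs, which would take a fair amount of C# to write from scratch:

    grep -rhoE 'ERR[0-9]+' logs/ | sort | uniq -c | sort -rn | head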



Where does this stupid notion come from that using powerful tools means you can't handle the less powerful ones anymore? Did your skills with a hand screwdriver atrophy when you learned how to use a powered screwdriver? Come on.

I use grep multiple times a day. I write bash scripts quite often. I'm not speaking from a position of ignorance of these tools. They have their place as a lowest common denominator of programming tools. But settling for the lowest common denominator is not a path to productivity.

Doesn't mean you should forget your skills, but it does mean you should investigate better tools. And leverage them. A lot.

> But a 1-liner in bash can be 10-100 lines of C#.

Yes. And the reverse is also true. bash is fast and easy if there's an existing tool you can leverage, and slow and hard when there's not.



I count the IDE and stuff like LSP as natural extensions of the compiler. For sure I grep (or equivalent) for stuff, but I highly prefer statically typed languages/ecosystems.

At the end of the day, I'm here to solve problems, and there's no end to them -- might as well get a head start.



> If your feelings are anemic

I'm not feeling anemic. The tool is anemic, as in, underpowered. It returns crap you don't want, and doesn't return stuff you do want.

My grep-fu is fine. It's a perfectly good tool if you have nothing better. But usually you do have something better.

Using the wrong tool to make yourself feel cool is stupid. Using the wrong tool because a good tool could make you lazy shows a lack of respect for the end result.



Huh? I have an old hand-powered drill from my Grandpa in my workshop. I used it once for fun. For all other tasks I use a powered drill. Same for IDEs. They help you refactor and reason about code - both properties I value. Sure, I could print it and use a textmarker, but I'm not Grandpa.



Knowing the bash ecosystem translates better to how you use the knife in the kitchen.

Sure you can replace most uses of a knife with power tools, but there is a reason why most top chefs still rely on that knife for most of those tasks.

A hand powered drill is more like a hand powered meatgrinder. It has the same limitation as the powered versions, and is simply a more primitive version.



Definitely true when you can use static typing.

Unfortunately sometimes you can't, and sometimes you can but people can't be arsed, so this is still a consideration.



I tried a good IDE recently: Jetbrains IntelliJ and Webstorm. Considered the top dog of IDEs. Was working on a typescript project which uses npm link to symlink another local project into the node_modules of the current project.

The great IDEs IntelliJ and Webstorm stopped autosuggesting completions from the symlinked project.

Open up Sublime Text again. Worked perfectly. That is why Jetbrains and their behemoth IDEs are utter shite.

Write your code to have symmetry and make it easy to grep.



>I tried a good IDE recently: Jetbrains IntelliJ

Having dealt with IntelliJ for 3 years due to education stuff, I laughed out loud here. Even VS is better than IDEA.



By not using literals everywhere. All literals are defined somewhere (start of function, class etc) as enums or vars and used.

Just because I have 20 usage of 'shipping_address' doesn't mean I'll have this string 20 times in different places.

Grep has its place and I often need to grep code base which have been written without much thoughts towards DX. But writing it nicely allows LSP to take over.



This is what the article starts with: "Even in projects exclusively written by myself, I have to search a lot: function names, error messages, class names, that kind of thing."

All of that is trivial to search for with a tool that understands the language.



> All of that is trivial to search for with a tool that understands the language.

Isn't string search, or grepping for patterns, even more trivial? So what is your argument? You found an alternative method, good, but how is it any better?

In my own case, I wrote a library that we used in many projects, and I often wanted to know where and how functions from my lib were used in those projects. For example, to be able to tell how much of an effort it would be for the users to refactor when I changed something. However, your method of choice at least with my IDE (Webstorm) only worked locally within the project. Only string search would let me reliably and easily search all projects.

I actually experimented creating a "meta" project of all projects, but while it worked that led to too many problems, and the main method to find anything still was string search (the CTRL-SHIFT-F Find dialog in IDEA IDEs is string search, and it's a wonderful dialog in that IDE family). I also had to open that meta project. Instead, I created a gitignored folder with symlinks to the sources of all the other projects and created a search scope for that folder, in which the search dialog let me string-search all projects' sources at once right from within the library project, while still being able to use the excellent Find dialog.

In addition, I found that sometimes the IDE would not find a usage even within the project. I only noticed because I used both methods, and string search showed me one or two places more than the method that relied on the underlying code parsing. Unfortunately IDEs have bugs, and the method you suggest relies on much more work by the IDE in parsing and indexing compared to the much more mundane string or string-pattern search.



> Isn't string search, or grepping for patterns, even more trivial?

It's not trivial when you're looking for symbols in context.

> the method you suggests relies on much more work of the IDE in parsing and indexing compared to

...compared to parsing and indexing you have to do manually because a full-text search (especially in a large codebase) will return a lot of irrelevant info?

Funnily enough I also have a personal anecdote. We had a huge PHP code base based on Symfony. We were in the middle of a huge refactoring spree. I saw my colleagues switch from vim/emacs to Idea/WebStorm after watching how easily I found symbols in the code base, found their usages, refactored them etc. compared to the full-text search they were always stuck with.

This was 5-6 years ago, before LSP became ubiquitous.



> It's not trivial

Did you miss the comparison? The "more trivial"? The context of my response? Please read the parent comment I responded to; treating my comment as standalone and adding some new meaning makes no sense.

String search is more trivial than a search that involves an interpretation of the code's structure and meaning. I have no idea why you wish to start a discussion about such a trivial statement.

> because a full-text search (especially in a large codebase) will return a lot of irrelevant info?

It doesn't do that for me but instead works very well. I don't know what you do with your symbol names, but I have barely any generic function names, the vast majority of them are pretty unique.

No idea how you use search, but I'm never looking for "doSomething(", it's always "doSomethingVerySpecific()", or some equally specific string constant.

I don't have the problems you tell me I should have, and my use case was the subject of my comment, which was itself a response to a specific point made by the parent comment.



> All of that is trivial to search for with a tool that understands the language.

Some literal in a log message may come from the code, or it might be remapped in some config file outside the language the LSP is looking at, or come from an environment variable, etc. I just go back and forth between grep and IDE tools; both have different tradeoffs.



The thing is, so many people are weirdly obsessed with never using any other tools besides full-text search. As if using useful tools somehow makes them a lesser programmer or something :)



I actually don't think there's a tool that handles usages when using PHP variable variables, or for the first example in the article, which parametrically chooses a table name.

When you string interpolate to build the name you lose searchability.



Yes, full-text search is a great fallback when everything else fails. But in the use cases listed at the beginning of the article it's usually not needed if you have proper tools



> Honestly, posts like this sound like the author needs to invest some time in learning about better tools for his language. A good IDE alone will save you so much time.

Completely agreed. The React component example in the article is trivially solvable with any modern IDE; right click on the class name, "Find Usages" (or use the appropriate hotkey, of course). Trying to grep for a class name when you could just do that is insane.

I mainly see this from juniors who don't know any better, but as seen in this thread and the article, there are also experienced engineers who are stubborn and refuse to use tools made after 1990 for some reason.



That's a problem of code organisation though. Large codebases should be split into multiple repos. At the end of the day code structure is not something to be decided only by compilation strategy, but by developer ergonomics as well. A massive repo is a massive burden on productivity.



The second point here made me realize that it'd be super useful for a grep tool to have a "super case insensitive" mode which expands a search for, say, "FooBar|first_name" to something like /foo[-_]?bar|first[-_]?name/i, so that any camel/snake/pascal/kebab/etc case will match. In fact, I struggle to come up with situations where that wouldn't be a great default.
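A rough sketch of what that expansion could look like, written as a small standalone Python helper (hypothetical names, not any existing tool):

  import re

  def style_insensitive_pattern(query):
      # Split the query into words at underscores, hyphens and camelCase humps,
      # then allow an optional '-' or '_' between each pair of words, ignoring case.
      words = [w for w in re.split(r"[-_]|(?<=[a-z0-9])(?=[A-Z])", query) if w]
      return re.compile("[-_]?".join(re.escape(w) for w in words), re.IGNORECASE)

  pattern = style_insensitive_pattern("first_name")   # roughly /first[-_]?name/i
  for hit in ["firstName", "FirstName", "first-name", "FIRST_NAME", "firstname"]:
      assert pattern.search(hit)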



Hey, I just created a new tool called Super Grep that does exactly what you described.

I implemented a format-agnostic search that can match patterns across various naming conventions like camelCase, snake_case, PascalCase, kebab-case. If needed, I'll integrate in space-separated words.

I've just published the tool to PyPI, so you can easily install it using pip (`pip install super-grep`), and then you just run it from the command line with `super-grep`. You can let me know if you think there's a smarter name for it.

Source: https://www.github.com/msmolkin/super-grep



Wow, thanks so much for the encouragement and advice, dang! I'm honored to receive a personal response from you and so soon after posting. I really appreciate the suggestion to post this as a Show HN. If I end up doing it, I'll definitely wait a bit–thanks for that suggestion, as I would have thought to do the opposite otherwise. Nice of you to offer to put it in the second-chance pool as well.



pretty cool and to me a better approach than the prescriptive advice from the OP. to me the crux of the argument is to make the code more searchable with a popular tool. but if this can be well-integrated into common IDEs (or even grep perhaps), it would reduce most of the argument down to personal preference.



Adding to that, I'm often bitten trying to search for user strings because they're split across lines to adhere to 80 characters.

So if I'm trying to locate the error message "because the disk is full" but it's in the code as:

  ... + " because the " + 
    "disk is full")
then it will fail.

So really, combining both our use cases, what would be great is to simply search for a given case-insensitive alphanumeric string in files that skips all non-alphanumeric characters.

So if I search for:

  Foobar2
it would match all of:
  FooBar2
  foo_bar[2]
  "Foo " + \
    ("bar 2")
  foo.bar.2
And then in the search results, even if you get some accidental hits, you can be happy knowing that you didn't miss anything.
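A naive sketch of that idea in Python (made-up helper names; a real tool would also need to map stripped offsets back to line numbers to show where each hit was):

  import re

  def alnum_only(s):
      # Drop everything that isn't a letter or digit and lowercase the rest,
      # so '"Foo " + ("bar 2")' and 'foo_bar[2]' both collapse to 'foobar2'.
      return re.sub(r"[^0-9a-zA-Z]+", "", s).lower()

  def sloppy_contains(needle, path):
      with open(path, encoding="utf-8", errors="replace") as f:
          return alnum_only(needle) in alnum_only(f.read())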


These are both problems I regularly have. The first one I immediately saw when reading the title of this submission was the "super case insensitive" issue that I often hit when working on Go codebases, particularly when using a combination of Go classes and YAML or JSON. It also happens with command line arguments being converted to variables.

But the string split thing you mentioned happens a lot when searching for OpenStack error messages in Python, which are often split across lines like you showed. My current solution is to randomly shift what I'm searching for, or try to pick the most unique line.



fwiw I pretty frequently use `first.?name` - the odds of it matching something like "FirstSname" are low enough that it's not an issue, and it finds all cases and all common separators in one shot.

(`first\S?name` is usually better, by ignoring whitespace -> better ignores comments describing a thing, but `.` is easier to remember and type so I usually just do that)



> "super case insensitive"

Let's say someone made a plugin for their favorite IDE for this kind of search. What would the details look like?

To keep it simple, let's assume we just do the super-case-insensitivity, without the other regex condition. Say the user searches for "first_name" and wants to find "FirstName".

One simple solution would be to have a convention for where a word starts or ends, e.g. with " ". So the user would enter "first name" into the plugin's search field. The plugin turns it into "/first[-_]?name/i" and gives this regexp to the normal search of the IDE.

Another simple solution would be to ignore all word boundaries. So when the user enters "first name", the regexp would become "/f[-_]?i[-_]?r[-_]?s[-_]?t[-_]?n[-_]?a[-_]?m[-_]?e[-_]?/i". Then the search would not only be super-case-insensitive, but super-duper-case-insensitive. I guess the biggest downside would be that this could get very slow.

I think implementing a plugin like this would be trivial for most IDEs that support plugins.

Am I missing something?



Hm I'd go even simpler than that. Notably, I'd not do this:

> So the user would enter "first name" into the plugin's search field.

Why wouldn't the user just enter "first_name" or "firstName" or something like that? I'm thinking about situations like, you're looking at backend code that's snake_cased, but you also want it to catch frontend code that's camelCased. So when you search for "first_name" you automagically also match "firstName" (and "FirstName" and "first-name" and so on). I wouldn't personally introduce some convention that adds spaces into the mix, I'd simply convert anything that looks snake/kebab/pascal/camel-cased into a regex that matches all 4 forms.

Could even be as stupid as converting "first_name" or "firstName", or "FirstName" etc into "first_name|firstname|first-name", no character classes needed. That catches pretty much every naming convention right? (assuming it's searched for with case insensitivity)
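Something like this, perhaps (a rough Python sketch of that conversion; names made up):

  import re

  def alternation(query):
      # 'first_name', 'firstName' and 'FirstName' all become
      # 'first_name|firstname|first-name' for a case-insensitive search.
      words = [w.lower() for w in re.split(r"[-_]|(?<=[a-z0-9])(?=[A-Z])", query) if w]
      return "|".join(["_".join(words), "".join(words), "-".join(words)])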



Shame on me for jumping past the simple solutions, but...

If you're going that far, and you're in a context which probably has a parser for the underlying language ready at hand, you might as well just convert all tokens to a common format and do the same with the queries. So searches for foo-bar find strings like FooBar because they both normalize to foo_bar.

Then you can index by more than just line number. For instance you might find "foo" and "bar" even when "foo = 6" shows up in a file called "bar.py" or when they show up on separate lines but still in the same function.
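A toy version of that normalization, with a regex tokenizer standing in for the real per-language parser suggested above (all names here are hypothetical):

  import re
  from collections import defaultdict

  def normalize(token):
      # FooBar, foo-bar and foo_bar all normalize to foo_bar.
      parts = [p for p in re.split(r"[-_]|(?<=[a-z0-9])(?=[A-Z])", token) if p]
      return "_".join(p.lower() for p in parts)

  def index_file(path):
      # Map each normalized token to the line numbers it appears on.
      index = defaultdict(list)
      with open(path, encoding="utf-8", errors="replace") as f:
          for lineno, line in enumerate(f, start=1):
              for token in re.findall(r"[A-Za-z_][A-Za-z0-9_-]*", line):
                  index[normalize(token)].append(lineno)
      return index

  # A query for 'foo-bar' then lands in the same bucket as occurrences of FooBar:
  # index_file("bar.py").get(normalize("foo-bar"), [])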



IIUC, you're not missing anything though your interpretation is off from mine*. He wasn't saying it'd be hard, he was saying it should be done.

* my understanding was simply that the regex would (A) recognize `[a-z][A-Z]` and inject optional _'s and -'s between... and (B) notice mid-word hyphens or underscores and switch them to search for both.



That already exists as "?" and was used in their example:
  /first[-_]?name/i
Or to use your example, just checking for underscores and not also dashes:
  /first_?name/i
Backslash is already used to change special characters like "?" from these meanings into just "use this character without interpreting it" (or the reverse, in some dialects).


This reminds me of the substitution mode of Tim Pope's amazing vim plugin [abolish](https://github.com/tpope/vim-abolish?tab=readme-ov-file#subs...)

Basically in vim to substitute text you'd usually do something with :substitute (or :s), like:

:%s/textToSubstitute/replacementText/g

...and have to add a pattern for each differently-cased version of the text.

With the :Subvert command (or :S) you can do all three at once, while maintaining the casing for each replacement. So this:

textToSubstitute

TextToSubstitute

texttosubstitute

:%S/textToSubstitute/replacementText/g

...results in:

replacementText

ReplacementText

replacementtext



Also just realised while looking at the docs it works for search as well as replacement, with:

:S/textToFind

matching all of textToFind TextToFind texttofind TEXTTOFIND

But not TeXttOfFiND.

Golly!



In vim, I believe that's the 'smartcase' setting (used together with 'ignorecase'): searches stay case-insensitive unless the pattern contains an uppercase letter.

In my setup, `/foo` will match `FoO` and so on, but `/Foo` will only match `Foo`.



Nim comes bundled with a `nimgrep` tool [0], that is essentially grep on steroids. It has `-y` flag for style insensitive matching, so "fooBar", "foo_bar" and even "Foo__Ba_R" can be matched with a simple "foobar" pattern.

The other killer feature of nimgrep is that instead of regex, you can use PEG grammar [1]

  [0] - https://nim-lang.github.io/Nim/nimgrep.html
  [1] - https://nim-lang.org/docs/pegs.html


Let's say you have a FilterModal component and you're using it like this: x-filter-modal

Improving the IDE to find one or the other by searching for one or the other is missing the point of the article: that consistency is important.

I'd rather have a simple IDE and a good codebase than the opposite. In the example that I gave the worst thing is that it's the framework which forces you do use these two names for the same thing.



My point is that if grep tools were more powerful we wouldn't need this very particular kind of consistency, which gives us the very big benefit of being allowed to keep every part of the codebase in its idiomatic naming convention.

I didn't miss the point, I disagreed with the point because I think it's a tool problem, not a code problem. I agree with most other points in the article.



Fuzzy search is not the same. For instance, it might by default match not only “FooBar” and “foo_bar” but also e.g. “FooQux(BarQuux)”, which in a large code base might mean hundreds of false positives.



Ideally there'd be some sort of ranking or scoring to sort by. FooQux(BarQuux) would seemingly rank much lower than FooBar when searching for FooBar or "Foo Bar", but might still be useful in results if ranked and displayed lower.



Indeed, that's a good solution – and I believe e.g. fzf does some sort of ranking by default. The devil is however in the details:

One minor inconvenience is that the scoring should ideally be different per filetype. For instance, Python would count "foo-bar" as two symbols ("foo minus bar") whereas Lisp would count it as one symbol, and that should ideally result in different scores when searching for "foobar" in both. Similarly, foo(bar) should ideally have a different (lower) score than "foo_bar" for symbol search even though the keywords are separated by the same number of characters.

I think this can be accommodated by keeping a per-language list of symbols and associated "penalties", which can be used to calculate "how far" keywords are from each other in the search results, weighted by language semantics :)



I advocate for greppability as well – and in Swedish it becomes extra fun – as the equivalent phrase in Swedish becomes "grep-bar" or "grep-barhet" and those are actual words in Swedish – "greppbar" roughly means "understandable", "greppbarhet" roughly means "the possibility to understand"



We do tar, for xfz I think you have to look to the Slavic languages :)

Anyway, to answer your question:

  $ grep -Fxf <(ls -1 /bin) /usr/share/dict/swedish 
  ack
  ar
  as
  black
  dialog
  dig
  du
  ebb
  ed
  editor
  finger
  flock
  gem
  glade
  grep
  id
  import
  last
  less
  make
  man
  montage
  pager
  pass
  pc
  plog
  red
  reset
  rev
  sed
  sort
  sorter
  split
  stat
  tar
  test
  transform
  vi
:)

[edit]: Ironically, grep in that list is not the same word as the one OP is talking about. That one is actually based on grepp, with the double p. grep means pitchfork.



Yeah, that’s one type.

Another is for turning soil at a small scale by hand (also called a cultivator, I think).

But they all have somewhat long prongs.



Party pooper checking in: easier to remember is that v is the verbose option in most tools, x and f you already know, z is auto-detected for as long as I remember so you don't need to pass that. Add c for creating an archive and, congratulations, you can now do 90% of the tasks you'll ever want to do with tar, especially defusing xkcd bombs!

(To go for 99%, add t for testing an archive to your repertoire. This is all I ever use; anything else I do with the relevant tools that I already know, like compression settings `tar c . | zstd -19 > my.tar.zstd` or extracting to a folder `cd /to/here && tar x ~/Downloads/ar.tar`. I'm sure tar has options for all this but that's not the one thing it should do and do well.)

I hadn't heard of the German option but I love it, shame really that z is obsolete :(



I mean, you're not wrong. Learning what stuff means is good :) But there's also the part where making up a ridiculous story, pun or such enables it being a very strong mnemonic.

I know v is just the verbose option, though I didn't know z was autodetected.

Way back (~15y or so?) I was reading bash.org just for the jokes cause I was on IRC, I knew what a tar/tar.gz file is, but I had never needed to extract one from the command line (might've been on Windows back then). However, because I remembered the funny joke, the first time I was on a Linux system confronted with a tgz, I knew exactly what to type :)

Honestly to this day, I've never needed to create a tar archive, only to unpack them (when I need to archive+compress files it's usually to send to other people, and I pick zip cause everyone can deal with it). But `tar --help` and `man tar` are there in case I ever might.



As far as I understood, it was part of the language before.

The German equivalent of the word would probably be "greifbar". Being able to hold something, usually used metaphorically.



"zu greifen" may best translate to "to grip", but "grip" has different mental connotations in English (it refers to mental stability, not intellectual insight).

The best dual purpose translation of "zu greifen"/"gripe" (German/Scandinavian) meaning "zu begreifen"/"begripe"/"understand" would be "to grasp", which covers both physically grabbing into something and also to understand it intellectually.

All these words stem back to the Proto-Indo-European gʰrebʰ, which more or less completes the circle back to "grep".



Could I suggest that greppbarhet is more precisely translated as “the ability of being understood”?

(Norwegian here. Our languages are similar, but we miss this one.)



Norwegian still translates grep as "grip"/"grab". I always thought of grepping as reaching in with a hand into the text and grabbing lines. That association is close at hand (insert lame chuckle) for German and English speakers too.



In English that association is going to depend a lot on one's accent; until now I've never associated grep-ing with anything other than using grep! (But, equally, that might just be a me thing.)



It doesn’t sound anything like grip in my accent but for some reason the association has always been there for me. Grabbing or ripping parts from the file.



So Dutch/German make "begreif" a verb, for Swedish it is just a noun (that means "concept").

But "begrijpelijk" has a clone: "begriplig". An adverb based on a verb in a foreign dictionary. There is no verb that goes "begreppa", it's just "greppa".



Nah, you've got it backwards. The article isn't about dodging understanding - it's about making it way easier to spot patterns in your code. And that's exactly how you start to really get what's going on under the hood. Better searching = faster learning. It's like having a good map when you're exploring a new city



The article advocates making code harder to understand for the sake of better search. It's like forcing a city to conform to a nice, clean, readable map: it'll make exploring easier for you, at the cost of making the city stop working.



I've seen some pretty wild conditional string interpolation where there were like 3-4 separate phrases that each had a number of different options, something akin to `${a ? 'You' : 'we'} {b ? 'did' : 'will do' } {c ? 'thing' : 'things' }`.

When I was first onboarding to this project, I was tasked with updating a component and simply tried to find three of the words I saw in the UI, and this was before we implemented a straightforward path-based routing system. It took me far too long just to find what I was going to be working on, and that's the day I distinctly remember learning this lesson. I was pretty junior, but I later returned to this code and threw it all away in favor of a number of easily greppable strings.



Tangential: I love it when UIs say "1 object" and "2 objects". Shows attention to detail.

As opposed to "1 objects" or "1 object(s)". A UI filled with "(s)", ughh



I like the more robotic "Objects: 1" or "Objects: 2", since it avoids the pluralization problems entirely (e.g., in French 0 is singular, but in English it's plural; some words have special forms when pluralized, such as child -> children or attorney general -> attorneys general). And related to this article, it's more greppable/awkable, e.g. `awk '/^Objects:/ && $2 > 10'`.



Fun fact - I had to localize this kind of logic to my language (Polish). I realized quickly it's fucked up.

This is roughly the logic:

    function strFromNumOfObjects(n) {
      if (n === 1) {
          return "obiekt";
      }
      let last_digit = (n%10);
      let penultimate_digit = Math.trunc((n%100)/10);
      if ((penultimate_digit == 0 || penultimate_digit >= 2) && last_digit > 1 && last_digit <= 4) {
          return "obiekty";
      }
      return "obiektów";
    }
Basically pluralizing words in Polish is a fizz-buzz problem :) In other Slavic languages it should be similar BTW


This is the reason many coding styles and tools (including the Linux kernel coding style and the default Rust style as implemented in rustfmt) do not break string constants across lines even if they're longer than the desired line length: you might see the string in the program's output, and want to search for the same string in the code to find where it gets shown.



My team drives me bonkers with this. They hear the general principle "really long lines of code are bad", but extrapolate it to "no characters shall pass the soft gutter no matter what".

Even if you have, say, 5 sequential related structs, that are all virtually identical, all written on one line so that the similarities and differences are obvious at a mere glance... Then someone comes through and touches my file, and while they're at it, "fixes" the line that went 2 characters past the 80 mark by reformatting the 4th struct to span several lines. Now when you see that list of structs, you wonder "why is this one different?" and you have to read carefully to determine that, nope, it just contained one longer string. Or god forbid they reformat all the structs to match, turning a 1-page file into 3 pages, and making it so you have to read and understand each element of each struct just to see what's going on.

If I could have written the rule of thumb, I would have said "No logic or control shall happen after the end of the gutter." But if there's a paragraph-long string on one line- who cares?? We all have a single keystroke that can toggle soft-wrap, and the odds that you're going to need to know anything about that string other than "it's a long string" are virtually nil.

Sorry. I got triggered. :-)



Yep this triggers the fuck out of me too. It drives me absolutely insane when I'm taking the time and effort to write good test cases that use inline per test data that I've taken the time to format so it's nice and readable for the next person, then the next person comes along, spends 30 seconds writing some 2 line rubbish to hit a code coverage metric, then spends another 60 seconds adding a linter rule that blows all the test data out to 400 lines of unreadable dogshit that uses only the left 15% of screen real estate.



I routinely spot 3-line prints with the string on its own line in our code. Even for cases where the string + print don't even reach the 80 character "limit"



This is why autoformatters that frob with line breaks are just terrible and fundamentally broken.

I'm fairly firmly in the "wrap at 80" camp by the way; but sometimes a tad longer just makes sense. Or shorter for that matter: forced removal of line breaks is just as bad.



80 feels really impractically narrow. A project I work on uses 110 because it's approximately the widest you can comfortably compare two revisions on the same monitor, or was for some person at some time, and I can live with it, but any less would just feel so cramped. A few indentation levels deep and I'd be writing newspaper columns.



80 characters wide is the width we had back in the late 90s. Displays are nothing like that anymore. I managed to talk my team into setting the linter to something more reasonable, but individuals still feel like they're being virtuous if they stick to 80 and reformat any line they touch that goes over. It's just dogma!



I have been places where we allow long strings, but other long lines aren't allowed, with 80 to 100 char limits otherwise. I like 100 for C++/Java and 80 for C. If a line gets much longer than that (strings aside) then it's time for a rethink in most cases; the grouping/scoping symbols are getting too deep. I'm sure other languages may or may not have that as a reasonable argument. It is just a rule of thumb though.



This is the world autoformatters have wrought. The central dogma of the autoformatter is that "formatting" is based on dumb syntactic rules with no inflow of imprecise human judgements.



Most autoformatters do not reformat string constants as GP has said, and even if they did, this is something that can be much more accurately and correctly specified with an AF than with a human.

Autoformatting collectively saves probably close to millions of work hours per year in our industry, and that’s at the current adoption. Do you think it’s productive to manually space things out, clean up missing trailing commas and what not? Machines do it better.



> Even if you have, say, 5 sequential related structs, that are all virtually identical, all written on one line so that the similarities and differences are obvious at a mere glance... Then someone comes through and touches my file, and while they're at it, "fix" the line that went 2 characters past the 80 mark by reformatting the 4th struct to span several lines.

Autoformatters absolutely do this. They do not understand considerations like symmetry.

I am doubtful as to the costs of "somewhere in the codebase there is a missing trailing comma".



The wins of autoformatting are 1) never having to have a dispute over formatting or have formatting depend on who last touched code, 2) never manually formatting code or depending on someone's editor configuration, 3) having CI verify formatting, and 4) not having someone (intentionally or unintentionally) make unrelated formatting changes in a commit.

Also, autoformatters can be remarkably good. For instance, rustfmt will do things like:

    x.func(Some(MyStruct {
        field: big + long + expr,
        field2,
    }));
rather than mindlessly introducing three levels of indentation.


If I recall, rustfmt had a bug where long string literals (say, over 120 chars or so — or maybe if it was that the string was long enough to extend beyond the gutter when properly indented?) would prevent formatting of the entire file they were in. Has this been fixed?



Not the whole file, but sufficiently long un-line-breakable code in a complex statement can cause rustfmt to give up on trying to format that statement. That's a known issue that needs fixing.



Rust and Javascript and Lisp all get extra points because they put a keyword in front of every function definition. Searching for “fn doTheThing” or “defun do-the-thing” ensures that you find the actual definition. Meanwhile C lacks any such keyword, so the best you can do is search for the name. That gets you a sea of callers with the declarations and definitions mixed in. Some C coding conventions have you split the definition into two lines, first the return type on a line followed by a second line that starts with the function name. It looks ugly, but at least you can search for “^doTheThing” to find just the definition(s).



Golang has a similar property as a side-effect of the following design decision.
  ... the language has been designed to be easy to analyze and can be parsed without a symbol table
Taken from https://go.dev/doc/faq

The "top-level declarations" in source files are exactly: package, import, const, var, type, func. Nothing else. If you're searching for a function, it's always going to start with "func", even if it's an anonymous function. Searching for methods implemented by a struct similarly only needs one to know the "func" keyword and the name of the struct.

Coming from a background of mostly Clojure, Common Lisp, and TypeScript, the "greppability" of Go code is by far the best I have seen.

Of course, in any language, Go included, it's always better to rely on static analysis tools (like the IDE or LSP server) to find references, definitions, etc. But when searching code of some open source library, I always resort to ripgrep rather than setting up a development environment, unless I found something that I want to patch (which in case I set up the devlopment environment and rely on LSP instead of grep to discover definitions and references).



I'm not so sure about greppability in the context of Go. At least at Google (where Go originates, and whose style guide presumably has strong influence on other organizations' use of the language), we discourage "stuttering":

> A piece of Go source code should avoid unnecessary repetition. One common source of this is repetitive names, which often include unnecessary words or repeat their context or type. Code itself can also be unnecessarily repetitive if the same or a similar code segment appears multiple times in close proximity.

https://google.github.io/styleguide/go/decisions#repetitive-...

(see also https://google.github.io/styleguide/go/best-practices#avoid-...)

This is the style rule that motivates the sibling comment about method names being split between method and receiver, for what it's worth.

I don't think this use case has received much attention internally, since it's fairly rare at Google to use grep directly to navigate code. As you suggest, it's much more common to either use your IDE with LSP integration, or Code Search (which you can get a sense of via Chromium's public repository, e.g. https://source.chromium.org/search?q=v8&sq=&ss=chromium%2Fch...).



The thing about stuttering is that the first part of the name is fixed anyway, MOST of the time.

If you want to search for `url.Parse`, you can find most of the usages just by searching for `url.Parse`, because the package will generally be imported as `url` (and you won’t import Parse into your namespace).

It’s not as good as find references via LSP but it is like 99% accurate and works with just grep.



That works somewhat for finding uses of a native Go module, although I've seen lots of cases where the module name is autogenerated and so you need to import with an alias (protobufs, I'm looking at you). It also doesn't work quite so well in reverse -- you need to find files with `package url;`, then find instances of '\bfunc Parse\('.

Lastly, one of the bigger issues is (as the aforementioned sibling commenter mentioned) the application of this principle to methods. This is especially bad with method names for established interfaces, like Write or String. Even with an LSP server, you then need to trace up the call stack to figure out what concrete types the function is being called with, then look at the definitions of those types. I can't imagine wanting to do that with only (rip)grep at my disposal.



Single letter variables in Golang are to be used in small, local contexts. Akin to the throwaway i var in for loops. You only grep the struct methods, the same way no one greps 'this' or 'self'.

The code bases you've been reading, and even some of the native libraries, don't do it properly. Probably due to legacy reasons that wouldn't pass readability approvals nowadays.



> The culture of single letter variables in golang, at least in the codebases I've seen, undoes this.

The convention, not just in Go, is that the smaller the scope, the smaller the variable reference.

So, sure, you're going to see single-letter variables in short functions, inside short block scopes, etc, but that is true of almost any language.

I haven't seen single-letter variables in Go that are in a scope that isn't short.

Of course, this could just mean that I haven't seen enough of other people's Go source.



Zipf's law, right - these rules are a formalization of our brain's functionality with language.

Of course, with enough code, someone does everything.



I like using l for logger and db for database client/pool/handle even if there's a wider scope. And if the bulk of a file is interacting with a single client I might call that c.



The way I have seen this is that single letter variables are mostly used when declaration and (all) usages are very close together.

If I see a loop with i or k, v then I can be fairly confident that those are an Index or a Key Value pair. Also I probably don't need to grep them since everything interacting with these variables is probably already on my screen.

Everything that has a wider scope or which would be unclear with a single letter is named with a more descriptive name.

Of course this is highly dependent on the people you work with, but this is the way it works on projects I have worked on.



Golang gets zero points from me because function receivers are declared between func and the name of the function. God I hate this design choice and boy am I glad I can use golsp.



this thread is about using `grep` to find things, and this subthread is specifically about how the `func` keyword in golang makes it easy to distinguish the definition of a function from its uses, so yes, because `grep 'func lart('` will not find definitions of `lart` as a method. you might end up with something like `grep 'func .*) *lart('` which is both imprecise and enough noise that you will not want to type it; you'll have to can it in a script, with the associated losses of flexibility and transparency



that's going to find all the functions that take an argument named lart or of a lart type too, but it also sounds like a thing i really want to try



Also, anything that contains "func" and "lart" as a substring, e.g. foobar(function), blart(baz).

It's not far off from my manually-constructed patterns when I want to make sure I find a function definition (and am willing to tolerate some false positives), but I personally prefer fine-grained control over when it's in use.



oh, i mean that instead of
    grep -P '^func [^)]+\) methodName\('
you could say
    grep 'func [^)]*) methodName('
which is a bit less typing

however, i have to admit that i sort of ensnared myself in my own noose here by being too clever! i forgot that grep's regexp dialect only supports + if you \ it, and it took me six tries to figure out why i wasn't getting any grep hits. there's a lot to be said for the predictability and consistency of pcre!



Can’t say I’ve ever had an issue with it, but it does get a bit wild when you have a function signature that takes a function and returns one, unless you clear it up with some types.
  func (s *Recv) foo(fn func(x any) error) func(y any) (*Recv, error)
As an exaggerated example. Easy to parse but not always easy to read at a glance.


I have to always add wildcards between func and the function name, because I can never know how the other developer has decided to specify the name of the receiver. This will always be a problem as far as grepping with primitive tools that don't parse the language.



Receivers are utterly idiotic. Like how could anyone with two working brain cells sign off on something like that?

If you don't want OOP in the language, but want people to be able to write thing.function(arg), you just make function(thing, arg) and thing.function(arg) equivalent syntax.



C# did this for extension methods and it Just Works. You just add the "this" keyword to a function in a pure-static class and you get method-like calling on the first param of that function.



If the function has to be modified in any way in order to grant permission to be used that way, then it is not quite "did this".

Equivalent means that there is no difference at the AST level between o.f(a) and f(o, a), like there is no difference in C among (a + i), a[i], i[a] and (i + a).

However, a this keyword is way better than making the programmers fraction off a parameter and move it to the other side of the function name.



JavaScript has multiple ways to define a function so you sort of lose that getting the actual definition benefit.

on edit: I see someone discussed that you can grep for both arrow functions and named functions at the same time and I suppose you can also construct a query that handles a function constructor as well - but this does not really handle curried functions or similar patterns - I guess at that point one is letting the perfect become the enemy of the good.

Most people grepping know the code base and the patterns in use, so they probably only need to grep for one type of function declaration.



C is so much worse than that. Many people declare symbols using macros for various reasons, so you end up with things like DEFINE_FUNCTION(foo) {. In order to get a complete list of symbols you need to preprocess it, and this requires knowing what the compiler flags are. Nobody really knows what their compiler flags are because they are hidden behind multiple levels of indirection and a variety of build systems.



> C is so much worse than that. Many people declare symbols using macros for various reasons, so you end up with things like DEFINE_FUNCTION(foo) {.

That’s not really C; that’s a C-based DSL. The same problem exists with Lisp, except even worse, since its preprocessor is much more powerful, and hence encourages DSL-creation much more than C does. But in fact, it can happen with any language - even if a language lacks any built-in processor or macro facility, you can always build a custom one, or use a general purpose macro processor such as M4.

If you are creating a DSL, you need to create custom tooling to go along with it - ideal scenario, your tools are so customisable that supporting a DSL is more about configuration than coding something from scratch.



If your Lisp macro starts with a symbol whose name begins with def, and the next symbol is a name, then good old Exuberant Ctags will index it, and you get jump to definition.

Not so with DEFINE_FUNCTION(foo) {, I think.

  $ cat > foo.lisp
  (define-musical-scale g)
  $ ctags foo.lisp
  $ grep scale tags
  g       foo.lisp        /^(define-musical-scale g)$/;"  f
Exuberant Ctags is not even a tool from the Lisp culture. I suspect it is mostly shunned by Lisp programmers. Except maybe for the Emacs one, which is different. (Same ctags command name, completely different software and tag file format.)


Other languages have preprocessors or macro facilities too.

C's is very weak. Languages with more powerful preprocessors/macros than C's include many Lisp dialects, Rust, and PL/I. If you think everyone using a weak preprocessor is bad, wait until you see what people will do when you give them a powerful one.

Microfocus COBOL has an API for writing custom COBOL preprocessors in COBOL (the Integrated Preprocessor Interface). (Or some other language, if you insist.) I bet there are some bizarre abominations hidden in the bowels of various enterprises based on that ("our business doesn't just run on COBOL, it runs on our own custom dialect of COBOL!")



c's macro system is weak on purpose, based on, i suspect, bad experiences with m6 and m4. i think they thought it was easier to debug things like ratfor, tmg, lex, and (much later) protoc, which generate code in a more imperative paradigm for which their existing debugging approaches worked

i can't say i think they were wholly wrong; paging through compiler error messages is not my favorite part of c++ templates. but i have a certain amount of affection for what used to be called gasp, the gas macro system, which i've programmed for example to compute jump offsets for compiling a custom bytecode. and i think m4 is really a pathological case; most hairy macro systems aren't even 10% as bad as m4, due to a combination of several tempting but wrong design decisions. lots of trauma resulted

so when they got a do-over they eliminated the preprocessor entirely in golang, and compensated with reflection, which makes debugging easier rather than harder

probably old hat to you, but i just learned last month how to use x-macros in the c preprocessor to automatically generate serialization and deserialization code for record types (speaking of cobol): http://canonical.org/~kragen/sw/dev3/binmsg_cpp.c (aha, i see you're linking to a page that documents it)



I've always suspected the powerful macro facilities in Lisp are why it's never been very common - the ability to do proper macros means all the very smart programmers create code that has to be read like a maths paper. It's too bespoke to the problem domain and too tempting to make it short rather than understandable.

I like Rust (tho I have not yet programmed in it) but I think if people get too into macro generated code, there is a risk there to its uptake.

It's hard for smart programmers to really believe this, but the old "if you write your code as cleverly as possible, you will not be able to debug it" is a useful warning.



Yes, the usefulness of macros always has to be balanced against their cost. I know of only one codebase that does this particular thing though, Emacs. It is used to define Lisp functions that are implemented in C.



It's a common pattern for just about any binding of C-implementation to a higher-level language. Python has a similar pattern, and I once had to re-invent it from scratch (not knowing any of this) for a game engine.



Not JavaScript. Cool kids never write “function” any more, it’s all arrow functions. You can search for const, which will typically work, but not always (could be a let, var, or multi-const intializer).



Yes but that’s an anti pattern. Arrow functions aren’t there to look cool, they’re how you define lambdas / anonymous functions.

Other than that, functions should be defined by the keyword.



Functions and arrow functions have an important difference: arrow functions do not create their own `this`. If you're in a context where a nested function needs to maintain access to the outer function’s `this`, and you don't want to muck with `bind` or `call`, then you need an arrow function.



Anonymous functions don't have names. This makes it much harder to do things like profiling (just try to find that one specific arrow function in your performance profile flame graph) and tracing. Tools like Sentry that automatically log stack traces when errors occur become much less useful if every function is anonymous.



Interesting, it seems that the javascript runtime is smart enough to detect this pattern and actually create a named function (I tried Chrome and Node.js)
    const foo = () => {}
    console.log( foo.name );
actually outputs 'foo', and not the empty string that I was expecting.
   const test = () => ( () => {} );
   const foo = test();
   console.log( foo.name );
outputs the empty string.

Is this behavior required by the standard ?



You're probably remembering how it used to work. This is the example I remember from way back that we shouldn't use because (aside from being unnecessary and weird) this function wouldn't have a name in stack traces:
  var foo = function() {};
Except nowadays it too does have the name "foo".


Not sure what you find not true about it. All named “function”s get hoisted just like “var”s, I use post-definitions of utility functions all the time in file scopes, function scopes, after return statements, everywhere. You’re probably thinking about
  const foo = function (){}
without its own name before (). These behave like expressions and cannot be hoisted.


> I use post-definitions of utility functions all the time in file scopes, function scopes, after return statements, everywhere

I haven't figured out if people consider this a best practice, but I love doing it. To me the list of called functions is a high-level explanation of the code, and listing all the definitions first just buries the high-level logic "below the fold". Immediately diving into function contents outside of their broader context is confusing to me.



I don’t monitor “best” practices, so beware. But in languages like C and Pascal I also had a habit of simply declaring all interfaces at the top and then grouping implementations reasonably. It also created a nice “index” of what’s in the file.

Hoisting also enables cross-imports without helper unit extraction headaches. Many hate js/ts at the “kids hate == and null” level but in reality these languages have a very practical design that wins so many rounds irl.



To me, arrow functions behave more like I would expect functions to behave. They don’t include all the magic bindings that the function keyword imparts. Feels more “pure” to me. Anonymous functions can be either function () {} or () => {}



I did, until I used them enough where I saw where they were useful.

The bad examples of arrow functions I saw initially were of:

1. Devs trying to mix them in with OOP code as a bandaid over OOP headaches (e.g. bind/this) instead of just not using OOP in the first place.

2. Devs trying to stick functional programming everywhere because they had seen a trivial example where a `.map()` made more semantic sense than a for/for-in/for-of loop. Despite the fact that for/for-in/for-of loops were easier to read for anything non-trivial and also had better performance because you had access to the `break`, `continue` and `return` keywords.



    > also had better performance because you had access to the `break`, `continue` and `return` keywords.
This is a great point.

One more: Debugging `.map()` is also much harder than a for loop.



I feel there are a few ways to invoke .map() in a readable way and many ways that make the code flow needlessly indirect.

Should be a judgment call, and the author needs to be used to doing both looping and mapping constructs, so that they are unafraid of the bit of extra typing needed for the loop.



Another benefit of using for instead of array fns is that it is easy to add await keyword should the fn become async.

But many teams will have it as a rule to always use array fns.



As an aside: It’s way less ergonomic, but you likely want `Promise.allSettled` rather than `Promise.all` as the first promise that throws aborts the rest.



It doesn’t really abort the rest, it just prioritizes the selection of a first catch-path as a current continuation. The rest is still thenable, and there’s no “abort promise” operation in general. There are abort signals, but it’s up to an async process to accept a signal and decide when/whether to check it.



Admittedly, I was being a bit hand-wavy and describing a bit more of how it feels rather than the way it is (I'm perpetually annoyed that promises can't be cancelled), but I was thinking of the code I've seen many times across many code bases:
    let results;
    try {
      results = await Promise.all(vals.map(someAsyncOp))
    } catch (err) {
      console.error(err)
    }
While you could pull that promises mapping into a variable and keep it thenable, 99% of the time I see the above instead. Promises have some rough edges because they are stateful, so I think it might be easier to recommend swapping that Promise.all for an Promise.allSettled, and using a shared utility for parsing the promise result.

I consider this issue akin to the relationship between `sort`, `reverse`, `splice`, the mutating operation APIs, and their non mutating counterparts `toSorted`, `toReversed`, `toSpliced`. Promise.all is kind of the mutating version of allSettled.



I don't like using them everywhere, but they're very handy for inline anonymous functions.

But it really pains me when I see

export const foo = () => {}

instead of

export function foo() {}



Thank you, that's something I also never have understood myself. For inline anonymous functions like callbacks they make perfect sense. As long as you don't need `this`.

But everywhere else they reduce readability of the code with no tangible benefit I am aware of.



But do they make much of a difference? You have always been able to write:
    myArray.sort(function(a,b){return a-b})
People for some reason treat this syntactic sugar like it gives them some new fundamental ability.


Oh Javascript would be much better if it could only be syntactic sugar...

`function(a,b){return a-b;}` is different from `(a,b) => a - b`

And `function diff(a,b) {return a-b;}` is different from `const diff = (a,b) => a - b;`.



I wish javascript had a built-in, or at least de facto default, linter. Like go-fmt or rustfmt. Or clippy even.

One that could enforce these styles. Because not only is the export const foo = () => {}

painful in itself, it will quite certainly get intermixed with the

function foo() {}

and then in the next library a

const foo = function() {}

and so on. I'd rather have a consistently irritating style, than this willy-nilly yolo style that the JS community seems to embrace.



They are miles away from `gofmt` or `rust fmt` or `cargo clippy` and so on.

It's not opinionated; it requires you to form your own opinion, or at least choose from a palette of opinions.

It requires effort to opt-in rather than effort to opt-out.

The community doesn't frown on code that's not adhering to the common standard or code that doesn't pass the "out of the box" linter.

So, if I have a typescript project with a tree of some 20 dependencies (which is, unfortunately, a tiny project), I'll have at least five styles of code when I browse through it. Some JS, some TS, some strictly linted with "no-bikeshedding", some linted with configs that are bigger than the codebase itself. Some linted with outdated configs. Many not linted at all. It's really a mess. Even if each of the 20 dependencies is a clean beauty on its own, the whole is an inconsistent mess.



Moreover the binding and lexical scope aspects supported by classic functions are amongst the worst aspects of the language.

Arrow functions are also far more concise and ergonomic when working with higher order functions or simple expressions

The main thing to be wary of with arrow functions is when they are used anonymously inline without it being clear what the function is doing at a glance. That and Error stack traces but the latter is exacerbated by there being no actual standard regarding Error.prototype.stack



Why do you want to reinforce that idea?

To me arrow functions mostly just decrease readability and makes them blend in too much, when it should be important distinction what is a function and what is not.



I'm not a javascript programmer, but I really like the arrow pattern from a distance exactly because it enforces that idea.

My experience is that newcomers are often thrown off and confused by higher order functions. I think partly because, well let's be honest they just are more confusing than normal functions, but I think it's also because languages often bind functions differently from everything else.

`const cool = () => 5`

Makes it obvious and transparent that `cool` is just a variable, whereas:

`function cool() {return 5}`

looks very different from other variable bindings.



Since we're on the topic of higher order functions, arrow functions allow you to express function currying very succinctly (which some people prefer). This is a contrived example to illustrate the syntactical differences:
  const arrow = (a) => (b) => `${a}-${b}`
  
  function verbose(a) {
    return function (b) {
      return `${a}-${b}`
    }
  }
  
  function uncurried(a, b) {
    return `${a}-${b}`
  }
  
  const values = ['foo', 'bar', 'baz']
  values.map(arrow('qux'))
  values.map(verbose('qux'))
  values.map(uncurried.bind(null, 'qux'))
  values.map((b) => uncurried('qux', b))


> should be important distinction what is a function and what is not

code is to express logic clearly to the reader. We should assess it for that purpose before assessing it for any derivative, secondary concern such as whether categories of things in code (functions etc) visually pop out when you use some specific tool like vim, or grep. There are syntax highlighters for a reason. And maybe if grep sucks with code then build the proper tool for code searching, instead of writing code after the tool.



A simple heuristic I use is to use arrow functions for inline function arguments, and named "function" functions for all others.

One reason is exactly what the subject of discussion is here, it's easier to string-search with that keyword in front of the name, but I don't need that for trivial inline functions (whenever I do I make it an actual function that I declare normally and not inline).

Then there's the different handling of "this", depending on how you write your code this may be an important reason to use an arrow function in some places.



Yes JavaScript.

You can search for both: "function" and "=>" to find all function expressions and arrow function expressions.

All named functions are easily searchable.

All anonymous functions are throwaway functions that are only called in one place, so you don't need to search for them in the first place.

As soon as an anonymous function becomes important enough to receive a label (i.e. assigning it to a variable, being assigned to a parameter, converting to function expression), it has also become searchable by that label too.



The => is after the param spec, so you’re searching for foo.*=> or something more complex, but then still missing multiline signatures. This is very easy to get caught by in TypeScript, and also happens when dealing with higher-order functions (quite common in React).



Why are you searching for foo.=>

Are you searching through every function, or functions that have a very specific parameter?

And whatever you picked, why?

---------------------------------------------------------------

- If you're searching for every function, then there's no need to search for foo.=>, you only need to search for function and =>.

- If you're searching for a specific parameter, then just search for the parameter. Searching for functions is redundant.

---------------------------------------------------------------

Arrow function expressions and function expressions can both be named or anonymous.

Introducing arrow functions didn't suddenly make JavaScript unsearchable.

JavaScript supported anonymous functions before arrow function expressions were introduced.

Anonymous functions can only ever be:

- run on the spot

- thrown away

- or passed around after they've been given a label

Which means, whenever you actually want to search for something, it's going to be labelled.

So search for the label.



Regex is a universal tool.

Your special tool might not work on platform X, fails for edge cases - and you generally don't know how it works. With regex or simple string search I am in control, and can understand why results show up, or investigate when they don't but should.



> Your special tool might not work on plattform X

As always, people come out with the weirdest of excuses to not use actual tools in the 99.9999% of the cases when they are available, and work.

When that tool doesn't work, or isn't sufficient, use another one like fuzzy text search or regexps.

> and you generally don't know how it works.

Do you know how your stove works? Or do you truly understand what the device you're typing this comment on truly works?

Only in programming do I see people deliberately avoid useful tools because...



It’s you who sees it as excuses. If I have a screwdriver multitool, I don’t need another one which is for d10 only. It simply creates unnecessary clutter in a toolbox. The difference between definition and mention search for a function is:
  grion name
  vs
  grname
or for the current identifier, simply
  gr
I could even make my own useful tools like “\[fvm]gr” for function, variable or field search and brag about it watching miserable ide guys from the high balcony, but ain’t that unnecessary as well.


When you specialize in one thing only, do what you want.

But I prefer tools that I can use wherever I go. To not be dependent on and chained to that environment.

"Do you know how your stove works? Or do you truly understand what the device you're typing this comment on truly works?"

Also yes, I do.

" people deliberately avoid useful tools because "

Well, I have already changed tools often enough to be fed up with it, and would rather invest in tech that does not lose its value in the next iteration of the innovation cycle.



> When you specialize in one thing only, do what you want.

I specialize in one thing only: programming

> But I prefer tools, that I can use wherever I go.

Do you always walk everywhere, or do you use a tool available at the time, like cars, planes, bicycles, public transport?

> rather invest in tech that does not loose its value in the next iteration of the innovation cycle.

Things like "find symbol", "find usages", "find implementation" have been available in actual tools for close to two decades now.



I did not say I do not use what is available, but this debate is about, in general, having your code in a shape where simply searching for strings works.



Simply searching for strings rarely works well as the codebase grows larger, because besides knowing where all the things named X are, you want to actually see where X is used, where it's called from, or where it is defined.

With search you end up grepping the code twice:

- first grepping for the name

We're literally in a thread where people invent regexes for how to search the same thing (a function) defined in two different ways (as a function or as a const)

- secondly, manually grepping through search results deducing if it's relevant to what you're looking for

It becomes significantly worse if you want to include third-party libs in your search.

There are countless times when I would just Cmd+B/Cmd+Click a symbol in IDEA and continue my exploration down to Java's own libraries. There are next to zero cases when IDEA would fail to recognise a function and find its usages if it was defined as a const, not as a function. Why would I willingly deny myself these tools as so many in this thread do?



It's the current year and IDEs still can't remember how I just transformed a snippet of code and offer to transform the rest of the files in the same way. All they can do in the "refactor" menu is "rename" and then some extract/etc nonsense which no one uses irl.

By using regexps I gain experience that opens many doors, and the fact that they aren't automatic could make me sad, if only those doors weren't completely shut without that experience.



No one is stopping you from using regexps in IDEs.

And you somehow manage to undersell the rename functionality in an IDE. And I've used move/extract functionality multiple times.

I do however agree that applicable transformations (like upgrading to new syntaxes, or ways of doing stuff as languages evolve) could be applied wholesale to large chunks of code.



When tools don't work or are unsuitable, you use different tools.

And yet people are obsessed with never using useful tools in the first place because they can invent scenarios when this tool doesn't work. Even if these scenarios might never actually come up in their daily work.



Not to move the goal posts too much, but when I am searching a huge Python or Java code base from IntelliJ, I use a mixture of symbol and text search. One good thing about text search: you get hits from comments.



I want to talk to the developer who considers greppability when deciding whether to use the "function" keyword but requires his definitions to be greppable by distancing them from their call locations. I just have a few questions for him.



I used to define functions as `funcname (arglist)`

And always call the function as `funcname(args)`

So definitions have a space between the name and arg parentheses, while calls do not. Seemed to work well, even in languages with extraneous keywords before definitions since space + paren is shorter than most keywords.
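A quick TypeScript-flavoured sketch of that convention (the name "parseConfig" is made up): the definition carries the extra space, the call sites don't, so grepping for the name followed by " (" finds only the definition.

  // Definition: space between name and parameter list, found by grepping "parseConfig ("
  function parseConfig (raw: string): Record<string, string> {
    const entries = raw.split("\n").map(line => line.split("=") as [string, string]);
    return Object.fromEntries(entries);
  }

  // Call sites: no space, so they don't show up when you grep for the definition.
  const config = parseConfig("host=localhost\nport=8080");
  console.log(config.port);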

Nowadays I don't bother, since it really isn't that useful, especially with tags or LSP.

I still put the return type on a line of its own, not for search/grep, but because it is cleaner and looks nice to me—overly long lines are the ugliest of coding IMO. Well that and excessive nesting.



Meanwhile C lacks any such keyword, so the best you can do is search for the name. That gets you a sea of callers with the declarations and definitions mixed in

That’s why in my personal projects I follow classic “type\nname” and grep with “^name\>”.

"looks ugly"

Single line definitions with long, irregular type names and unaligned function names look ugly. Col 1 names are not only greppable but skimmable. I can speedscroll through code and still see where I am.



Yet you reply to an article that defines functions as variables, which I've seen a lot of developers do, usually for no good reason at all.

To me, that's a much more common and worse practice with regards to greppability than splitting identifiers using strings, which I haven't seen much in the wild.



In the bygone days of ctags, C function definitions included a space before opening parenthesis, while function calls never had that space. I have a hard time remembering that modern coding styles never have that space and my IDE complains about it. (AFAIK, the modern gtags doesn't rely on that space to determine definitions.) Even without *tags, the convention made it easy to grep for definitions.



In terms of C, that's one reason I prefer the BSD coding style:

  int
  foo(void) { }

vs the Linux coding style:

  int foo(void) { }

The BSD style allows me to find function definitions using git grep ^foo.



Although in Rust, function-like macros make it super hard to trace code. I like them when I am writing the code and hate them when I have to read others' macros.



Those also make your language easier to parse, and to read.

Many people insist that IDEs make the entire point moot, but that's the kind of thing that makes IDEs easier to write and debug, so I disagree.



One thing which works for C is to search for something like `[a-z] foo\(.+\) \{`, assuming that spacing matches the coding style. Often the shorter form `[a-z] foo\(` works well; it tries to ensure there is a type before the name, rather than an assignment or something. Then there are only a handful of false positives.



> so the best you can do is search for the name

This is why in C projects libs go in "lib/" and sources go in "src/". If your header files have the same directory structure as libs, then "include/" is also a decent way to find definitions.



Not sure this is very true for Common Lisp. A classic example is accessor functions, where the generic function is created by whichever class is defined first, and the method where the class is defined. Other macros will construct new symbols for function names (or take them from the macro arguments).



That’s true, but I regard it as fairly minor. Accessor functions don't have any logic in them, so in practice you don’t have to grep for them. But it can be confusing for new players, since they don't know ahead of time which ones are accessors and which are not.



> Meanwhile C lacks any such keyword

It's a hassle. But not the end of the world.

I usually search for "doTheThing\(.+?\) \{" first.

If I don't get a hit, or get too many hits, I move to "doTheThing\([^\)]*?\) \{" and so on.



C has "classical" tooling like Cscope and Exuberant Ctags. The stuff works very well, except on the odd weird code that does idiotic things that should not be done with preprocessing.

Even for Lisp, you don't want to be grepping, or at least not all the time for basic things.

For TXR Lisp, I provide a program that will scan code and build (or add to) your tags file (either a Vim or Emacs compatible one).

Given

  (defstruct point ()
    x
    y)
it will let your editor jump to the definition of point, x and y.


Do people really use text search for this rather than an IDE that parses all of the code and knows exactly where each declaration is, able to instantly jump to them from a key press on any usage...? Wild.



Yes. Not everyone uses or likes an IDE. Also, when you lean on an IDE for navigation, there is a tendency to write more complicated code: since it feels easy to navigate, you don't feel the pain.



Python is the only one mentioned that "actually works" without endless exceptions to the rule in the normal case. The others mentioned (Rust/JavaScript/Lisp/Go) all have specific syntax that is used commonly enough to make them harder to search. Possible, absolutely, but still harder.



I'd say Python works well at greppability because community conventions generally discourage concealing certain kinds of definitions (e.g. function definitions are usually "def whatever").

However, that's just convention. Lots of modules do metaprogramming tricks that obscure greppability, which can be a pain. This is particularly acute when searching for code that is "import-time polymorphic"--that is, code which picks one of several implementations for a piece of functionality at import time at the module scope. That frequently ends up with some hanky-panky a la "exported_function_name = _implementation1 if platform_supported else _implementation2" at the module scope.

While sometimes annoying, that type of thing is usually done for understandable reasons (picking an optimized/platform-supported implementation of an interface--think select or selectors in the stdlib, or any pypi implementation of filesystem monitoring using fsnotify/fanotify/kqueue/fsevents/ReadDirectoryChangesW). Additionally, good type annotations help with greppability, though they can't fully mitigate this issue.

Much less defensible in Python is code that abuses locals/globals to indirect symbol access, or code that abuses star imports to provide interfaces/implementation switching.

Those, fortunately, are rare, but the elephant in the "no greppability ever" room is not: getattr bullshit in OO code is so often utterly obscure, unnecessary and terrible. And it's distressingly common on PyPi. At first I thought this was Ruby's encouragement of method_missing in the bad old days bleeding into the Python community, but the number of programmers for whom getattr magic is catnip seems to be disproportionate to the number of folks with Ruby experience, and, more concerningly, seems to me to be growing over time.



That’s right, not everyone uses an LSP. Nothing wrong with LSPs, very useful tools. I use ripgrep, or plain grep if I have to, far more often than an LSP.

Working with legacy code — the scenario the author describes — I often can’t install anything on the server.



Rust though does lose some of those points by more or less forcing[1] snake_case. It's really annoying to navigate bindings which are converted from camelCase.

I don't care which case is used. It's a trivial, superficial thing, and tribal zealotry about it doesn't reflect well on the language and community.

[1] The warnings can be turned off, but in some cases it requires ugly hacks, and the community seems to be actively hostile to making it easier.



The Rust community is no more zealous about naming conventions than any other language which has naming conventions. Perhaps you're arguing against the concept of naming conventions in general, but that's not a Rust thing, every language of the past 20 years suggests naming conventions if for no other reason than every language provides a standard library which needs to follow some sort of naming conventions itself. Turning off the warnings emitted by the Rust compiler takes two lines of code, either at the root of the crate or in the crate manifest.



I've yet to encounter another compiler that warns about naming conventions, by default at least. So it's at least the most enforced zealotry I've encountered.

Yes, it can be turned off. But for e.g. bindgen-generated code it was not trivial to find out how.



The Rust compiler doesn't produce warnings out of zealotry, but rather as a consequence of pre-1.0 historical decisions. Note that Rust doesn't use any syntax in pattern matching contexts to distinguish between bindings and enum variants. In pre-1.0 versions of Rust, this created footguns where an author might think they were matching on an enum, but the compiler was actually parsing it as a catch-all binding that would cause any following match arms to never be executed. This was exacerbated by the prevailing naming conventions of the time (which you can see in this 2012 blog post: https://pcwalton.github.io/_posts/2012-06-03-maximally-minim... (note the lower-cased enum variants)). So at some point the naming conventions were changed in an attempt to prevent this footgun, and the lint was implemented to nudge people over to the new conventions. However, as time went on the footgun was otherwise fixed by instead causing the compiler to prioritize parsing enum variants rather than bindings, in conjunction with other errors and warnings about non-exhaustive patterns and dead code (which are all desirable in their own right). At this point it's mostly just vestigial, and I highly doubt that anybody really cares about it beyond "our users are accustomed to this warning-by-default, so they might be surprised if we stopped doing this".


Ah, thanks for the info! I do think this default has some ramifications, especially in that binding casings typically get changed because of it, even for "non-native" wrappers, which I find materially makes things more difficult.
