How I Don't Use LLMs

Original link: https://www.gleech.org/llms

This AI researcher, despite deep roots in the field since its early days, half-jokingly claims to avoid large language models (LLMs), while admitting to occasional use. They have watched assorted AI fads rise and fall, are sceptical of inflated claims, and routinely catch LLMs making exasperating errors. They list many reasons for their reluctance, including a love of writing, a need for precision, an existing stock of knowledge, a visceral dislike of being misled, and worries about deskilling. They nonetheless concede specific uses for LLMs: recalling terms, brainstorming search keywords, unpacking acronyms, working around sycophancy, beating the blank page, semantic search, debugging, plotting, and translation. They also appreciate LLMs' ability to explore new concepts in new ways. They suspect they may be approaching LLMs the wrong way and are overdue to try base models. The author is troubled by LLMs' potential to erode their skills and misses the days before AI became the centre of attention.

This Hacker News thread discusses an article explaining why the author doesn't use large language models (LLMs). One key point is that LLMs dilute the meaning of words like "deep" and "research" through shallow summarisation, which resonated with others who feel the term "artificial intelligence" has been cheapened the same way. Commenters shared their own reasons for avoiding LLMs, including instances where LLMs were no help on hard coding problems. One commenter pushed back on the author's reasoning, arguing that everyone knows LLMs make mistakes and that the real skill is learning to use them effectively. They suggested the author's picture of LLM use may be limited to generating text for other people, overlooking its broader potential. They further argued the author may be overconfident in their own knowledge, and stressed that LLM output should be checked against other sources. The commenter concluded that complaining about LLM errors is unproductive while others are busy refining their prompts for better results.

  • Original article

    I enjoy shocking people by telling them I don’t use LLMs.

    This isn’t true, but it’s morally true for the reference class I’m in (people who wrote a book about em, 2024 AI PhD, ML twitter member in good standing, trying to do intellectual work on a deadline).

    Attack ships on fire off the shoulder of Orion bright as magnesium

    I was there when GBMs still beat neural networks on tabular data.
    I was there for Keras 1.0, and trained 100-class image recognisers on a Sandy Bridge Xeon CPU (and ok later a single Titan GPU, once Procurement got their shit together).
    I was there when GANs started working. I was there when GANs "stopped" working.
    I was there when OpenAI was a pure RL lab, and when OpenAI was a lab and not a company.
    I was there when BERT hit and sounded the death knell for entire subfields of NLP.
    I was there when GPT-2 prompted with "tl;dr" destroyed hundreds of corporate data science projects on summarisation.
    I've spent a hundred hours with various of them, but almost entirely in roboanthropologist mode ("look at these fascinating primitives!"; "I wonder what new exciting publishable diseases my patient will display today").

    I was also there when the field and the industry got bitcoinified and run through with idiots, grifters, and worse in a 50:1 ratio with real people. The "labs" (companies) don't help matters by eliding between their sensible beliefs about where the beasts will be in a few years and their sensible beliefs about where they are at the moment. The "researchers" (opportunist hillclimbers) didn't help matters when they took ML evaluation to the brink of pseudoscience. So it's hard to forgive the industry-slash-fandom when it exaggerated capabilities every single week for the last 200 weeks.

    It’s not that I’m ignorant; it’s something else. Every time a new model comes out, I say “ok Gav it’s finally time to cyborg up” – and every time, it confidently makes an appalling error within 5 minutes and I completely lose my appetite.

    But people I enormously respect have been using them happily for real thinking for a long time, sometimes two full years, and the resulting output is good and deep.

    Something’s gotta give.

    Why not??

    • I like writing so much that reading and improving bad writing can be more effort than doing it myself.
    • me not needing to bullshit other humans that often. (I do write quite a few letters of recommendation at this point, but sending out raw slop would be a disservice to my students; the supposed prose would make them sound exactly like every one of their hapless slop-dressed peers. It’s also an insult to a fellow academic.)
    • me not writing much code atm. (I get that the fall in marginal cost means my demand should grow, but like I already have 4 engineers on staff.)
    • me already knowing the basics of many many things. Maybe this is complacency; there’s lots of things I used to know well but have half-forgotten. But also: the received view in non-STEM fields is quite often essentially false, or a lie-to-children oversimplification. Take o3’s account of the origins of punk music. This is the received view of received views, and I honestly don’t know why I’d want that. The only function I can think of is to pretend to be something you’re not at a party or a seminar, and I don’t want to pretend. OK charitably, you do need some starting point, even if it’s false and cliched. But I mostly don’t anymore.
    • me needing precision and high confidence to learn. I encourage you to start by prompting it with a field you know intimately - you will be disappointed. (Not to pick on Andy, but the generated o3 textbook he gives as an example use case is a bit weak. In one rollout it got the date of CNNs wrong by >10 years and omitted a key caveat of the Minsky-Papert XOR result - that the proof was for single-layer perceptrons; another rollout got LSTMs off by 20 years and seems to have confused RNNs and ResNets.) Karpathy uses it for looking up descriptive statistics, which seems like a bad idea.
      • I am already too imprecise for my own liking; building in LLMs would make me even worse.
    • me being well-calibrated about a lot of things, way more than the current machines.
    • me enjoying large-scale reading and exploration and not wanting to offload it
    • them not being actually able to read long things properly despite what people tell you
    • the valuable thing that I do at work is not “produce words”, and not “skim thousands of words”, but “think in new ways” and “be calibrated”. The machines of 2025 cannot help me with this except as a foil, a stooge, a practice dummy for what not to say.
    • me working at a level where most of what I want to know is in the training corpus 0-10 times, i.e. maybe below the pretraining extraction frequency (but this is a moving target and bigger models may get it)
    • me being very precious about style, having a disgust reaction to bad style
    • me having a disgust reaction in response to being bullshitted - which I endorse and which keeps me strong.
    • their incredibly annoying laziness training, e.g. where they process like 12 rows of a 1000 row dataset and then say “the rest of the rows have been omitted for brevity” or whatever
    • me knowing regex very well
    • me worrying about being deskilled by using them. (Later I will also worry about “cognitive security”.)
    • me hating voice mode
    • me having very smart collaborators
    • me having enough human friends
    • me disliking talk therapy

    So you can explain the anomaly by me not treading old ground or needing the received view; being in love with writing (and my own style); not being a strong-enough verifier to use weak signals easily; and not writing much code.

    Self-critique

    Some other reasons I might be bad at this (which I don't assert because I can't see myself so easily):
    • me being impatient and so not doing the multi-turn stuff which is often necessary
    • me not being that good at delegating in general. I don't get on with human assistants or tutors either.
    • me getting a lil old and so inflexible
    • me wishing they didn't work
    • me not actually wanting some kinds of work to be easy
    • maybe minor anchoring on GPT-3.5 capabilities, like a parent who still underestimates and tries to do too much for their child
    • disgust reaction at them harbinging the end of life as we know it, ruining the awe of being present at the nativity of a new type of mind. (I feel much the same about ML engineering.)

    Anyway:

    How I use them

    In order of frequency x usefulness:

    • Help remembering a term on the tip of my tongue (“what do you call it when…”)
    • Working out what words to google in a new field (“list the names of some natural anticancer mechanisms”)
    • Hit-and-miss for unpacking acronyms and getting in the loop. I’m too online to need this that often and the remainder is often coined after their knowledge cutoff.
    • To get around sycophancy I present my ideas as someone else’s. (“Someone just claimed hypothesis X. What’s the evidence for and against this?”)
    • Using OpenRouter to ask 4 models at once is good and makes the bullshit less odious - except that the UI absolutely obviously should be one column per model, rather than pasting the different responses under each other. Lemme just open Stylus to edit the CSS… (A code sketch of the fan-out follows this list.)
    • The blank page problem is fixed; in the 10% of cases where I lack inspiration to begin, I have the bot produce something and let my hatred for RLHF prose inspire me: I cannot rest until I edit the slop away. With the same strictures, it’s also been very good for getting started with writing fiction. This is a service a human amanuensis could not offer, since I wouldn’t feel free to destroy and impugn their work so completely. (However, I think by writing, and I worry that critique and editing is not the proper kind of thinking. But I still do lots of real stuff.)
    • Semantic search through a corpus (“give me everything related to AI timelines”; “give me passages which could be influenced by Nietzsche”). (An embedding-search sketch follows this list.)
    • For declared adversaries (like people who are breaching contracts) I use “Write a legal response in the style of Patrick Mackenzie in Dangerous Professional mode” or “Explain in the style of Zvi Mowshowitz”.
    • Formatting: validating and fixing JSON; JSON to CSV (a stdlib sketch follows this list). I prefer regex and haven’t yet needed to ask the beast’s help composing any regexes.
    • Deep Research is kinda useful, sure. But if Google was as good as it was in 2012 I wouldn’t use DR - and also if my employer didn’t pay for it I wouldn’t. (I intensely resent them diluting the word “deep” and the word “research” to mean stupidly skimming and summarising existing texts. I would probably use it twice as often if I didn’t have to play along with this degradation. Actually let me just write a Stylus rule to rename it in the UI.)
    • Ollama: debugging internet connections when you don’t have one.
    • Code
      • Matplotlib. They got better than me in about 2023, despite me learning this godforsaken library about 9 years ago.
      • Various Cloudflare, WSL, Docker and Ruby version hell headaches. I use these technologies a few times a year and will never learn them properly. LLM outputs rarely work first time but are still helpful.
      • Claude artefacts for plotting and little interactives are very cool but you have to watch the results closely; it’s essentially partially blind and often messes up scales, positioning, axes.
    • Automatically scoring the relevance of other LLM outputs for evals in a research paper. (It turns out that this is not that reliable but we do it anyway.)
    • For translation I have the Google Translate shortcut in my muscle memory, but that’s basically a specialised LLM now.
    • I stopped reading papers after my PhD. I’m dubious about using LLMs as a replacement for hard reading but in fairness most papers don’t deserve anything better.
    • I’m very happy with strong misreading, in which one develops an entirely new view when reading someone else. Seems like LLMs could help me do this, by producing a weak misreading of a confusing writer which I then misread properly.
    • I haven’t yet used it as a mechanical editing pass but there’s enough little typos on this site that I will. I will also try a separate developmental edit (Claude 3.6 mind you) but expect to reject 90% of its suggestions.
    • I don’t have many medical issues but would happily use it for niggling things or a second opinion. This is as much to do with the weakness of medicine as the strength of AI.
    • I don’t use memory because I don’t want it to flatter me.
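
    Roughly what that OpenRouter fan-out looks like in code - a minimal sketch against their OpenAI-compatible chat endpoint; the model slugs and the single-key setup are placeholders, not my actual config:

    ```python
    # Sketch: send one prompt to several models via OpenRouter's
    # OpenAI-compatible endpoint. Model slugs are illustrative and may
    # not match what openrouter.ai currently lists.
    import os
    import requests

    MODELS = [
        "openai/gpt-4o",
        "anthropic/claude-3.5-sonnet",
        "google/gemini-pro-1.5",
        "mistralai/mistral-large",
    ]

    def ask_all(prompt: str) -> dict[str, str]:
        answers = {}
        for model in MODELS:
            r = requests.post(
                "https://openrouter.ai/api/v1/chat/completions",
                headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
                json={"model": model,
                      "messages": [{"role": "user", "content": prompt}]},
                timeout=120,
            )
            r.raise_for_status()
            answers[model] = r.json()["choices"][0]["message"]["content"]
        return answers

    # The anti-sycophancy framing from above, as the shared prompt:
    for model, ans in ask_all("Someone just claimed hypothesis X. "
                              "What's the evidence for and against?").items():
        print(f"--- {model}\n{ans}\n")
    ```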
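
    The semantic-search bullet, done the classic way with embeddings rather than by prompting a chat model - a sketch assuming sentence-transformers, with a stand-in corpus and a common default model name:

    ```python
    # Sketch: embedding-based semantic search over a corpus of passages.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Stand-in corpus; in practice, paragraphs from your notes or books.
    passages = [
        "God is dead and we have killed him.",
        "The market opened higher on Tuesday.",
        "Timelines to transformative AI remain deeply contested.",
    ]
    emb = model.encode(passages, normalize_embeddings=True)

    def search(query: str, k: int = 2) -> list[str]:
        q = model.encode([query], normalize_embeddings=True)[0]
        scores = emb @ q               # cosine similarity (unit-norm vectors)
        return [passages[i] for i in np.argsort(-scores)[:k]]

    print(search("passages which could be influenced by Nietzsche"))
    print(search("everything related to AI timelines"))
    ```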
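
    And the JSON chores need nothing beyond the standard library; a sketch assuming the file holds a list of flat objects (filenames hypothetical):

    ```python
    # Sketch: validate a JSON file, then flatten a list of objects to CSV.
    import csv
    import json
    import sys

    try:
        with open("data.json") as f:
            records = json.load(f)     # raises on malformed JSON
    except json.JSONDecodeError as e:
        sys.exit(f"invalid JSON at line {e.lineno}, col {e.colno}: {e.msg}")

    # Union of keys, so rows with missing fields still serialise (as blanks).
    fields = sorted({k for row in records for k in row})
    with open("data.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)
    ```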

    EDIT: Add Herbie to the list of productive people who use them for ideas:

    I do personally find LLMs help me think in new ways — much of my writing is about thinking of new framings/ways of looking at a problem. I find if I spend some time setting up a detailed prompt (e.g. import gdoc, custom system prompt, etc) then models will reliably give me a list of ideas, with some I hadn’t thought of. So currently for writing I mostly use these models for coming up with the actual concepts behind the piece, rather than the writing itself, and find them pretty good!

    (“New to me” or even “not new to me but I somehow overlooked it” are indeed often good enough.)

    (Obviously there’s lots of amazingly useful non-LLM AI too, like Google Lens, automatic song ID and history, MathPix, diffusion.)

    Brutal system prompts help a bit.
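
    For instance - wording illustrative rather than my actual prompt, and the openai client assumed as the interface:

    ```python
    # Sketch: a blunt anti-sycophancy system prompt. The wording is
    # illustrative, not the author's actual prompt.
    from openai import OpenAI

    SYSTEM = (
        "Be terse. Never compliment me or my ideas. "
        "Flag any claim you are not confident in. "
        "If you don't know, say so instead of guessing. "
        "No filler, no emoji, no 'great question'."
    )

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "Critique this draft: ..."},
        ],
    )
    print(resp.choices[0].message.content)
    ```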

    Skill issue

    ppl tend to use it like a vending machine when they should be using it like a second cortex. they prompt instead of dialoguing. they extract instead of co-evolving.

    — signulll

    Anyway I’m not too proud to think I might be doing it wrong. (For instance, I’m overdue a sojourn into the base model Zones.)

    Except… I have a powerful urge to John Henry my way through this age. Let the heavens fall, but find me at my desk. But I doubt I am that strong, and they will improve.


    [image] (This is wrong.)

    Comments

    [email protected] commented on 08 April 2025:
    such auspicious timing… i've not gotten much use with llms when it comes to writing and this articulates what i feel quite well. maybe in a few months

    unfortunately i am a software engineer by trade and i supposedly write a lot of code and i have been skill issuing on llm usage there, so perhaps i am, in fact, a luddite

    i have found llms most useful for the pop culture clues in puzzlehunts

    Niels commented on 08 April 2025:

    > me getting a lil old and so inflexible

    can confirm it's not this. I'm << 30 and use them in much the same way for much the same reasons.

    Tags: AI, lists

