I enjoy shocking people by telling them I don’t use LLMs.
This isn’t true, but it’s morally true for the reference class I’m in (people who wrote a book about em, 2024 AI PhD, ML twitter member in good standing, trying to do intellectual work on a deadline).
Attack ships on fire off the shoulder of Orion bright as magnesium
I was there for Keras 1.0, and trained 100-class image recognisers on a Sandy Bridge Xeon CPU (and ok later a single Titan GPU, once Procurement got their shit together).
I was there when GANs started working. I was there when GANs "stopped" working.
I was there when OpenAI was a pure RL lab, and when OpenAI was a lab and not a company.
I was there when BERT hit and sounded the death knell for entire subfields of NLP.
I was there when GPT-2 prompted with "tl;dr" destroyed hundreds of corporate data science projects on summarisation.
I've spent a hundred hours with various of them, but almost entirely in roboanthropologist mode ("look at these fascinating primitives!"; "I wonder what new exciting publishable diseases my patient will display today").
I was also there when the field and the industry got bitcoinified and run through with idiots, grifters, and worse in a 50:1 ratio with real people. The "labs" (companies) don't help matters by eliding the difference between their sensible beliefs about where the beasts will be in a few years and their sensible beliefs about where they are at the moment. The "researchers" (opportunist hillclimbers) didn't help matters when they took ML evaluation to the brink of pseudoscience. So it's hard to forgive the industry-slash-fandom for exaggerating capabilities every single week for the last 200 weeks.
It’s not that I’m ignorant; it’s something else. Every time a new model comes out, I say “ok Gav it’s finally time to cyborg up” – and every time, it confidently makes an appalling error within 5 minutes and I completely lose my appetite.
But people I enormously respect have been using them happily for real thinking for a long time, sometimes two full years, and the resulting output is good and deep.
Something’s gotta give.
Why not??
- I like writing so much that reading and improving bad writing can be more effort than doing it myself.
- me not needing to bullshit other humans that often. (I do write quite a few letters of recommendation at this point, but sending out raw slop would be a disservice to my students; the supposed prose would make them sound exactly like every one of their hapless slop-dressed peers. It’s also an insult to a fellow academic.)
- me not writing much code atm. (I get that the fall in marginal cost means my demand should grow, but like I already have 4 engineers on staff.)
- me already knowing the basics of many many things. Maybe this is complacency; there are lots of things I used to know well but have half-forgotten. But also: the received view in non-STEM fields is quite often essentially false, or a lie-to-children oversimplification. Take o3’s account of the origins of punk music. This is the received view of received views, and I honestly don’t know why I’d want that. The only function I can think of is to pretend to be something you’re not at a party or a seminar, and I don’t want to pretend. OK charitably, you do need some starting point, even if it’s false and cliched. But I mostly don’t need one anymore.
- me needing precision and high confidence to learn. I encourage you to start by prompting it with a field you know intimately - you will be disappointed. (Not to pick on Andy, but the generated o3 textbook he gives as an example use case is a bit weak. In one rollout it got the date of CNNs wrong by >10 years and omitted a key caveat of the Minsky-Papert XOR result - that the proof was for single-layer perceptrons; another rollout got LSTMs off by 20 years and seems to have confused RNNs and ResNets.) Karpathy uses it for looking up descriptive statistics, which seems like a bad idea.
- I am already too imprecise for my own liking; building in LLMs would make me even worse.
- me being well-calibrated about a lot of things, way more than the current machines.
- me enjoying large-scale reading and exploration and not wanting to offload it
- them not actually being able to read long things properly, despite what people tell you
- the valuable thing that I do at work is not “produce words”, and not “skim thousands of words”, but “think in new ways” and “be calibrated”. The machines of 2025 cannot help me with this except as a foil, a stooge, a practice dummy for what not to say.
- me working at a level where most of what I want to know is in the training corpus 0-10 times, i.e. maybe below the pretraining extraction frequency (but this is a moving target and bigger models may get it)
- me being very precious about style, having a disgust reaction to bad style
- me having a disgust reaction in response to being bullshitted - which I endorse and which keeps me strong.
- their incredibly annoying laziness training, e.g. where they process like 12 rows of a 1000 row dataset and then say “the rest of the rows have been omitted for brevity” or whatever
- me knowing regex very well
- me worrying about being deskilled by using them. (Later I will also worry about “cognitive security”.)
- me hating voice mode
- me having very smart collaborators
- me having enough human friends
- me disliking talk therapy
So you can explain the anomaly by me not treading old ground or needing the received view; being in love with writing (and my own style); not being a strong-enough verifier to use weak signals easily; and not writing much code.
Self-critique
- me being impatient and so not doing the multi-turn stuff which is often necessary
- me not being that good at delegating in general. I don't get on with human assistants or tutors either.
- me getting a lil old and so inflexible
- me wishing they didn't work
- me not actually wanting some kinds of work to be easy
- maybe minor anchoring on GPT-3.5 capabilities, like a parent who still underestimates their child and tries to do too much for them
- disgust reaction at them harbinging the end of life as we know it, ruining the awe of being present at the nativity of a new type of mind. (I feel much the same about ML engineering.)
Anyway:
How I use them
In order of frequency x usefulness:
- Help remembering a term on the tip of my tongue (“what do you call it when…”)
- Working out what words to google in a new field (“list the names of some natural anticancer mechanisms”)
- Hit-and-miss for unpacking acronyms and getting in the loop. I’m too online to need this that often and the remainder is often coined after their knowledge cutoff.
- To get around sycophancy I present my ideas as someone else’s. (“Someone just claimed hypothesis X. What’s the evidence for and against this?”)
- Using OpenRouter to ask 4 models at once is good and makes the bullshit less odious - except that the UI absolutely obviously should be one column per model, rather than pasting the different responses under each other. Lemme just open Stylus to edit the CSS…
- The blank page problem is fixed; in the 10% of cases where I lack inspiration to begin, I have the bot produce something and let my hatred for RLHF prose inspire me: I cannot rest until I edit the slop away. With the same strictures, it’s also been very good for getting started with writing fiction. This is a service a human amanuensis could not offer, since I wouldn’t feel free to destroy and impugn their work so completely. (However, I think by writing, and I worry that critique and editing is not the proper kind of thinking. But I still do lots of real stuff.)
- Semantic search through a corpus (“give me everything related to AI timelines”; “give me passages which could be influenced by Nietzsche”). A rough sketch of one way to script this is below.
- For declared adversaries (like people who are breaching contracts) I use “Write a legal response in the style of Patrick McKenzie in Dangerous Professional mode” or “Explain in the style of Zvi Mowshowitz”.
- Formatting: validating and fixing JSON; JSON to CSV. I prefer regex and haven’t yet needed to ask the beast’s help composing any regexes.
- Deep Research is kinda useful, sure. But if Google was as good as it was in 2012 I wouldn’t use DR - and also if my employer didn’t pay for it I wouldn’t. (I intensely resent them diluting the word “deep” and the word “research” to mean stupidly skimming and summarising existing texts. I would probably use it twice as often if I didn’t have to play along with this degradation. Actually let me just write a Stylus rule to rename it in the UI.)
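A minimal sketch of the semantic-search item above, assuming a local embedding model via the sentence-transformers library; the model name, the stand-in corpus, and the top-k cutoff are my choices for illustration, not a description of my actual setup:

```python
# Hypothetical sketch: rank corpus passages by semantic similarity to a query.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedding model

corpus = ["passage one ...", "passage two ..."]  # stand-in passages; load your notes here
corpus_emb = model.encode(corpus, convert_to_tensor=True)

query = "everything related to AI timelines"
query_emb = model.encode(query, convert_to_tensor=True)

# Cosine-similarity search; returns the top-k most similar passages with scores.
hits = util.semantic_search(query_emb, corpus_emb, top_k=5)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']][:80]}")
```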

- Ollama: debugging internet connections when you don’t have one.
- Code
  - Matplotlib. They got better than me in about 2023, despite me learning this godforsaken library about 9 years ago.
  - Various Cloudflare, WSL, Docker and Ruby version hell headaches. I use these technologies a few times a year and will never learn them properly. LLM outputs rarely work first time but are still helpful.
  - Claude artefacts for plotting and little interactives are very cool but you have to watch the results closely; it’s essentially partially blind and often messes up scales, positioning, axes.
  - Automatically scoring the relevance of other LLM outputs for evals in a research paper. (It turns out that this is not that reliable but we do it anyway.) A sketch of the setup is after this list.
- For translation I have the Google Translate shortcut in my muscle memory, but that’s basically a specialised LLM now.
- I stopped reading papers after my PhD. I’m dubious about using LLMs as a replacement for hard reading but in fairness most papers don’t deserve anything better.
- I’m very happy with strong misreading, in which one develops an entirely new view when reading someone else. Seems like LLMs could help me do this, by producing a weak misreading of a confusing writer which I then misread properly.
- I haven’t yet used it as a mechanical editing pass but there are enough little typos on this site that I will. I will also try a separate developmental edit (Claude 3.6 mind you) but expect to reject 90% of its suggestions.
- I don’t have many medical issues but would happily use it for niggling things or a second opinion. This is as much to do with the weakness of medicine as the strength of AI.
- I don’t use memory because I don’t want it to flatter me.
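A rough sketch of the relevance-scoring item in the Code list above, assuming the OpenAI Python client; the grader model, rubric, and prompt wording are illustrative guesses rather than the setup we actually used:

```python
# Hypothetical sketch: ask a grader model to rate answer relevance on a 0-10 scale.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def relevance_score(question: str, answer: str) -> int:
    prompt = (
        "Rate how relevant the ANSWER is to the QUESTION on a 0-10 integer scale.\n"
        "Reply with the number only.\n\n"
        f"QUESTION: {question}\nANSWER: {answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder grader model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return int(resp.choices[0].message.content.strip())
```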
EDIT: Add Herbie to the list of productive people who use them for ideas:
> I do personally find LLMs help me think in new ways — much of my writing is about thinking of new framings/ways of looking at a problem. I find if I spend some time setting up a detailed prompt (e.g. import gdoc, custom system prompt, etc) then models will reliably give me a list of ideas, with some I hadn’t thought of. So currently for writing I mostly use these models for coming up with the actual concepts behind the piece, rather than the writing itself, and find them pretty good!
(“New to me” or even “not new to me but I somehow overlooked it” are indeed often good enough.)
(Obviously there’s lots of amazingly useful non-LLM AI too, like Google Lens, automatic song ID and history, MathPix, diffusion.)
Brutal system prompts help a bit.
Skill issue
> ppl tend to use it like a vending machine when they should be using it like a second cortex. they prompt instead of dialoguing. they extract instead of co-evolving.
Anyway I’m not too proud to think I might be doing it wrong. (For instance, I’m overdue a sojourn into the base model Zones.)
Except… I have a powerful urge to John Henry my way through this age. Let the heavens fall, but find me at my desk. But I doubt I am that strong, and they will improve.

(This is wrong.)
See also
Comments
such auspicious timing… i've not gotten much use with llms when it comes to writing and this articulates what i feel quite well. maybe in a few months
unfortunately i am a software engineer by trade and i supposedly write a lot of code and i have been skill issuing on llm usage there, so perhaps i am, in fact, a luddite
i have found llms most useful for the pop culture clues in puzzlehunts
Niels commented on 08 April 2025:
> me getting a lil old and so inflexible
can confirm it's not this. I'm << 30 and use them in much the same way for much the same reasons.