Prompting LLMs is not engineering

Original link: https://dmitriid.com/prompting-llms-is-not-engineering

As of July 2025, the industry's fascination with "prompt engineering" (now often called "context engineering/prompting/manipulation") is largely misguided. It's essentially reverse-engineering black box AI models with unknown parameters like training data, constraints, and compute availability, making results unpredictable. The "better result" touted by "context engineers" lacks clear criteria and is vulnerable to external factors like fluctuating compute resources, rendering previous tricks ineffective. Claims often lack rigorous evidence, akin to homeopathy. While highly specific prompts can work, they demand significant manual effort. The supposed advancements like "chain-of-thought" prompting have proven effective only in narrow, highly specific scenarios, not broadly applicable as initially claimed. New techniques for models like OpenAI o3 and Gemini 2 Pro are essentially new versions of snake oil with no guarantee of outcome. The entire practice relies more on faith and chance than on genuine engineering principles.

This Hacker News thread discusses an article claiming "Prompting LLMs is not engineering." The debate centers on the definition of "engineering" and whether prompting qualifies. Some argue that systematic experimentation and evaluation of prompts resemble engineering, while others consider it more akin to art or "reverse-engineering a black box." Arguments against prompting as engineering highlight the inconsistencies of LLM outputs and the lack of deterministic control. Concerns are also raised about diluting the term "engineer," which traditionally implied professionalism and accountability. Some commentators propose that "product designer" or "troubleshooter" are more fitting titles. Conversely, proponents argue that prompt engineering involves influencing models to achieve desired results and requires a statistical approach with benchmark datasets. The analogy to "social engineering" is made, where influencing behavior is the goal. The thread also acknowledges that understanding the underlying LLM architecture helps improve prompting results. Ultimately, the discussion touches on the broader question of how the rise of AI is changing traditional roles and skills in tech.

Original text

With the proliferation of AI models and tools, there's a new industry-wide fascination with snake oil remedies called "prompt engineering".

To put it succinctly, prompt engineering is nothing but an attempt to reverse-engineer a non-deterministic black box for which all of the following parameters are unknown:

  • training set
  • weights
  • constraints on the model
  • layers between you and the model that transform both your input and the model's output, and that can change at any time
  • availability of compute for your specific query
  • and definitely some more details I haven't thought of

"Prompt engineers" will tell you that some specific ways of prompting some specific models will result in a "better result"... without any criteria for what a "better result" might signify. Whereas it's enough for users in the US to wake up for free/available compute to go down and for all models to get significantly dumber than just an hour prior regardless of any prompt tricks.

Most claims about prompts have about as much evidence behind them as homeopathy. When people apply even the tiniest bit of rigorous examination, most claims by prompt "engineers" disappear like the morning dew. For example, prior to the new breed of "thinking" models, chain-of-thought queries were touted as great, amazing, awe-inducing. Sadly, in reality they only improved results for very narrow, hyperspecific queries and had no effect on broader queries even when the same techniques could be applied to them:

https://arxiv.org/pdf/2405.04776

Very specific prompts are more likely to work, but they can require significantly more human labor to craft. Our results indicate that chain of thought prompts may only work consistently within a problem class if the problem class is narrow enough and the examples given are specific to that class.
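For reference, this is roughly what the technique in question looks like: a chain-of-thought prompt simply asks the model to spell out intermediate steps, optionally seeded with a worked example from the same narrow problem class, which is exactly the condition the paper says it needs. The sketch below uses made-up questions, not the paper's actual prompts.

```python
# Sketch of chain-of-thought prompting (illustrative questions, not from the paper).

question = "A train leaves at 9:00 and travels 150 km at 60 km/h. When does it arrive?"

# Direct prompt: just ask for the answer.
direct_prompt = f"Question: {question}\nAnswer:"

# Chain-of-thought prompt: show a worked example from the same problem class
# and ask the model to reason step by step.
chain_of_thought_prompt = (
    "Question: If a car travels 120 km at 40 km/h, how long does the trip take?\n"
    "Let's think step by step. Time = distance / speed = 120 / 40 = 3 hours. "
    "Answer: 3 hours.\n\n"
    f"Question: {question}\n"
    "Let's think step by step."
)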

Now that the models have progressed to OpenAI o3 and Google Gemini 2 Pro, prompt "engineering" has also progressed: to Rules for AI, large context windows, and other snake oil remedies that are just as effective and deterministic as the previous ones.

In reality these are just shamanic rituals with outcomes based on faith, fear, or excitement. Engineering it is not.
