The Future of Everything Is Lies, I Guess: New Jobs

Original link: https://aphyr.com/posts/419-the-future-of-everything-is-lies-i-guess-new-jobs

## A New Landscape of Work in the AI Era

This article explores the new jobs emerging as machine learning (ML) becomes increasingly woven into everyday life. Future work will not replace humans outright; instead, it will revolve around the *interface* between people and AI systems. New specializations are expected, including **"LLM incanters"** skilled at prompting models for useful responses, **"process engineers"** focused on quality control to catch AI errors (especially important in fields like law), and **"statistical engineers"** who model and control the unpredictable behavior of ML systems. Maintaining data quality will be critical, and may depend on uncontaminated pre-2023 data sources and professional **"model trainers"**, possibly including experts hired for specific historical fields.

This shift also raises concerns: the rise of **"meat shields"**, individuals who absorb accountability when AI fails, and the potential for exploitation through low pay and precarious working conditions. Understanding *why* AI systems fail will require **"haruspices"** to analyze model behavior and provide explanations, possibly for legal or investigative purposes. Ultimately, the article suggests that human expertise will, paradoxically, become *more* valuable rather than less, as a way to steer and mitigate the risks of increasingly powerful AI.

A recent Hacker News post titled "The Future of Everything Is Lies, I Guess" proposed a taxonomy of emerging job roles in the AI era. The author lists roles such as "incanters" (prompt engineers) and "haruspices" (who interpret an AI's internal state), but commenters questioned the practicality of relying so heavily on humans for these tasks, arguing that large language models are better suited to much of the work. The discussion was quickly dominated by reports that the site is inaccessible in the UK because of the Online Safety Act, followed by a flood of archive links and questions about why a static site would need them; one commenter expressed frustration at this recurring pattern. Other points of discussion included concern about the dehumanizing language in the article ("meat" versus "humans") and the observation that the tech industry has historically tended to favor machines over organic life. This article is one of 10 posts shared daily on Hacker News by the author.

Original article

This is a long article, so I'm breaking it up into a series of posts which will be released over the next few days. You can also read the full work as a PDF or EPUB; these files will be updated as each section is released.

Previously: Work.

As we deploy ML more broadly, there will be new kinds of work. I think much of it will take place at the boundary between human and ML systems. Incanters could specialize in prompting models. Process and statistical engineers might control errors in the systems around ML outputs and in the models themselves. A surprising number of people are now employed as model trainers, feeding their human expertise to automated systems. Meat shields may be required to take accountability when ML systems fail, and haruspices could interpret model behavior.

LLMs are weird. You can sometimes get better results by threatening them, telling them they’re experts, repeating your commands, or lying to them that they’ll receive a financial bonus. Their performance degrades over longer inputs, and tokens that were helpful in one task can contaminate another, so good LLM users think a lot about limiting the context that’s fed to the model.
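
As a rough illustration of what "limiting the context" can mean in practice, here is a minimal Python sketch; the whitespace token estimate and the idea of keeping only the most recent turns are my own simplifications, not any particular vendor's API or best practice.

```python
def estimate_tokens(text: str) -> int:
    """Very rough stand-in for a real tokenizer: whitespace word count."""
    return len(text.split())

def trim_context(turns: list[str], budget: int) -> list[str]:
    """Keep only the most recent turns that fit within the token budget, so
    stale or irrelevant tokens from earlier in the task don't contaminate
    the next request."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))

# Example: only the turns that fit the budget, newest first, survive.
history = ["long setup " * 50, "a clarifying question", "the latest instruction"]
print(trim_context(history, budget=20))
```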

I imagine that there will probably be people (in all kinds of work!) who specialize in knowing how to feed LLMs the kind of inputs that lead to good results. Some people in software seem to be headed this way: becoming LLM incanters who speak to Claude, instead of programmers who work directly with code.

The unpredictable nature of LLM output requires quality control. For example, lawyers keep getting in trouble because they submit AI confabulations in court. If they want to keep using LLMs, law firms are going to need some kind of process engineers who help them catch LLM errors. You can imagine a process where the people who write a court document deliberately insert subtle (but easily correctable) errors, and delete things which should have been present. These introduced errors are registered for later use. The document is then passed to an editor who reviews it carefully without knowing what errors were introduced. The document can only leave the firm once all the intentional errors (and hopefully accidental ones) are caught. I imagine provenance-tracking software, integration with LexisNexis and document workflow systems, and so on to support this kind of quality-control workflow.
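
To make the seeded-error idea concrete, here is a minimal Python sketch of the release gate; the `PlantedError` structure and the gating function are hypothetical illustrations, not features of LexisNexis or any existing workflow product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PlantedError:
    """A deliberate, easily correctable error inserted before review."""
    location: str      # e.g. "paragraph 12, citation 3"
    description: str   # what was changed or deleted

def review_gate(planted: set[PlantedError], caught: set[PlantedError]) -> bool:
    """The document may leave the firm only if the editor caught every planted
    error; anything missed suggests accidental LLM errors could slip through
    the same way."""
    missed = planted - caught
    for error in sorted(missed, key=lambda e: e.location):
        print(f"MISSED planted error at {error.location}: {error.description}")
    return not missed

# Hypothetical usage: the drafting attorney registers the planted errors, the
# reviewing editor reports what they found, and release is gated on the result.
planted = {
    PlantedError("p. 4, footnote 2", "cited the wrong reporter volume"),
    PlantedError("p. 9, paragraph 3", "deleted the standard-of-review sentence"),
}
caught = {
    PlantedError("p. 4, footnote 2", "cited the wrong reporter volume"),
}
assert review_gate(planted, caught) is False  # one planted error was missed
```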

These process engineers would help build and tune that quality-control process: training people, identifying where extra review is needed, adjusting the level of automated support, measuring whether the whole process is better than doing the work by hand, and so on.

A closely related role might be statistical engineers: people who attempt to measure, model, and control variability in ML systems directly. For instance, a statistical engineer could figure out that the choice an LLM makes when presented with a list of options is influenced by the order in which those options were presented, and develop ways to compensate. I suspect this might look something like psychometrics—a field in which psychologists have gone to great lengths to statistically model and measure the messy behavior of humans via indirect means.
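
Here is a sketch of how a statistical engineer might probe for that kind of position bias, assuming a hypothetical `ask_llm` callable that returns the option the model chose; a real measurement would need far more trials and controls than this.

```python
import random
from collections import Counter
from typing import Callable

def measure_position_bias(
    ask_llm: Callable[[str], str],  # hypothetical: returns the option text the model picked
    question: str,
    options: list[str],
    trials: int = 200,
) -> Counter:
    """Present the same options in shuffled order many times and count which
    *position* the chosen option occupied. A heavy skew toward position 0
    reveals the order sensitivity to compensate for, e.g. by averaging the
    model's answers over several permutations."""
    chosen_positions: Counter = Counter()
    for _ in range(trials):
        order = random.sample(options, k=len(options))  # fresh shuffle each trial
        prompt = question + "\n" + "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(order))
        choice = ask_llm(prompt)
        if choice in order:
            chosen_positions[order.index(choice)] += 1
    return chosen_positions
```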

Since LLMs are chaotic systems, this work will be complex and challenging: models will not simply be “95% accurate”. Instead, an ML optimizer for database queries might perform well on English text, but pathologically on timeseries data. A healthcare LLM might be highly accurate for queries in English, but perform abominably when those same questions are presented in Spanish. This will require deep, domain-specific work.
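
One way to make "not simply 95% accurate" concrete is to report accuracy per slice of the workload rather than as a single aggregate, along these lines (the slice labels and numbers below are purely illustrative):

```python
from collections import defaultdict

def per_slice_accuracy(results: list[dict]) -> dict[str, float]:
    """Each result is {"slice": ..., "correct": bool}. A single aggregate
    number would hide a model that is strong on one slice and pathological
    on another."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [correct, seen]
    for r in results:
        totals[r["slice"]][1] += 1
        totals[r["slice"]][0] += int(r["correct"])
    return {s: correct / seen for s, (correct, seen) in totals.items()}

# Illustrative numbers only: the overall accuracy looks fine; the Spanish slice does not.
results = (
    [{"slice": "english", "correct": True}] * 95 + [{"slice": "english", "correct": False}] * 5
    + [{"slice": "spanish", "correct": True}] * 6 + [{"slice": "spanish", "correct": False}] * 14
)
print(per_slice_accuracy(results))  # english = 0.95, spanish = 0.30
```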

As slop takes over the Internet, labs may struggle to obtain high-quality corpuses for training models. Trainers must also contend with false sources: Almira Osmanovic Thunström demonstrated that just a handful of obviously fake articles could cause Gemini, ChatGPT, and Copilot to inform users about an imaginary disease with a ridiculous name. There are financial, cultural, and political incentives to influence what LLMs say; it seems safe to assume future corpuses will be increasingly tainted by misinformation.

One solution is to use the informational equivalent of low-background steel: uncontaminated works produced prior to 2023 are more likely to be accurate. Another option is to employ human experts as model trainers. OpenAI could hire, say, postdocs in the Carolingian Renaissance to teach their models all about Alcuin. These subject-matter experts would write documents for the initial training pass, develop benchmarks for evaluation, and check the model’s responses during conditioning. LLMs are also prone to making subtle errors that look correct. Perhaps fixing that problem involves hiring very smart people to carefully read lots of LLM output and catch where it made mistakes.
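
A toy sketch of the "low-background" filtering idea, assuming each corpus document carries a known publication year; real provenance checking would of course be much harder than a date comparison.

```python
from dataclasses import dataclass

# The 2023 cutoff from above: works published before widespread LLM slop
# are more likely to be uncontaminated training material.
SLOP_ERA_BEGINS = 2023

@dataclass
class Document:
    title: str
    year: int   # publication year, assumed known and trustworthy
    text: str

def low_background_corpus(docs: list[Document]) -> list[Document]:
    """Keep only documents published before the cutoff: the informational
    analogue of low-background steel."""
    return [d for d in docs if d.year < SLOP_ERA_BEGINS]
```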

In another case of “I wrote this years ago, and now it’s common knowledge”, a friend introduced me to this piece on Mercor, Scale AI, et al., which employ vast numbers of professionals to train models to do mysterious tasks—presumably putting themselves out of work in the process. “It is, as one industry veteran put it, the largest harvesting of human expertise ever attempted.” Of course there’s bossware, and shrinking pay, and absurd hours, and no union.

You would think that CEOs and board members might be afraid that their own jobs could be taken over by LLMs, but this doesn’t seem to have stopped them from using “AI” as an excuse to fire lots of people. I think a part of the reason is that these roles are not just about sending emails and looking at graphs, but also about dangling a warm body over the maws of the legal system and public opinion. You can fine an LLM-using corporation, but only humans can apologize or go to jail. Humans can be motivated by consequences and provide social redress in a way that LLMs can’t.

I am thinking of the aftermath of the Chicago Sun-Times’ sloppy summer insert. Anyone who read it should have realized it was nonsense, but Chicago Public Media CEO Melissa Bell explained that they sourced the article from King Features, which is owned by Hearst, who presumably should have delivered articles which were not composed entirely of sawdust and lies. King Features, in turn, says they subcontracted the entire 64-page insert to freelancer Marco Buscaglia. Of course Buscaglia was most proximate to the LLM and bears significant responsibility, but at the same time, the people who trained the LLM contributed to this tomfoolery, as did the editors at King Features and the Sun-Times, and indirectly, their respective managers. What were the names of those people, and why didn’t they apologize as Buscaglia and Bell did?

I think we will see some people employed (though perhaps not explicitly) as meat shields: people who are accountable for ML systems under their supervision. The accountability may be purely internal, as when Meta hires human beings to review the decisions of automated moderation systems. It may be external, as when lawyers are penalized for submitting LLM lies to the court. It may involve formalized responsibility, like a Data Protection Officer. It may be convenient for a company to have third-party subcontractors, like Buscaglia, who can be thrown under the bus when the system as a whole misbehaves. Perhaps drivers whose mostly-automated cars crash will be held responsible in the same way.

Having written this, I am suddenly seized with a vision of a congressional hearing interviewing a Large Language Model. “You’re absolutely right, Senator. I did embezzle those sixty-five million dollars. Here’s the breakdown…”

When models go wrong, we will want to know why. What led the drone to abandon its intended target and detonate in a field hospital? Why is the healthcare model less likely to accurately diagnose Black people? How culpable should the automated taxi company be when one of its vehicles runs over a child? Why does the social media company’s automated moderation system keep flagging screenshots of Donkey Kong as nudity?

These tasks could fall to a haruspex: a person responsible for sifting through a model's inputs, outputs, and internal states, trying to synthesize an account of its behavior. Some of this work will be deep investigation into a single case; other situations will demand broader statistical analysis. Haruspices might be deployed internally by ML companies, as well as by their users, independent journalists, courts, and agencies like the NTSB.
