Forget ChatGPT: why researchers now run small AIs on their laptops

Histo.fyi is a website containing data, images and amino-acid sequences for major histocompatibility complex (MHC) proteins. Chris Thorpe, a researcher based in Oxford, UK, uses artificial intelligence (AI) tools known as large language models (LLMs) to condense this information into readable summaries. Unlike popular platforms such as ChatGPT, Thorpe's tool runs locally on his laptop rather than being hosted online. Interest is growing in compact, 'open-weights' versions of LLMs, which anyone with suitable computing resources can download and run locally. Running models this way helps researchers to protect patient confidentiality, lower costs and maintain reproducibility. Microsoft has released several compact models, including Phi-1, Phi-1.5, Phi-2 and multiple versions of Phi-3 and Phi-3.5. These models are small enough to fit on portable devices, yet by some benchmarks they rival earlier, larger models. As models become more efficient, scientists can run AI assistance directly on personal laptops or smartphones, build specialized applications tailored to their needs, and use AI in remote areas with limited internet connectivity. One example is Turbcat-72b, a variant of the Qwen model that a biomedical scientist in New Hampshire fine-tuned on scientific data. It supports brainstorming, manuscript editing, code prototyping and summarizing published papers. Local models also improve privacy, because sensitive personal or corporate data need not be sent to an external service for analysis. As these tools continue to develop, scientists can expect powerful AI assistants to become readily accessible.

The website histo.fyi is a database of structures of immune-system proteins called major histocompatibility complex (MHC) molecules. It includes images, data tables and amino-acid sequences, and is run by bioinformatician Chris Thorpe, who uses artificial intelligence (AI) tools called large language models (LLMs) to convert those assets into readable summaries. But he doesn’t use ChatGPT, or any other web-based LLM. Instead, Thorpe runs the AI on his laptop.

Over the past couple of years, chatbots based on LLMs have won praise for their ability to write poetry or engage in conversations. Some LLMs have hundreds of billions of parameters (the more parameters, the greater the complexity) and can be accessed only online. But two more recent trends have blossomed. First, organizations are making ‘open weights’ versions of LLMs, in which a trained model’s weights and biases are publicly available, so that users can download and run them locally, if they have the computing power. Second, technology firms are making scaled-down versions that can be run on consumer hardware, and that rival the performance of older, larger models.

Researchers might use such tools to save money, protect the confidentiality of patients or corporations, or ensure reproducibility. Thorpe, who’s based in Oxford, UK, and works at the European Molecular Biology Laboratory’s European Bioinformatics Institute in Hinxton, UK, is just one of many researchers exploring what the tools can do. That trend is likely to grow, Thorpe says. As computers get faster and models become more efficient, people will increasingly have AIs running on their laptops or mobile devices for all but the most intensive needs. Scientists will finally have AI assistants at their fingertips — but the actual algorithms, not just remote access to them.

Big things in small packages

Several large tech firms and research institutes have released small and open-weights models over the past few years, including Google DeepMind in London; Meta in Menlo Park, California; and the Allen Institute for Artificial Intelligence in Seattle, Washington (see ‘Some small open-weights models’). (‘Small’ is relative — these models can contain some 30 billion parameters, which is large by comparison with earlier models.)

Although the California tech firm OpenAI hasn’t open-weighted its current GPT models, its partner Microsoft in Redmond, Washington, has been on a spree, releasing the small language models Phi-1, Phi-1.5 and Phi-2 in 2023, then four versions of Phi-3 and three versions of Phi-3.5 this year. The Phi-3 and Phi-3.5 models have between 3.8 billion and 14 billion active parameters, and two models (Phi-3-vision and Phi-3.5-vision) handle images¹. By some benchmarks, even the smallest Phi model outperforms OpenAI’s GPT-3.5 Turbo from 2023, rumoured to have 20 billion parameters.
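To give a rough sense of why parameter counts in this range suit laptops and phones, the sketch below estimates the memory needed to hold a model's weights alone. The precisions (16, 8 and 4 bits per parameter) are assumptions chosen for illustration and are not taken from the article; the estimate ignores activations, caches and runtime overhead, so real requirements are higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 10**9 bytes)."""
    bytes_total = n_params * bits_per_param / 8
    return bytes_total / 1e9


# Parameter counts quoted in the text for the Phi-3 and Phi-3.5 family.
for label, n_params in [("3.8 billion", 3.8e9), ("14 billion", 14e9)]:
    # Bit widths are illustrative assumptions, not figures from the article.
    for bits in (16, 8, 4):
        size = weight_memory_gb(n_params, bits)
        print(f"{label} parameters at {bits}-bit: about {size:.1f} GB")
```

Under these assumptions, the smallest model needs several gigabytes at 16-bit precision and roughly a quarter of that at 4-bit precision, which is why storing weights at lower precision matters for small devices.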

Sébastien Bubeck, Microsoft’s vice-president for generative AI, attributes Phi-3’s performance to its training data set. LLMs initially train by predicting the next ‘token’ (iota of text) in long text strings. To predict the name of the killer at the end of a murder mystery, for instance, an AI needs to ‘understand’ everything that came before, but such consequential predictions are rare in most text. To get around this problem, Microsoft used LLMs to write millions of short stories and textbooks in which one thing builds on another. The result of training on this text, Bubeck says, is a model that fits on a mobile phone but has the power of the initial 2022 version of ChatGPT. “If you are able to craft a data set that is very rich in those reasoning tokens, then the signal will be much richer,” he says.

Phi-3 can also help with routing — deciding whether a query should go to a larger model. “That’s a place where Phi-3 is going to shine,” Bubeck says. Small models can also help scientists in remote regions that have little cloud connectivity. “Here in the Pacific Northwest, we have amazing places to hike, and sometimes I just don’t have network,” he says. “And maybe I want to take a picture of some flower and ask my AI some information about it.”
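Routing of this kind can be pictured as a small decision function placed in front of two models. The sketch below is a hypothetical illustration, not Microsoft's method: `route`, `toy_difficulty` and the threshold are invented names and values, and the word-count scorer is only a stand-in for asking a small model to rate the query.

```python
from typing import Callable


def route(query: str,
          difficulty: Callable[[str], float],
          threshold: float = 0.5) -> str:
    """Return 'small' or 'large' according to the estimated difficulty of a query."""
    score = difficulty(query)
    if not 0.0 <= score <= 1.0:
        raise ValueError("difficulty score must lie in [0, 1]")
    return "large" if score > threshold else "small"


def toy_difficulty(query: str) -> float:
    """Stand-in scorer (assumption): longer queries count as harder.

    A real router would instead have the small model rate the query.
    """
    return min(len(query.split()) / 50, 1.0)


print(route("What is an MHC molecule?", toy_difficulty))
```

Keeping the scorer as a separate callable means the heuristic can be swapped for a model-based rating without changing the routing logic.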

Researchers can build on these tools to create custom applications. The Chinese e-commerce site Alibaba, for instance, has built models called Qwen with 500 million to 72 billion parameters. A biomedical scientist in New Hampshire fine-tuned the largest Qwen model using scientific data to create Turbcat-72b, which is available on the model-sharing site Hugging Face. (The researcher goes only by the name Kal’tsit on the Discord messaging platform, because AI-assisted work in science is still controversial.) Kal’tsit says she created the model to help researchers to brainstorm, proof manuscripts, prototype code and summarize published papers; the model has been downloaded thousands of times.

Preserving privacy

Beyond the ability to fine-tune open models for focused applications, Kal’tsit says, another advantage of local models is privacy. Sending personally identifiable data to a commercial service could run foul of data-protection regulations. “If an audit were to happen and you show them you’re using ChatGPT, the situation could become pretty nasty,” she says.

Cyril Zakka, a physician who leads the health team at Hugging Face, uses local models to generate training data for other models (which are sometimes local, too). In one project, he uses them to extract diagnoses from medical reports so that another model can learn to predict those diagnoses on the basis of echocardiograms, which are used to monitor heart disease. In another, he uses the models to generate questions and answers from medical textbooks to test other models. “We are paving the way towards fully autonomous surgery,” he explains. A robot trained to answer questions would be able to communicate better with doctors.
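A pipeline like the first project comes down to wrapping each report in an extraction prompt and parsing the reply into a label for the downstream model. The sketch below is a hypothetical illustration and not Zakka's code: `build_prompt`, `parse_diagnoses`, the prompt wording and the semicolon-separated reply format are all assumptions, and the call to the local model itself is left out.

```python
PROMPT_TEMPLATE = (
    "List every diagnosis stated in the medical report below, "
    "separated by semicolons. If there are none, reply NONE.\n\n"
    "Report:\n{report}\n\nDiagnoses:"
)


def build_prompt(report: str) -> str:
    """Wrap one report in the (assumed) extraction prompt."""
    return PROMPT_TEMPLATE.format(report=report.strip())


def parse_diagnoses(reply: str) -> list[str]:
    """Turn a semicolon-separated model reply into a clean list of labels."""
    if reply.strip().upper() == "NONE":
        return []
    return [part.strip().lower() for part in reply.split(";") if part.strip()]


# The reply below is typed by hand for illustration, not produced by a model.
print(parse_diagnoses("Mitral regurgitation; Left ventricular hypertrophy"))
```

Constraining the reply format in the prompt is what makes the output usable as training labels, because free-text answers would need a second round of parsing.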

Zakka uses local models — he prefers Mistral 7B, released by the tech firm Mistral AI in Paris, or Meta’s Llama-3 70B — because they’re cheaper than subscription services such as ChatGPT Plus, and because he can fine-tune them. But privacy is also key, because he’s not allowed to send patients’ medical records to commercial AI services.

Johnson Thomas, an endocrinologist at the health system Mercy in Springfield, Missouri, is likewise motivated by patient privacy. Clinicians rarely have time to transcribe and summarize patient interviews, but most commercial services that use AI to do so are either too expensive or not approved to handle private medical data. So, Thomas is developing an alternative. Based on Whisper — an open-weight speech-recognition model from OpenAI — and on Gemma 2 from Google DeepMind, the system will allow physicians to transcribe conversations and convert them to medical notes, and also summarize data from medical-research participants.

Privacy is also a consideration in industry. CELLama, developed at the South Korean pharmaceutical company Portrai in Seoul, exploits local LLMs such as Llama 3.1 to reduce information about a cell’s gene expression and other characteristics to a summary sentence². It then creates a numerical representation of this sentence, which can be used to cluster cells into types. The developers highlight privacy as one advantage on their GitHub page, noting that CELLama “operates locally, ensuring no data leaks”.
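The sentence-then-embedding idea can be illustrated with a toy pipeline. This is not CELLama's code: the gene values are made up, the sentence template is an assumption, and the bag-of-words `embed` function is a crude stand-in for the embedding model a real system would use.

```python
import math
from collections import Counter


def cell_to_sentence(expression: dict[str, float], top_n: int = 3) -> str:
    """Summarize a cell as a sentence naming its most highly expressed genes."""
    top = sorted(expression, key=expression.get, reverse=True)[:top_n]
    return "Top expressed genes: " + ", ".join(top)


def embed(sentence: str) -> Counter:
    """Stand-in embedding (assumption): a bag of lowercase words."""
    cleaned = sentence.lower().replace(",", " ").replace(":", " ")
    return Counter(cleaned.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[key] * b[key] for key in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up expression values for illustration only.
cells = {
    "cell_1": {"CD3E": 9.1, "CD2": 7.4, "IL7R": 6.0, "MS4A1": 0.1},
    "cell_2": {"CD3E": 8.7, "IL7R": 7.9, "CD2": 5.2, "CD79A": 0.3},
    "cell_3": {"MS4A1": 9.5, "CD79A": 8.8, "CD19": 6.1, "CD3E": 0.2},
}
vectors = {name: embed(cell_to_sentence(expr)) for name, expr in cells.items()}
print(cosine(vectors["cell_1"], vectors["cell_2"]))  # cells sharing top genes
print(cosine(vectors["cell_1"], vectors["cell_3"]))  # cells with different top genes
```

In this toy data, the two cells that share their top genes score as more similar than the pair that does not, which is the property a clustering step would rely on.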

Putting models to good use

As the LLM landscape evolves, scientists face a fast-changing menu of options. “I’m still at the tinkering, playing stage of using LLMs locally,” Thorpe says. He tried ChatGPT, but felt it was expensive, and the tone of its output wasn’t right. Now he uses Llama locally, with either 8 billion or 70 billion parameters, both of which can run on his Mac laptop.

Another benefit, Thorpe says, is that local models don’t change. Commercial developers, by contrast, can update their models at any moment, leading to different outputs and forcing Thorpe to alter his prompts or templates. “In most of science, you want things that are reproducible,” he explains. “And it’s always a worry if you’re not in control of the reproducibility of what you’re generating.”
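One way to make that control concrete is to store, alongside each output, a fingerprint of the exact model file together with the prompt and settings used. The sketch below is an illustration under assumptions, not Thorpe's workflow: the function names and the fields in the record are hypothetical.

```python
import hashlib
import json


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly very large) model file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_record(model_path: str, prompt: str, options: dict) -> str:
    """Return a JSON record that pins down the inputs of one generation run."""
    record = {
        "model_sha256": sha256_of_file(model_path),
        "prompt": prompt,
        "options": options,  # hypothetical settings, e.g. temperature or seed
    }
    return json.dumps(record, sort_keys=True, indent=2)
```

If the hash of the local file is unchanged, the weights are unchanged. A hosted service that updates its model silently offers no equivalent check.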

For another project, Thorpe is writing code that aligns MHC molecules on the basis of their 3D structure. To develop and test his algorithms, he needs lots of diverse proteins — more than exist naturally. To design plausible new proteins, he uses ProtGPT2, an open-weights model with 738 million parameters that was trained on about 50 million sequences³.

Sometimes, however, a local app won’t do. For coding, Thorpe uses the cloud-based GitHub Copilot as a partner. “It kind of feels like my arm’s chopped off when for some reason I can’t actually use Copilot,” he says. Local LLM-based coding tools do exist (such as Google DeepMind’s CodeGemma and one from California-based developers Continue), but in his experience they can’t compete with Copilot.

Access points

So, how do you run a local LLM? Software called Ollama (available for Mac, Windows and Linux operating systems) lets users download open models, including Llama 3.1, Phi-3, Mistral and Gemma 2, and access them through a command line. Other options include the cross-platform app GPT4All and Llamafile, which can transform LLMs into a single file that runs on any of six operating systems, with or without a graphics processing unit.
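Besides the command line, a running Ollama server can also be queried from a script over its local HTTP interface. The sketch below uses only the Python standard library and assumes a default installation listening on `localhost:11434`, with a model already downloaded under the tag `llama3.1`; the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Prepare a non-streaming generation request for the local server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local model and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]


# Example call (needs a running server and a downloaded model):
# print(generate("llama3.1", "Summarize in one sentence what an MHC molecule does."))
```

Because the request goes to `localhost`, the prompt and any data in it stay on the machine.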

Sharon Machlis, a former editor at the website InfoWorld, who lives in Framingham, Massachusetts, wrote a guide to using LLMs locally, covering a dozen options. “The first thing I would suggest,” she says, “is to have the software you choose fit your level of how much you want to fiddle.” Some people prefer the ease of apps, whereas others prefer the flexibility of the command line.

Whichever approach you choose, local LLMs should soon be good enough for most applications, says Stephen Hood, who heads open-source AI at the tech firm Mozilla in San Francisco. “The rate of progress on those over the past year has been astounding,” he says.

As for what those applications might be, that’s for users to decide. “Don’t be afraid to get your hands dirty,” Zakka says. “You might be pleasantly surprised by the results.”