（评论）

（评论）
(comments)

原始链接: https://news.ycombinator.com/item?id=43102528

生物医学科学领域的几位研究人员对AI将通过产生出色的假设显着加速生物医学发现的说法表示怀疑。 AI可以更好地整合各种领域并解决缺乏“才华横溢的创意研究人员”的论点是有缺陷的。生物学家已经有很多好主意。瓶颈在于测试这些假设的严格且耗时的过程。 AI目前的状态可用于文献综述和摘要，但对科学设计和假设的产生无用。计算机科学可能受到假设产生的限制，但相反的生物学是正确的。限速步骤是测试，通过AI生成的假设进行边际改进可能无法证明与综合，测试和临床试验相关的巨大成本和时间合理，尤其是当现有知识和工具可以取得相似的结果时。

只有一个问题：人工智能不智能，这是一种系统性风险 2024-08-09

（评论） 2025-02-25

（评论） 2024-09-05

人工智能到底是什么？数字幻象、虚假承诺和大规模再教育 2024-08-11

原文

I'm not sure if people here even read the entirety of the article. From the article:

> We applied the AI co-scientist to assist with the prediction of drug repurposing opportunities and, with our partners, validated predictions through computational biology, expert clinician feedback, and in vitro experiments.

> Notably, the AI co-scientist proposed novel repurposing candidates for acute myeloid leukemia (AML). Subsequent experiments validated these proposals, confirming that the suggested drugs inhibit tumor viability at clinically relevant concentrations in multiple AML cell lines.

and,

> For this test, expert researchers instructed the AI co-scientist to explore a topic that had already been subject to novel discovery in their group, but had not yet been revealed in the public domain, namely, to explain how capsid-forming phage-inducible chromosomal islands (cf-PICIs) exist across multiple bacterial species. The AI co-scientist system independently proposed that cf-PICIs interact with diverse phage tails to expand their host range. This in silico discovery, which had been experimentally validated in the original novel laboratory experiments performed prior to use of the AI co-scientist system, are described in co-timed manuscripts (1, 2) with our collaborators at the Fleming Initiative and Imperial College London. This illustrates the value of the AI co-scientist system as an assistive technology, as it was able to leverage decades of research comprising all prior open access literature on this topic.

The model was able to come up with new scientific hypotheses that were tested to be correct in the lab, which is quite significant.

So, I've been reading Google research papers for decades now and also worked there for a decade and wrote a few papers of my own.

When google publishes papers, they tend to juice the results significance (google is not the only group that does this, but they are pretty egregious). You need to be skilled in the field of the paper to be able to pare away the exceptional claims. A really good example is https://spectrum.ieee.org/chip-design-controversy while I think Google did some interesting work there and it's true they included some of the results in their chip designs, their comparison claims are definitely over-hyped and they did not react well when they got called out on it.

It's an ongoing debacle with multiple people making extremely good arguments that Google overstated the results.

Yes, I know it's in TPUs and I said exactly that.

You simply can't take Google press at face value.

Yes, I am aware. I didn't find Jeff's argument particularly convincing. Please note: I've worked personally with Jeff before and shared many a coffee with him. He's done great work and messed up a lot of things, too.

Seems to be true. 'Published' scientific research, by its sheer social-dynamics (verging on highly toxic), is the academic equivalent of a pouty-girl vis-a-vis Instagram.

(academic-burnout resembles creator-burnout for similar reasons)

I have worked with Google teams as well, and they taught me a fair bit about how to be rigorously skeptical. It takes domain knowledge, statistical knowledge, data, time and the computational resources to challenge them. I've done it, but it took real resources.

That said, it's a useful exercise to figure out the plan of attack. My experience is the "juice" was mainly in "easy true negative" subclasses. They weren't oversampled, but the human brain wouldn't even consider most of that data. Once you ablate those subclasses from the dataset, (which takes a lot of additional labelling effort), you can start challenging their assertions. But it's hard.

And that said I also review a number of articles in that domain, and I haven't seen a group with stronger datasets overall.

Remember Google is a publicly traded company, so everything must be reviewed to "ensure shareholder value". Like dekhn said, its impressive, but marketing wants more than "impressive".

This is true for public universities and private universities; you see the same thing happening in academic papers (and especially the university PR around the paper)

The actual papers don't overhype. But the university PR's regarding those papers? They can really overhype the results. And of course, the media then takes it up an extra order of magnitude.

I've definitely seen many examples of papers where the conclusions went far beyond what the actual results warranted. Scientists are incentivized to claim their discovery generalizes as much as possible.

But yes, it's normally: "science paper says an experiment in mice shows promising results in cancer treatment" then "University PR says a new treatment for cancer is around the corner" and "Media says cure for all cancer"

Depends on what you call "overhype".

Wishful mnemonics in the field was called out by Drew McDermott in the mid 1970's and it is still a problem today.

https://www.inf.ed.ac.uk/teaching/courses/irm/mcdermott.pdf

And:

> As a field, I believe that we tend to suffer from what might be called serial silver bulletism, defined as follows: the tendency to believe in a silver bullet for AI, coupled with the belief that previous beliefs about silver bullets were hopelessly naive.

(H. J. Levesque. On our best behaviour. Artificial Intelligence, 212:27–35, 2014.)

That applies to absolutely everyone. Convenient results are highlighted, inconvenient are either not mentioned or de-emphasized. You do have to be well read in the field to see what the authors _aren't_ saying, that's one of the purposes of being well-read in the first place. That is also why 100% of science reporting is basically disinformation - journalists are not equipped with this level of nuanced understanding.

yes, but google has a long history of being egregious, with the additional detail that their work is often irreproducible for technical reasons (rather than being irreproducible for missing methods). For example, we published an excellent paper but nobody could reproduce it because at the time, nobody else had a million spare cores to run MD simulations of proteins.

It's hardly Google's problem that nobody else has a million cores, wouldn't you agree? Should they not publish the result at all if it's using more than a handful of cores so that anyone in academia can reproduce it? That'd be rather limiting.

Well, a goal of most science is to be reproducible, and it couldn't be reproduced, merely for technical reasons (and so we shared as much data from the runs as possible so people could verify our results). This sort of thing comes up when CERN is the only place that can run an experiment and nobody can verify it.

Actually it IS google's problem. They don't publish through traditional academic venues unless it suits them (much like OpenAI/Anthropic, often snubbing places like NeurIPS due to not wanting to MIT open source their code/models which peer reviewers demand) and them demanding so many GPUs chokes supply for the rest of the field - a field which they rely on the free labor of to make complimentary technologies to their models.

> Google's problem that nobody else has a million cores, wouldn't you agree

On the contrary - their advantage. They know it and they can make outlandish claims that no one will disprove

> That applies to absolutely everyone.

Eating, drinking, sleeping apply to absolutely everyone. Deception varies greatly by person and situation. I know people who are painfully honest and people I don't trust on anything, and many in between.

That a UPR inhibitor would inhibit viability of AML cell lines is not exactly a novel scientific hypothesis. They took a previously published inhibitor known to be active in other cell lines and tried it in a new one. It's a cool, undergrad-level experiment. I would be impressed if a sophomore in high school proposed it, but not a sophomore in college.

> I would be impressed if a sophomore in high school proposed it

That sounds good enough for a start, considering you can massively parallelize the AI co-scientist workflow, compared to the timescale and physical scale it would take to do the same thing with human high school sophomores.

And every now and then, you get something exciting and really beneficial coming from even inexperienced people, so if you can increase the frequency of that, that sounds good too.

We don't need an army of high school sophomores, unless they are in the lab pipetting. The expensive part of drug discovery is not the ideation phase, it is the time and labor spent running experiments and synthesizing analogues.

As discussed elsewhere, Deepmind are also working on extending Alphafold to simulate biochemical pathways and then looking to tackle whole-cell simulation. It's not quite pipetting, but this sort of AI scientist would likely be paired with the simulation environment (essentially as function calling), to allow for very rapid iteration of in-silico research.

It sounds like you're suggesting that we need machines that mass produce things like automated pipetting machines and the robots that glue those sorts of machines together.

Replacing a skilled technician is remarkably challenging. Often times, when you automate this, you just end up wasting a ton of resources rather than accelerating discovery. Often, simply integrating devices from several vendors (or even one vendor) takes months.

I've built microscopes intended to be installed inside workcells similar to what companies like Transcriptic built (https://www.transcriptic.com/). So my scope could be automated by the workcell automation components (robot arms, motors, conveyors, etc).

When I demo'd my scope (which is similar to a 3d printer, using low-cost steppers and other hobbyist-grade components) the CEO gave me feedback which was very educational. They couldn't build a system that used my style of components because a failure due to a component would bring the whole system down and require an expensive service call (along with expensive downtime for the user). Instead, their mech engineer would select extremely high quality components that had a very low probability of failure to minimize service calls and other expensive outages.

Unfortunately, the cost curve for reliability not pretty, to reduce mechanical failures to close to zero costs close to infinity dollars.

One of the reasons Google's book scanning was so scalable was their choice to build fairly simple, cheap, easy to maintain machines, and then build a lot of them, and train the scanning individuals to work with those machines quirks. Just like their clusters, they tolerate a much higher failure rate and build all sorts of engineering solutions where other groups would just buy 1 expensive device with a service contract.

This sounds like it could be centralised, a bit like the clouds in the IT world. A low failure rate of 1-3% is comparable to servers in a rack, but if you have thousands of them, then this is just a statistic and not a servicing issue. Several hyperscalers simply leave failed nodes where they are, it’s not worth the bother to service them!

Maybe the next startup idea is biochemistry as a service, centralised to a large lab facility with hundreds of each device, maintained by a dedicated team of on-site professionals.

None of the companies that proposed this concept have managed to demonstrate strong marketplace viability. A lot of discovery science remains extremely manual, artisinal, and vehemently opposed to automation.

> They couldn't build a system that used my style of components because a failure due to a component would bring the whole system down and require an expensive service call

Could they not make the scope easily replaceable by the user and just supply a couple of spares?

Just thinking of how cars are complex machines but a huge variety of parts could be replaced by someone willing to spend a couple of hours learning how.

yes, and that's the reason I went to work at google: to get access to their distributed systems and use ML to scale up biology. I never was able to join Google Research and do the work I wanted (but DeepMind went ahead and solved protein structure prediction, so, the job got done anyway).

They really didn't solve it. AF works great for proteins that have a homologous protein with a crystal structure. It is absolutely useless for proteins with no published structure to use as a template - e.g. many of the undrugged cancer targets in existence.

@dekhn it is true (I also work in the field. I'm a software engineer who got a wet-lab PhD in biochemistry and work at a biotech doing oncology drug discovery)

I wouldn't say it's an engineering problem. Biology and pharmacology are very complex with lots of curveballs, and each experiment is often different and not done enough to warrant full engineering-scale optimization (although this is sometimes the case!).

We could have an alternative system where VC don’t need to appease regulators but must place X billion in escrow for compensation of any harm the medicine does to customers.

Regulator is not only there to protect the public, it also protects VC from responsibility

> VC don’t need to appease regulators

Regulations around clinical trials represent the floor of what's ethically permissible, not the ceiling. As in, these guidelines represent the absolute bare minimum required when performing drug trials to prevent gross ethical violations. Not sure what corners you think are ripe for cutting there.

> Regulations around clinical trials represent the floor of what's ethically permissible, not the ceiling.

Disagree. The US FDA especially is overcautious to the point of doing more harm than good - they'd rather ban hundreds of lifesaving drugs than allow one thalidomide to slip through.

Yeah that's not how anything works. Compounds are approved for use or not based on empirical evidence, thus the need for clinical trials. What's your level of exposure to the pharma industry?

> Compounds are approved for use or not based on empirical evidence, thus the need for clinical trials.

But off-label use is legal, so it's ok to use a drug that's safe but not proven effective (to the FDA's high standards) for that ailment... but only if it's been proven effective for some other random ailment. That makes no sense.

> What's your level of exposure to the pharma industry?

Just an interested outsider who read e.g. the Omegaven story on https://www.astralcodexten.com/p/adumbrations-of-aducanumab .

I strongly encourage you to take a half hour and have a look at what goes into preclinical testing and the phases of official trials. An understanding of the data gathered during this process should clear up some of your confusion around safety and efficacy of off-label uses, which parenthetically pharma companies are strictly regulated against encouraging in any way.

This is the general problem with nearly all of this era of generative AI and why the public dislike it so much.

It is trained on human prose; human prose is primarily a representation of ideas; it synthesizes ideas.

There are very few uses for a machine to create ideas. We have a wealth of ideas and people enjoy coming up with ideas. It’s a solution built for a problem that does not exist.

Especially when you consider the artificial impressive high school sophomore is capable of having impressive high school sophomore ideas across and between an incredibly broad spectrum of domains.

And that their generation of impressive high school sophomore ideas is faster, more reliable, communicated better, and can continue 24/7 (given matching collaboration), relative to their bio high school sophomore counterparts.

I don’t believe any natural high school sophomore as impressive on those terms, has ever existed. Not close.

We humans (I include myself) are awful at judging things or people accurately (in even a loose sense) across more than one or two dimensions.

This is especially true when the mix of ability across several dimensions is novel.

(I also think people under estimate the degree that we, as users and “commanders” of AI, bottleneck their potential. I don’t suggest they are ready to operate without us. But that our relative lack of energy, persistence & focus all limit what we get from them in those dimensions, hiding significant value.

We famously do this with each other, so not surprising. But worth keeping in mind when judging limits: whose limits are we really seeing.)

I don't need high school level ideas, though. If people do, that's good for them, but I haven't met any. And if the quality of the ideas is going to improve in future years, that's good too, but also not demonstrated here.

> And if the quality of the ideas is going to improve in future years, that's good too, but also not demonstrated here.

I don't quite understand the argument here. The future hasn't happened yet. What does it mean to demonstrate the future developments now?

I am going to argue that you do. Then I will be interested in your response, if you feel inclined.

We all have our idiosyncratically distributed areas of high intuition, expertise and fluency.

None of us need apprentice level help there, except to delegate something routine.

Lower quality ideas there would just gum things up.

And then we all have vast areas of increasingly lesser familiarity.

I find, that the more we grow our strong areas, the more those areas benefit with as efficient contact as possible with as many more other areas as possible. In both trivial and deeper ways.

The better developer I am, in terms of development skill, tool span, novel problem recognition and solution vision, the more often and valuable I find quick AI tutelage on other topics, trivial or non-trivial.

If you know a bright high school student highly familiar with a domain that you are not, but have reason to think that area might be helpful, don’t you think instant access to talk things over with that high schooler would be valuable?

Instant non-trivial answers, perspective and suggestions? With your context and motivations taken into account?

Multiplied by a million bright high school students over a million domains.

—

We can project the capability vector of these models onto one dimension, like “school level idea quality”. But lower dimension projections are literally shadows of the whole.

It if we use them in the direction of their total ability vector (and given they can iterate, it is actually a compounding eigenvector!) and their value goes way beyond “a human high schooler with ideas”.

It does take time to get the most out of a differently calibrated tool.

Suggesting "maybe try this known inhibitor in other cell lines" isn't exactly novel information though. It'd be more impressive and useful if it hadn't had any published information about working as a cancer inhibitor before. People are blasé about it because it's not really beating the allegations that it's just a very fancy parrot when the highlight of it's achievements is to say try this known inhibitor with these other cell lines, decent odds that the future work sections of papers on the drug already suggested trying on other lines too...

A couple years ago even suggesting that a computer could propose anything at all was sci-fi. Today a computer read the whole internet, suggested a place to look at and experiments to perform and… ‘not impressive enough’. Oof.

> Today a computer read the whole internet, suggested a place to look at

Or imagine this one - computer maps the whole world, suggests a route how to get to any destination?!

You just described a basic search engine.

LLM is kind of a search engine for language

Preposterous- cavemen had no language but they could reason, think and learn.

A child learns how to eat solid food and how to walk. That a square peg fits into a square hole. This has nothing to do with language.

people who deaf and mute and cannot read can still reason and solve problems.

People are facing existential dread that the knowledge they worked years for is possibly about to become worth a $20 monthly subscription. People will downplay it for years no matter what.

I'm sure the scientists involved had a wish list of dozens of drug candidates to repurpose to test based on various hypotheses. Ideas are cheap, time is not.

In this case they actually tested a drug probably because Google is paying for them to test whatever the AI came up with.

I’m not familiar with the subject matter, but given your description, I wouldn’t really be impressed by anyone suggesting it. It just sounds like a very plausible “What if” alternative.

On the level of suggesting suitable alternative ingredients in fruit salad.

We should really stop insulting the intelligence of people to sell AI.

I read the cf-PICI paper (abstract) and the hypothesis from the AI co-scientist. While the mechanism from the actual paper is pretty cool (if I'm understanding it correctly), I'm not particularly impressed with the hypothesis from the co-scientist.

It's quite a natural next step to take to consider the tails and binding partners to them, so much so that it's probably what I would have done and I have a background of about 20 minutes in this particular area. If the co-scientist had hypothesised the novel mechanism to start with, then I would be impressed at the intelligence of it. I would bet that there were enough hints towards these next steps in the discussion sections of the referenced papers anyway.

What's a bit suspicious is in the Supplementary Information, around where the hypothesis is laid out, it says "In addition, our own preliminary data indicate that cf-PICI capsids can indeed interact with tails from multiple phage types, providing further impetus for this research direction." (Page 35). A bit weird that it uses "our own preliminary data".

> A bit weird that it uses "our own preliminary data"

I think potential of LLM based analysis is sky high given the amount of concurrent research happening and high context load required to understand the papers. However there is a lot of pressure to show how amazing AI is and we should be vigilant. So, my first thought was - could it be that training data / context / RAG having access to a file it should not have contaminated the result? This is indirect evidence that maybe something was leaked.

This is one thing I've been wondering about AI: will its broad training enable it to uncover previously covered connections between areas the way multi-disciplinary people tend to, or will it still miss them because it's still limited to its training corpus and can't really infer.

If it ends up being more the case that AI can help us discover new stuff, that's very optimistic.

In some sense, AI should be the most capable at doing this within math. Literally the entire domain in its entirety can be tokenized. There are no experiments required to verify anything, just theorem-lemma-proof ad nauseam.

Doing this like in this test, it's very tricky to rule out the hypothesis that the AI is just combining statements from the Discussion / Future Outlook sections of some previous work in the field.

Math seems to me like the hardest thing for LLMs to do. It requires going deep with high IQ symbol manipulation. The case for LLMs is currently where new discoveries can be made from interpolation or perhaps extrapolation between existing data points in a broad corpus which is challenging for humans to absorb.

Alternatively, human brains are just terrible at "high IQ symbol manipulation" and that's a much easier cognitive task to automate than, say, "surviving as a stray cat".

If they solve tokenization, you'll be SHOCKED at how much it was holding back model capabilities. There's tons of works at NeurIPS about various tokenizer hacks or alternatives to bpe which massively improve various types of math that models are bad at (i.e. arithmatic performance)

I reject the Stochastic Parrot theory. The claim is more about comparative advantage; AI systems already exist that are superhuman on breadth of knowledge at undergrad understanding depth. So new science should be discoverable in fields where human knowledge breadth is the limiting factor.

> AI systems already exist that are superhuman on breadth of knowledge at undergrad understanding depth

Two problems with this:

1. AI systems hallucinate stuff. If it comes up with some statement, how will you know that it did not just hallucinate it?

2. Human researchers don't work just on their own knowledge, they can use a wide range of search engines. Do we have any examples of AI systems like these that produce results that a third-year grad student couldn't do with Google Scholar and similar instructions? Tests like in TFA should always be compared to that as a baseline.

> new science should be discoverable in fields where human knowledge breadth is the limiting factor

What are these fields? Can you give one example? And what do you mean by "new science"?

The way I see it, at best the AI could come up with a hypothesis that human researchers could subsequently test. Again, you risk that the hypothesis is hallucination and you waste a lot of time and money. And again, researchers can google shit and put facts together from different fields than their own. Why would the AI be able to find stuff the researchers can't find?

This is kinda getting at a core question of epistemology. I’ve been working on an epistemological engine by which LLMs would interact with a large knowledge graph and be able to identify “gaps” or infer new discoveries. Crucial to this workflow is a method for feedback of real world data. The engine could produce endless hypotheses but they’re just noise without some real world validation metric.

Similar stuff is being done for material sciences where AI suggest different combinations to find different properties. So when people say AI(machine learning, LLM) are just for show I am a bit shocked as AI's today have accelerated discoveries in many different fields of science and this is just the start. Anna archive probably will play a huge role in this as no human or even a group of humans will have all the knowledge of so many fields that an Ai will have.

https://www.independent.co.uk/news/science/super-diamond-b26...

It's a matter of perspective and expectations.

The automobile was a useful invention. I don't know if back then there was a lot of hype around how it can do anything a horse can do, but better. People might have complained about how it can't come to you when called, can't traverse stairs, or whatever.

It could do _one_ thing a horse could do better: Pull stuff on a straight surface. Doing just one thing better is evidently valuable.

I think AI is valuable from that perspective, you provide a good example there. I might well be disappointed if I would expect it to be better than humans at anything humans can do. It doesn't have to. But with wording like "co-scientist", I see where that comes from.

> It could do _one_ thing a horse could do better: Pull stuff on a straight surface

I would say the doubters were right, and the results are terrible.

We redesigned the world to suit the car, instead of fixing its shortcomings.

Navigating a car centric neighbourhood on foot is anywhere between depressing and dangerous.

I hope the same does not happen with AI. But I expect it will. Maybe in your daily life AI will create legal contracts there are thousands of pages long And you will need AI of your own to summarise them and process them.

Excellent point. Just because the invention of the automobile arguably introduced something valuable, how we ended up using them had a ton of negative side effects. I don't know enough about cars or horses to argue pros and cons. But I can certainly see how we _could_ have used them in a way that's just objectively better than what we could do without them. But you're right, I can't argue we did.

It's not just about doing something better but about the balance between the pros and the cons. The problem with LLMs are hallucinations. If cars just somehow made you drive the wrong way with the frequency that LLMs send one down the wrong path with compelling sounding nonsense, then I suspect we'd still be riding horses nowadays.

I can get value out of them just fine. But I don't use LLMs to find answers, mostly to find questions. It's not really what they're being sold/hyped for, of course. But that's kinda my point.

It's cool, no doubt. But keep in mind this is 20 years late:

  As a prototype for a "robot scientist", Adam is able to perform independent
  experiments to test hypotheses and interpret findings without human guidance,
  removing some of the drudgery of laboratory experimentation.[11][12] Adam is
  capable of:
  
      * hypothesizing to explain observations
      * devising experiments to test these hypotheses
      * physically running the experiments using laboratory robotics
      * interpreting the results from the experiments
      * repeating the cycle as required[10][13][14][15][16]
  
  While researching yeast-based functional genomics, Adam became the first
  machine in history to have discovered new scientific knowledge independently of
  its human creators.[5][17][18]

https://en.wikipedia.org/wiki/Robot_Scientist

I also think people underestimate how much benefit a current LLM already has to researchers.

A lot of them have to do things on computers which has nothing to do with their expertise. Like coding a small tool for working their data, small tools crunching results, formatting text data, searching and finding the right materials.

A LLM which helps a scientist to code something in an hour instead of a week, makes this research A LOT faster.

And we know from another paper, that we have now so much data, you need to use systems to find the right information for you. The study estimated how much additionanl critical information a research paper missed.

Don't worry, it takes about 10 years for drugs to get approved, AIs will be superintelligent long before the government gives you permission to buy a dose of AI-developed drugs.

Not that I don't think there's a lot of potential in this approach, but the leukemia example seemed at least poorly-worded, "the suggested drugs inhibit tumor viability" reads oddly given that blood cancers don't form tumors?

“Drug repurposing for AML” lol

As a person who is literally doing his PhD on AML by implementing molecular subtyping, and ex-vivo drug predictions. I find this super random.

I would truly suggest our pipeline instead of random drug repurposing :)

https://celvox.co/solutions/seAMLess

edit: Btw we’re looking for ways to fund/commercialize our pipeline. You could contact us through the site if you’re interested!

Can you explain what you mean by subtyping and if/how it negates the usefulness of repurposing (if that’s what you meant to say). Wouldn’t subtyping complement a drug repurposing screen by allowing the scientist to test compounds against a subset of a disease?

And drug repurposing is also used for conditions with no known molecular basis like autism. You’re not suggesting its usefulness is limited in those cases right?

Sure. There are studies like BEAT-AML which tests selected drugs’ responses on primary AML material. So, not on a cell-line but on true patient data. Combining this information with molecular measurements, you can actually say something about which drugs would be useful for a subset of the patients.

However, this is still not how you treat a patient. There are standard practices in the clinic. Usually the first line treatment is induction chemo with hypomethylating agents (except elderly who might not be eligible for such a treatment). Otherwise the options are still very limited, the “best” drug in the field so far is a drug called Venetoclax, but more things are coming up such as immuno-therapy etc. It’s a very complex domain, so drug repurposing on an AML cell line is not a wow moment for me.

It bothers me that the word 'hallucinate' is used to describe when the output of a machine learning model is wrong.

In other fields, when models are wrong, the discussion is around 'errors'. How large the errors are, their structural nature, possible bounds, and so forth. But when it's AI it's a 'hallucination'. Almost as if the thing is feeling a bit poorly and just needs to rest and take some fever-reducer before being correct again.

It bothers me. Probably more than it should, but it does.

I think hallucinate is a good term because when an AI completely makes up facts or APIs etc it doesn't do so as a minor mistake of an otherwise correct reasoning step.

This search is random in the same way that AlphaGo's move selection was random.

In the Monte Carlo Tree Search part, the outcome distribution on leaves is informed by a neural network trained on data instead of a so-called playout. Sure, part of the algorithm does invoke a random() function, but by no means the result is akin to the flip of a coin.

There is indeed randomness in the process, but making it sound like a random walk is doing a disservice to nuance.

I feel many people are too ready to dismiss the results of LLMs as "random", and I'm afraid there is some element of seeing what one wants to see (i.e. believing LLMs are toys, because if they are not, we will lose our jobs).

You're right about the random search however the domains that the model is doing the search is quite different. In AlphaGo, you do MCTS in all possible moves in GO, therefore it is a domain specific search. Here, you're doing the search in language whereas you would like to do the search possibly on genetics or molecular data (RNA-seq, ATAC-seq etc.). For instance, yesterday Arcinstitute published Evo2, where you can actually check a given mutation would be pathogenic or not. So, starting from genetics data (among thousands of variants) you might be able to say this variant might be pathogenic for the patient given its high variant allele frequency.

On top of that you are looking at the results in cell-lines which might not reflect the true nature of what would happen in-vivo (a mouse model or a human).

So, there is domain specific knowledge, which one would like to take into account for decision-making. For me, I would trust a Molecular Tumor Board with hematologists, clinicians - and possibly computational biologists :) - over a language random tree search for treating my acute myeloid leukemia, but this is a personal choice.

Tbh I don’t see why I would use this. I don’t need an ai to connect across ideas or come up with new hypothesis. I need it to write lots of data pipeline code to take data that is organized by project, each in a unique way, each with its own set of multimodal data plus metadata all stored in long form documents with no regular formatting, and normalize it all into a giant database. I need it to write and test a data pipeline to detect events both in amplitude space and frequency space in acoustic data. I need it to test out front ends for these data analysis backends so i can play with the data. Like I think this is domain specific. Probably drug discovery requires testing tons of variables one by one iterating through the values available. But that’s not true for my research. But not everything is for everybody and that’s okay.

Exactly, they want to automate the most rewarding part that we don’t need help with… plus I don’t believe they’ve solved the problem of LLMs generating trite ideas.

This ludditism shit is the death drive externalized.

You'd forsake an amazing future based on copes like the precautionary principle or worse yet, a belief that work is good and people must be forced into it.

The tears of butthurt scientists, or artists who are automated out of existence because they refused to leverage or use AI systems to enhance themselves will be delicious.

The only reason that these companies aren't infinitely better than what Aaron Swartz tried to do was that they haven't open accessed everything. Deepseek is pretty close (sans the exact dataset), and so is Mistral and apparently Meta?

Y'all talked real big about loving "actual" communism until it came for your intellectual property, now you all act like copyright trolls. Fuck that!

You sure are argumentative for someone who believes they are so correct.

In any case, I don’t think I’m a Luddite. I use many ai tools in my research including for idea generation. So far i have not found it to be very useful. Moreover the things it could be useful for such as automated data pipeline generation it doesn’t do. I could imagine a series of agents where one designs pipelines and one fills in the codes, etc per node in the pipeline but so far I didn’t see anything like that. If you have some kind of constructive recommendations in that direction I’m happy to hear them.

I think you're just not the target audience. If AI can come up with some good ideas and then split it into tasks some of them an undergrad can do - it can speed up the global research speed by involving more people in useful science

In science, having ideas is not the limiting factor. They're just automating the wrong thing. I want to have ideas and ask the machine to test for me, not the other way around.

If I understand what's been published about this, it isn't just ideation, but also critiquing and ranking them, to select the few most worth pursuing.

Choosing a hypothesis to test is actually a hard problem, and one that a lot of humans do poorly, with significant impact on their subsequent career. From what I have seen as an outsider to academia, many of the people who choose good hypotheses for their dissertation describe it as having been lucky.

I bet all of these researchers involved had a long list of candidates they'd like to test and have a very good idea what the lowest hanging fruit are, sometimes for more interesting reasons than 'it was used successfully as an inhibitor for X and hasn't been tried yet in this context' — not that that isn't a perfectly good reason. I don't think ideas are the limiting factor. The reason attention was paid to this particular candidate is because google put money down.

The difference is the complexity of ideas. There are straightforward ideas anyone can test and improve, and there are ideas where only PhDs in CERN can test

I don't think that's really right. E.g. what makes finding the Higgs boson difficult is that you need to build a really large collider, not coming up with the idea, which could be done 50 years earlier. Admittedly the Higgs boson is still a "complex idea", but the bottleneck still was the actual testing.

Agreed - AI that could take care of this sort of cross-system complexity and automation in a reliable way would be actually useful. Unfortunately I've yet to use an AI that can reliably handle even moderately complex text parsing in a single file more easily than if I'd just done it myself from the start.

Yes. It’s very frustrating. Like there is a great need for a kind of data pipeline test suite where you can iterate through lots of different options and play around with different data manipulations so a single person can do it. Because it’s not worth it to really build it if it doesn’t work. There needs to be one of these astronomer/dagster/apache airflow/azure ml tools that are quick and dirty to try things out. Maybe I’m just naive and they exist and I’ve had my nose in Jupyter notebooks. But I really feel hindered these days in my ability to prototype complex data pipelines myself while also considering all of the other parts of the science.

This reminds me of a paper: "The ALCHEmist: Automated Labeling 500x CHEaper Than LLM Data Annotators"

https://arxiv.org/abs/2407.11004

In essence, LLMs are quite good at writing the code to properly parse large amounts of unstructured text, rather than what a lot of people seem to be doing which is just shoveling data into an LLM's API and asking for transformations back.

> I don’t need an ai to connect across ideas or come up with new hypothesis.

This feels like hubris to me. The idea here isn't to assist you with menial tasks, the idea is to give you an AI generalist that might ne able to alert you to things outside of your field that may be related to your work. It's not going to reduce your workload, in fact, it'll probably increase it but the result should be better science.

I have a lot more faith in this use of LLMs than I do for it to do actual work. This would just guide you to speak with another expert in a different field and then you take it from there.

> In many fields, this presents a breadth and depth conundrum, since it is challenging to navigate the rapid growth in the rate of scientific publications while integrating insights from unfamiliar domains.

> This feels like hubris to me.

No, any scientist has hundreds of ideas they would like to test. It's just part of the job. The hard thing is to do the rigorous testing itself.

>The hard thing is to do the rigorous testing itself.

This. Rigorous testing is hard and it requires a high degree of intuition and intellectual humility. When I'm evaluating something as part of my resaerch, I'm constantly asking: "Am I asking the right questions?" "Am I looking at the right metrics?" "Are the results noisy, to what extent, and how much does it matter?" and "Am I introducing confounding effects?" It's really hard to do this at scale and quickly. It necessarily requires slow measured thought, which computers really can't help with.

I have a billion ideas, being able to automate the testing of those ideas in some kind of Star Trek talk to the computer and it just knows what you want way would be perfect. This is the promise of ai. This is the promise of a personal computer. It is a bicycle for your mind. It is not hubris to want to be able to iterate more quickly on your own ideas. It is a natural part of being a tool building species.

> the idea is to give you an AI generalist that might ne able to alert you to things outside of your field that may be related to your work

That might be a good goal. It doesn't seem to be the goal of this project.

The market seems excited to charge in whatever direction the weathervane has last been pointing, regardless of the real outcomes of running in that direction. Hopefully I’m wrong, but it reminds me very much of this study (I’ll quote a paraphrase)

“A groundbreaking new study of over 1,000 scientists at a major U.S. materials science firm reveals a disturbing paradox: When paired with AI systems, top researchers become extraordinarily more productive – and extraordinarily less satisfied with their work. The numbers tell a stark story: AI assistance helped scientists discover 44% more materials and increased patent filings by 39%. But here's the twist: 82% of these same scientists reported feeling less fulfilled in their jobs.”

Quote from https://futureofbeinghuman.com/p/is-ai-poised-to-suck-the-so...

Referencing this study https://aidantr.github.io/files/AI_innovation.pdf

As a dev, I have the same experience.

AI chat is a massive productivity enhancer, but, when coding via prompts, I'm not able to hit the super satisfying developer flow state that I get into via normal coding.

Copilot is less of a productivity boost, but also less of a flow state blocker.

Yep! I think these tools are incredibly useful, but I think they're basically changing all our jobs to be more like what product managers do, having ideas for what we want to achieve, but farming out a significant chunk of the work rather than doing it ourselves. And that's fine, I find it very hard to argue that it's a bad thing. But there's a reason that we aren't all product managers already. Programming is fun, and I do experience it as a loss to find myself doing less of it myself.

There is some queasy feeling of fake-ness when auto-completing so much code. It feels like you're doing something wrong. But these are all based on my experience coding for half my life. AI-native devs will probably feel differently.

I'm a bit skeptical of this study given how it is unpublished, from a (fairly junior) single author and all of the underlying details of the subject are redacted. Is there any information anywhere about what this company in the study was actually doing? (the description in the article are very vague -- basically something to do with materials)

Definitely interesting, but I'm not so sure that such a study can yet make strong claims about AI-based work in general.

These are scientists that have cultivated a particular workflow/work habits over years, even decades. To a significant extent, I'm sure their workflow is shaped by what they find fulfilling.

That they report less fulfillment when tasked with working under a new methodology, especially one that they feel little to no mastery over, is not terribly surprising.

The feeling of dissatisfaction is something I can relate to. My story:

I only recently started using aider[1].

My experience with it can be described in 3 words.

Wow!

Oh wow!

It was amazing. I was writing a throwaway script for one time use (not for work). It wrote it for me in under 15 minutes (this includes my time getting familiar with the tool!) No bugs.

So I decided to see how far I could take it. I added command line arguments, logging, and a whole bunch of other things. After a full hour, I had a production ready script - complete with logs, etc. I had to debug code only once.

I may write high quality code for work, but for personal throwaway scripts, I'm sloppy. I would not put a command line parser, nor any logging. This did it all for me for very cheap!

There's no going back. For simple scripts like this, I will definitely use aider.

And yeah, there was definitely no satisfaction one would derive from coding. It was truly addictive. I want to use it more and more. And no matter how much I use it and like the results, it doesn't scratch my programmer's itch. It's nowhere near the fun/satisfaction of SW development.

[1] https://aider.chat/

I tried Aider recently to modify a quite small python + HTML project, and it consistently got "uv" commands wrong, ended up changing my entire build system because it didn't think the thing I wanted to do was supported in the current one (it was).

They're very effective at making changes for the most part, but boy you need to keep them on a leash if you care about what those changes are.

It seems in general we’re heading toward’s Minsky’s society of minds concept. I know OpenAI is wanting to collapse all their models into a single omni model that can do it all, but I wonder if under the hood it’d just be about routing. It’d make sense to me for agents to specialize in certain tool calls, ways of thinking, etc that as a conceptual framework/scaffolding provides a useful direction.

Also, for some more complex questions I’ve noticed that it doesn’t expose its reasoning. Specifically, yesterday I asked it to perform a search algorithm provided a picture of a grid, and it reasoned for 1-2 minutes but didn’t show any of it (neither in real time nor afterwords), whereas for simpler questions I’ve asked it the reasoning is provided as well. Not sure what this means, but it suggests some type of different treatment based on complexity.

"conceptual framework" can actually be another generalist model. Splitting model also comes with some advantages. Like easy separate tuning and replacements. Easy scaling by simply duplicating heavily used model on new hardware.

I am generally down on AI these days but I still remember using Eliza for the first time.

I think I could accept an AI prompting me instead of the other way around. Something to ask you a checklist of problems and how you will address them.

I’d also love to have someone apply AI techniques to property based testing. The process of narrowing down from 2^32 inputs to six interesting ones works better if it’s faster.

So I'm a biomedical scientist (in training I suppose...I'm in my 3rd year of a Genetics PhD) and I have seen this trend a couple of times now where AI developers tout that AI will accelerate biomedical discovery through a very specific argument that AI will be smarter and generate better hypotheses than humans.

For example in this Google essay they make the claim that CRISPR was a transdisciplinary endeavor, "which combined expertise ranging from microbiology to genetics to molecular biology" and this is the basis of their argument that an AI co-scientist will be better able to integrate multiple fields at once to generate novel and better hypothesis. For one, what they fail to understand as computer scientists (I suspect due to not being intimately familiar with biomedical research) is that microbio/genetics/mol bio are closer linked than you may expect as a lay person. There is no large leap between microbiology and genetics that would slow down someone like Doudna or even myself - I use techniques from multiple domains in my daily work. These all fall under the general broad domain of what I'll call "cellular/micro biology". As another example, Dario Amodei from Claude also wrote something similar in his essay Machines of Loving Grace that the limiting factor in biomedical is a lack of "talented, creative researchers" for which AI could fill the gap[1].

The problem with both of these ideas is that they misunderstand the rate-limiting factor in biomedical research. Which to them is a lack of good ideas. And this is very much not the case. Biologists have tons of good ideas. The rate limiting step is testing all these good ideas with sufficient rigor to either continue exploring that particular hypothesis or whether to abandon the project for something else. From my own work, the hypothesis driving my thesis I came up with over the course of a month or two. The actual amount of work prescribed by my thesis committee to fully explore whether or not it was correct? 3 years or so worth of work. Good ideas are cheap in this field.

Overall I think these views stem from field specific nuances that don't necessarily translate. I'm not a computer scientist, but I imagine that in computer science the rate limiting factor is not actually testing out hypothesis but generating good ones. It's not like the code you write will take multiple months to run before you get an answer to your question (maybe it will? I'm not educated enough about this to make a hard claim. In biology, it is very common for one experiment to take multiple months before you know the answer to your question or even if the experiment failed and you have to do it again). But happy to hear from a CS PhD or researcher about this.

All this being said I am a big fan of AI. I try and use ChatGPT all the time, I ask it research questions, ask it to search the literature and summarize findings, etc. I even used it literally yesterday to make a deep dive into a somewhat unfamiliar branch of developmental biology more easy (and I was very satisfied with the result). But for scientific design, hypothesis generation? At the moment, useless. AI and other LLMs at this point are a very powerful version of google and code writer. And it's not even correct 30% of the time to boot so you have to be extremely careful when using it. I do think that wasting less time exploring hypotheses that are incorrect or bad is a good thing. But the problem here is that we can pretty easily identify good and bad hypotheses already. We don't need AI for that, what takes time is the actual amount of testing of these hypotheses that slows down research. Oh and politics, which I doubt AI can magic away for us.

[1] https://darioamodei.com/machines-of-loving-grace#1-biology-a...

It's pretty painful watching CS try to turn biology into an engineering problem.

It's generally very easy to marginally move the needle in drug discovery. It's very hard to move the needle enough to justify the cost.

What is challenging is culling ideas, and having enough SNR in your readouts to really trust them.

> It's generally very easy to marginally move the needle in drug discovery. It's very hard to move the needle enough to justify the cost.

Maybe this kind of AI-based exploration would lower the costs. The more something is automated, the cheaper it should be to test many concepts in parallel.

A med chemist can sit down with a known drug, and generate 50 analogs in LiveDesign in an afternoon. One of those analogs may have less CYP inhibition, or better blood brain barrier penetration, or slightly higher potency or something. Or maybe they use an enumeration method and generate 50k analogs in one afternoon.

But no one is going to bring it to market because it costs millions and millions to synthesize, get through PK, ADMET, mouse, rat and dog tox, clinicals, etc. And the FDA won't approve marginal drugs, they need to be significantly better than the SoC (with some exceptions).

Point is, coming up with new ideas is cheap, easy, and doesn't need help. Synthesizing and testing is expensive and difficult.

The one model that would actually make a huge difference in pharma velocity is one that takes a target (protein that causes disease or whatever), a drug molecule (the putative treatment for the disease), and outputs the probability the drug will be approved by the FDA, how much it will cost to get approved, and the revenue for the next ten years.

If you could run that on a few thousand targets and a few million molecules in a month, you'd be able to make a compelling argument to the committee that approves molecules to go into development (probability of approval * revenue >> cost of approval)

Interesting set of comments. Personally - fantastic! It's a co-scientist and not a "scientist". There's enormous value in reviewing work and ranking "what" might provide some interesting output. A lot of ideas are not even considered because they aren't "common" because their components are expensive. If there's a "reasonable expectation" then there's a lower risk of failure. I'm not a scientist "anymore" but I'd love to play with this and see what odd combinations might potentially produce.

Loosely connected: I read recently about an Austrian pilot project in Wien, where they analyze patients' tumors and let AI suggest treatment. They had some spectacular successes, where the AI would recommend drugs that aren't normally used for that type of cancer, but when deployed, worked well.

I read the scientist quote in a newspaper article first and the surprise seemed to hinge on his entire team working on the problem for a decade and not publishing anything in a way that AI could gobble it up (which seemed strange to me) and no other human researcher working on the problem over that decade-ish timespan publishing anything suggesting the same idea.

Which seems a hard thing to disprove.

In which case, if some rival of his had done the same search a month earlier, could he have claimed the priority? And would the question of whether the idea had leaked then been a bit more salient to him. (Though it seems the decade of work might be the important bit, not the general idea).

I recently ran across this toaster-in-dishwasher article [1] again and was disappointed that the LLMs I have access to could replicate the "hairdryer-in-aquarium" breakthrough (or the toaster-in-dishwasher scenario, although I haven't explored it as much), which has made me a bit skeptical of the ability of LLMs to do novel research. Maybe the new OpenAI research AI is smart enough to figure it out?

[1] https://jdstillwater.blogspot.com/2012/05/i-put-toaster-in-d...

This is in line with how I've been using AI in my workflow recently. I give it a summary of my findings thus far and ask it to suggest explanations and recommend further tests I should conduct. About 70% of its ideas are dumb and sometimes I need to give it a little extra prompting, but it does spit out ideas that hadn't occurred to me which make sense. Obviously it's not going to replace a knowledgeable human, but as a tool to assist that human it has outperformed some very expensive PhD level consultants.

Oh great.. This looks exactly like some PhD advisors I've heard of. Creating a list of "ideas" and having their lab monkey PhD students work on it. Surefire way to kill joy of discovery and passion for science. Nice going google :). Also, while validating with "in silico" discovery I would like it to be double blind. If I know the idea and its final outcome the prompts I give are vastly different from if I did not.

That's the Quake version of the machine civilization: machines make the decisions, but use chunks of humans to improve their unholy machinery. The alternative Doom version is the opposite: humans make the decisions, but they are blended in an unholy way into the machines.

I don't think generating hypotheses is where AI is useful, I think it's more useful for doing back of napkin (or more serious) calculations, helping to find protocols, sourcing literature, etc. Grunt work basically. Generating hypotheses is the fun, exciting part that I doubt scientists want to outsource to AI.

> I don't think generating hypotheses is where AI is useful,

> Generating hypotheses is the fun, exciting part that I doubt scientists want to outsource to AI

The latter doesn’t imply the former

Just as the invention of writing degraded human memory (before that they memorized whole stories, poems), with the advent of AI, humans will degrade their thinking skills and knowledge in general.

（评论） (comments)

（评论）
(comments)