Original link: https://news.ycombinator.com/item?id=41448439

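The founders of Ligo Biosciences have released an open-source implementation of AlphaFold3, a state-of-the-art protein structure prediction model. The model was originally created by Google DeepMind and matters for accelerating the mapping of protein structures and improving drug discovery. DeepMind published AlphaFold3 in May without accompanying code, which raised concerns about reproducibility and drew criticism from the scientific community. Ligo Biosciences has now released the first part of the model, which predicts protein structures, together with the training code. The remaining capabilities are to follow later. To encourage collaboration, they used the Apache 2.0 license. The team ran into several problems while implementing AlphaFold3, such as a discrepancy in loss function scaling and the omission of residual layers in the original publication. More detail on these issues and their fixes is available on GitHub. Ligo Biosciences, a Y Combinator company, built this open-source version with partners including OpenFold, ProteinFlow, and Basecamp Research, and worked with Matt Clark on the visualizations. For updates, follow the ongoing discussion on Twitter.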

Original post
Hi HN - we’re the founders of Ligo Biosciences and are excited to share an open-source implementation of AlphaFold3, the frontier model for protein structure prediction.

Google DeepMind and its new startup, Isomorphic Labs, are expanding into drug discovery. They developed AlphaFold3 to accelerate drug discovery and create demand from big pharma. They have already signed Novartis and Eli Lilly for $3 billion - Google’s becoming a pharma company! (https://www.isomorphiclabs.com/articles/isomorphic-labs-kick...)

AlphaFold3 is a biomolecular structure prediction model that can do three main things: (1) Predict the structure of proteins; (2) Predict the structure of drug-protein interactions; (3) Predict nucleic acid - protein complex structure.

AlphaFold3 is incredibly important for science because it vastly accelerates the mapping of protein structures. Solving a single structure experimentally can take a PhD student their entire PhD. With AlphaFold3, you get a prediction in minutes, with accuracy on par with experiment.

There’s just one problem: when DeepMind published AlphaFold3 in May (https://www.nature.com/articles/s41586-024-07487-w), there was no code. This brought up questions about reproducibility (https://www.nature.com/articles/d41586-024-01463-0) as well as complaints from the scientific community (https://undark.org/2024/06/06/opinion-alphafold-3-open-sourc...).

AlphaFold3 is a fundamental advance in structure modeling technology that the entire biotech industry deserves to benefit from. Its applications are vast, including:

- CRISPR gene editing technologies, where scientists can see exactly how the DNA interacts with the scissor Cas protein;

- Cancer research - predicting how a potential drug binds to the cancer target. One of the highlights in DeepMind’s paper is the prediction of a clinical KRAS inhibitor in complex with its target.

- Antibody / nanobody to target predictions. AlphaFold3 improves accuracy on this class of molecules two-fold compared to the next best tool.

Unfortunately, no companies can use it since it is under a non-commercial license!

Today we are releasing the full model trained on single-chain proteins (capability 1 above), with the other two capabilities to be trained and released soon. We also include the training code. Weights will be released once training and benchmarking are complete. We wanted this to be truly open source, so we used the Apache 2.0 license.

DeepMind published the full structure of the model, along with each component’s pseudocode, in their paper. We translated this fully into PyTorch, which required more reverse engineering than we thought!

When building the initial version, we discovered multiple issues in DeepMind’s paper that would interfere with the training - we think the deep learning community might find these especially interesting. (Diffusion folks, we would love feedback on this!) These include:

- MSE loss scaling differs from Karras et al. (2022). The weighting provided in the paper does not downweight the loss at high noise levels.

- Omission of residual layers in the paper - we add these back and see benefits in gradient flow and convergence. Anyone have any idea why DeepMind may have omitted the residual connections in the DiT blocks?

- The MSA module, in its current form, has dead layers. The last pair weighted averaging and transition layers cannot contribute to the pair representation, hence no grads. We swap the order to the one in the ExtraMsaStack in AlphaFold2. An alternative solution would be to use weight sharing, but whether this is done is ambiguous in the paper.
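To make the first point concrete, here is a minimal sketch comparing two noise-dependent loss weights. The `weight_karras` form is the EDM weighting from Karras et al. (2022). The `weight_as_published` form, with a sum rather than a product in the denominator, is our reading of the formula in the AlphaFold3 paper and should be treated as an assumption, as should `SIGMA_DATA = 16.0`. Function names are illustrative and are not taken from the repository.

```python
SIGMA_DATA = 16.0  # assumed data standard deviation, for illustration only


def weight_as_published(sigma: float, sigma_data: float = SIGMA_DATA) -> float:
    """Assumed form of the paper's weighting: sum in the denominator."""
    return (sigma**2 + sigma_data**2) / (sigma + sigma_data) ** 2


def weight_karras(sigma: float, sigma_data: float = SIGMA_DATA) -> float:
    """EDM weighting from Karras et al. (2022): product in the denominator."""
    return (sigma**2 + sigma_data**2) / (sigma * sigma_data) ** 2


# Compare the two weights across a range of noise levels.
for sigma in (0.1, 1.0, 16.0, 160.0, 1600.0):
    print(f"sigma={sigma:8.1f}  published={weight_as_published(sigma):.4f}  "
          f"karras={weight_karras(sigma):.6f}")
```

Under these assumptions the sum form stays between 0.5 and 1 and returns to 1 as the noise level grows, so high-noise samples are weighted about the same as low-noise ones. The Karras form falls monotonically toward `1 / sigma_data**2`, which is the downweighting the bullet above refers to.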

More about those issues here: https://github.com/Ligo-Biosciences/AlphaFold3
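The dead-layer issue in the MSA module can be illustrated without any deep learning framework, as a reachability check over a simplified dataflow. This is a sketch under stated assumptions: the op names, the per-block orderings, and the premise that only the pair representation leaves the module are our simplified reading of the description above, not code from the paper or the repository. An op is "dead" if nothing it writes can ever reach the output, so it receives no gradient.

```python
# Each op: (name, representations it reads, representation it updates residually).
# Orderings are simplified assumptions for illustration.
PAPER_BLOCK = [
    ("outer_product_mean", {"msa"}, "pair"),
    ("pair_weighted_averaging", {"msa", "pair"}, "msa"),
    ("msa_transition", {"msa"}, "msa"),
    ("pair_stack", {"pair"}, "pair"),
]
SWAPPED_BLOCK = [
    ("pair_weighted_averaging", {"msa", "pair"}, "msa"),
    ("msa_transition", {"msa"}, "msa"),
    ("outer_product_mean", {"msa"}, "pair"),
    ("pair_stack", {"pair"}, "pair"),
]


def dead_layers(block, n_blocks, output="pair"):
    """Return ops whose result can never influence `output`."""
    ops = [(f"block{b}.{name}", reads, target)
           for b in range(n_blocks)
           for name, reads, target in block]
    live = {output}  # representations that still matter downstream
    dead = []
    # Walk backwards: an op matters only if it updates a live representation.
    # Updates are residual, so the target itself stays live.
    for name, reads, target in reversed(ops):
        if target in live:
            live |= reads
        else:
            dead.append(name)
    return sorted(dead)


print(dead_layers(PAPER_BLOCK, n_blocks=4))
print(dead_layers(SWAPPED_BLOCK, n_blocks=4))
```

With the first ordering, the final block's `pair_weighted_averaging` and `msa_transition` update an MSA representation that is never read again, so they are reported as dead. With the swapped ordering, every MSA update is followed by an `outer_product_mean` that feeds the pair representation, and the list is empty.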

How this came about: we are building Ligo (YC S24), where we are using ideas from AlphaFold3 for enzyme design. We thought open sourcing it was a nice side quest to benefit the community.

For those on Twitter, there was a good thread a few days ago that has more information: https://twitter.com/ArdaGoreci/status/1830744265007480934.

A few shoutouts: A huge thanks to OpenFold for pioneering the previous open-source implementation of AlphaFold. We did a lot of our early prototyping with proteinFlow, developed by Lisa at AdaptyvBio; we also look forward to partnering with them to bring you the next versions! We are also partnering with Basecamp Research to supply this model with the best sequence data known to science. Thanks also to Matthew Clark (https://batisio.co.uk) for his amazing animations!

We’re around to answer questions and look forward to hearing from you!
