Translating Cython to Mojo, a first attempt

Original link: https://fnands.com/blog/2025/sklearn-mojo-dbscan-inner/

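## Mojo and Python interop: a scikit-learn experiment

A recent update (Max release 25.4) made it possible to call Mojo code from Python. This has sparked interest in using Mojo to speed up Python functions, a job currently done mostly with C/C++ and Rust. This experiment translates the Cython inner loop of scikit-learn's DBSCAN algorithm to Mojo to evaluate both the process and the potential performance gains.

Thanks to the syntactic similarity between Mojo and Python/Cython, the translation was surprisingly straightforward. Initial tests, however, showed the Mojo version running about 800x slower than the original Cython, because of inefficient handling of `PythonObject`. Converting the NumPy arrays to Mojo `Span`s, which allows much more efficient data access, improved performance dramatically (to about 3x slower than Cython).

Despite this improvement, the overall performance of DBSCAN did not change, because the inner loop is not the main bottleneck. The author plans to translate slower, parallelizable parts of DBSCAN next, possibly making use of GPU acceleration.

While Python interop is still at an early stage, the experiment shows Mojo's potential for speeding up Python code, especially when optimized data types are used. A "cheat sheet" for translating common Python/NumPy types to Mojo would be a valuable resource for developers. Ultimately, using Mojo to accelerate a widely used library such as scikit-learn could benefit a large number of users.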

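## Mojo versus the alternatives: a Hacker News discussion summary

A Hacker News discussion prompted by [fnands.com's attempt to translate Cython to Mojo](https://fnands.com/) revealed a complex landscape of tools for speeding up Python code. Mojo shows promise, especially in machine learning, but it faces competition from established options such as Cython, Numba, Julia and Rust.

Several commenters noted that Cython is increasingly being displaced by Numba and Julia, and cited `setup.py` as a drawback. Julia was seen as a strong contender in scientific computing. Mojo, however, poses a potential threat by offering a more composable alternative to the Python+C++ silos.

Concerns were raised about Mojo's maturity, its focus on Linux/servers, its lack of multiple dispatch, and licensing. C++/pybind11 and Rust/PyO3 were also considered, although Rust's library ecosystem lags in maturity. Some users highlighted Mojo's strengths in SIMD and GPU programming.

The discussion also touched on practical issues. These included Cython's support for `pyproject.toml` and site accessibility problems that some users ran into with NoScript enabled. Ultimately, the choice depends on project requirements. Stability favors C++/pybind, while experimental projects may lean towards Mojo, with a realistic timeline for Mojo 1.0 of around 2027.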

Original article

Ever since I heard about Mojo, I (and presumably most other people) thought it would be a good language for speeding up functions called from Python. Everyone knows that vanilla Python can be slow. One of the reasons Python programs can be reasonably fast in practice is that Python often leans on libraries written in more performant languages: predominantly C/C++, but increasingly also Rust.

Until recently, there has been no real way to call Mojo code from Python, but about a month ago (in Max release 25.4) the ability to call Mojo from Python was added as a beta feature. It’s not fully cooked yet, and it will likely still change a lot, but I wanted to give it a look just to get an idea of where things are heading.

One specific idea that I had when I heard about Mojo was that Mojo might be a good replacement for Cython, and apparently I was not the only one to have had this thought:

Convergent thinking on HN

The comments are from the Hacker News discussion on Vincent Warmerdam’s blog post titled “Python can run Mojo now”, which made it to the front page of HN a while ago.

So where can I find a lot of Cython code?

Scikit-learn

Scikit-learn implements a bunch of machine learning algorithms and related utilities, and makes heavy use of Cython. How hard would it be to translate some of the Cython code in scikit-learn to Mojo?

I wanted a piece of code that was relatively simple. Partly this was because I didn’t want to jump into the deep end. It was also because Mojo functions called from Python have some restrictions, namely (from the known limitations section of the Mojo/Python interop docs):

Functions taking more than 3 arguments. Currently PyTypeBuilder.add_function() and related function bindings only support Mojo functions that take up to 3 PythonObject arguments: fn(PythonObject, PythonObject, PythonObject).

A simple case: dbscan_inner

An example I found that satisfies these criteria is the inner loop of DBSCAN that assigns points to clusters. It’s relatively short and takes exactly three arguments.

This is a classic case of a tight inner loop that you would usually want to speed up in Python, in this case written in Cython.
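For reference, here is a minimal pure-Python sketch of the logic this inner loop implements. It runs a depth-first search over precomputed neighborhoods. Every point reachable from a core point gets labelled, but only core points have their neighborhoods expanded. This is neither the author's Mojo code nor scikit-learn's Cython. It uses plain Python lists instead of typed NumPy arrays. The parameter names `is_core`, `neighborhoods` and `labels` are chosen to mirror the three arguments mentioned above, so treat them as assumptions.

```python
from collections.abc import Sequence


def dbscan_inner(
    is_core: Sequence[int],
    neighborhoods: Sequence[Sequence[int]],
    labels: list[int],
) -> None:
    """Assign cluster labels in place (sketch, not scikit-learn's code).

    is_core[i]       -- truthy if point i is a core point
    neighborhoods[i] -- indices of the points within eps of point i
    labels           -- must be initialised to -1 (unassigned / noise)
    """
    label_num = 0
    stack: list[int] = []
    for start in range(len(labels)):
        # Only start a new cluster from an unlabelled core point.
        if labels[start] != -1 or not is_core[start]:
            continue
        # Depth-first search: non-core (border) points get a label,
        # but their neighborhoods are not expanded further.
        i = start
        while True:
            if labels[i] == -1:
                labels[i] = label_num
                if is_core[i]:
                    for v in neighborhoods[i]:
                        if labels[v] == -1:
                            stack.append(v)
            if not stack:
                break
            i = stack.pop()
        label_num += 1


# Usage: five points, where points 0-2 are core points, point 3 is a border
# point of their cluster and point 4 is isolated.
example_labels = [-1] * 5
dbscan_inner(
    [1, 1, 1, 0, 0],
    [[0, 1, 2], [0, 1, 2], [0, 1, 2, 3], [2, 3], [4]],
    example_labels,
)
# example_labels is now [0, 0, 0, 0, -1]
```

The stack can hold the same index more than once. The `labels[i] == -1` check when popping makes such duplicates harmless, so there is no need for a separate "visited" set.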


dbscan Cython vs Mojo

The performance is identical (the lines overlap almost exactly). It’s the other parts of DBSCAN, such as the neighborhood calculation, that take up the majority of the time.
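To make the bottleneck more concrete, here is a hypothetical brute-force version of the neighborhood step, again in pure Python. scikit-learn itself builds the neighborhoods with `NearestNeighbors.radius_neighbors`, which can use tree-based indexes. This is therefore only a sketch under simplifying assumptions: Euclidean distance, an inclusive `<= eps` test, and the made-up helper names `compute_neighborhoods` and `core_flags`. In this brute-force form, computing neighborhoods takes n² distance evaluations. The labelling loop, by contrast, expands each point's neighborhood at most once.

```python
import math
from collections.abc import Sequence


def compute_neighborhoods(
    points: Sequence[Sequence[float]], eps: float
) -> list[list[int]]:
    """Hypothetical helper: indices of all points within eps of each point.

    Brute force, so it performs len(points) ** 2 distance evaluations.
    Each point is its own neighbor, since its distance to itself is 0.
    """
    return [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]


def core_flags(neighborhoods: Sequence[Sequence[int]], min_samples: int) -> list[int]:
    """Hypothetical helper: 1 if the neighborhood (including the point
    itself) has at least min_samples members, else 0."""
    return [1 if len(n) >= min_samples else 0 for n in neighborhoods]


points = [(0.0, 0.0), (0.5, 0.0), (0.9, 0.0), (1.8, 0.0), (10.0, 0.0)]
neighborhoods = compute_neighborhoods(points, eps=1.0)
# [[0, 1, 2], [0, 1, 2], [0, 1, 2, 3], [2, 3], [4]]
is_core = core_flags(neighborhoods, min_samples=3)
# [1, 1, 1, 0, 0]
```

These two lists, plus a labels list initialised to -1, are the three inputs the labelling loop works on. This is why only the labelling loop, and not the neighborhood search, had to fit within the three-argument limit.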

There is Mojmelo, which is effectively the Mojo ecosystem’s answer to scikit-learn; however, almost no one uses Mojo just yet.

On the other hand, scikit-learn was downloaded 100 million times last month, so if you can speed up some of scikit-learn’s algorithms, you can have a positive impact on a lot of users.
