Naive question: do you expect the linear scaling to hold once those single-core performance optimisations land, or would performance diverge from linear there pending further research?
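For context, the textbook lens on this question is Amdahl's law (a standard result, not something stated in the thread): if a fraction $p$ of the work parallelizes and $n$ cores are available, the speedup is bounded by

```latex
S(n) = \frac{1}{(1 - p) + p/n}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p}
```

If single-core optimisations speed up the serial and parallel parts unevenly, the effective $p$ changes, so linear scaling observed today wouldn't necessarily survive them.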
I think people might interpret something claiming to be the "Future of Parallel Computing" as something that is just waiting on adoption. Perhaps "Towards the Future of Parallel Computing"...
I'm astounded by the level of ignorance of basic CS101 principles shown in this thread. You've clearly never taken a class on parallel computing or complexity theory? Geez. HN is decaying.
Chapel sees decent use in HPC. Also, NVidia has sponsored variants of Haskell, .NET, Java, and Julia on CUDA, has a Python JIT, and is collaborating with the Mojo folks.
I would really appreciate it if people criticized my project. That is how you grow. If people tend to hide the cruel truth behind applause, the world would just crumble.
> Bend has no tail-call optimization yet.

I've never understood the fascination with tail calls and recursion among computer science folks. Just write a loop, it's what it optimises to anyway.
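For what it's worth, the equivalence that comment leans on, sketched in plain Python (illustrative names, not Bend syntax): a tail call carries all of its state in its arguments, which is exactly what lets a compiler rewrite it as a loop.

```python
# Plain Python for illustration (not Bend). A tail-recursive sum and the
# loop a tail-call-optimising compiler effectively rewrites it into.
def sum_rec(n, acc=0):
    # The recursive call is in tail position: nothing runs after it,
    # so the caller's frame could be reused instead of stacking a new one.
    if n == 0:
        return acc
    return sum_rec(n - 1, acc + n)

def sum_loop(n):
    # The loop form: the (n, acc) state is updated in place each iteration.
    acc = 0
    while n > 0:
        acc, n = acc + n, n - 1
    return acc

# CPython itself does no TCO, so the recursive form overflows the stack for
# large n -- which is the commenter's point in favour of just writing loops.
assert sum_rec(500) == sum_loop(500) == 500 * 501 // 2
```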
This is incredible. This is the kind of work we need to crack open the underutilized GPUs out there. I know LLMs are all the rage, but there's more gold in them hills.
Been watching your development for a while on Twitter. This is a monumental achievement and I hope it gets the recognition it deserves.
This is really, really cool. This makes me think, "I could probably write a high-performance GPU program fairly easily"... a sentence that's never formed in my head.
This is very exciting. I don't have any GPU background, but I have been worrying a lot about CUDA cementing itself in the ecosystem. Here devs don't need CUDA directly, which would help decouple the ecosystem from cynical mega corps. Always good! Anyway, enough politics.

Tried to see what the language is like beyond hello world and found the guide [1]. It looks like a Python and quacks like a Haskell? For instance, variables are immutable, and tree-like divide-and-conquer data structures/algorithms are promoted for getting good results. That makes sense, I guess! I'm not surprised to see a functional core, but I am surprised to see the pythonic frontend, not that it matters much. I must say I highly doubt it will make it much easier for Python devs to learn Bend, though I don't know if that's the goal.

What are some challenges in programming with these kinds of restrictions in practice? Also, are there good FFI options?

[1]: https://github.com/HigherOrderCO/bend/blob/main/GUIDE.md
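To make the divide-and-conquer point concrete, here is a minimal sketch in plain Python (function names are illustrative, and this is not Bend syntax): the loop version has a serial dependency on the running total, while the tree version exposes independent sub-problems a parallel evaluator can spread across cores.

```python
# Plain Python for illustration (not Bend). The same reduction, written
# sequentially and in the tree shape Bend's guide encourages.
def loop_sum(xs):
    # Sequential: every step depends on the previous total, so there is
    # nothing for a parallel runtime to split up.
    total = 0
    for x in xs:
        total += x
    return total

def tree_sum(xs):
    # Divide and conquer (assumes a non-empty list): the two halves share
    # no state, so they can be reduced independently, on different cores.
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return tree_sum(xs[:mid]) + tree_sum(xs[mid:])

assert loop_sum(range(8)) == tree_sum(list(range(8))) == 28
```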
> CPU, Apple M3 Max, 1 thread: 3.5 minutes
> CPU, Apple M3 Max, 16 threads: 10.26 seconds

Surprised to see a more-than-linear speedup in CPU threads. What's going on here?
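For concreteness, the arithmetic behind the question (3.5 minutes is 210 seconds):

```latex
\frac{210\,\mathrm{s}}{10.26\,\mathrm{s}} \approx 20.5 > 16
```

so the observed speedup exceeds the 16x ceiling that ideal linear scaling on 16 threads would allow.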
This seems pretty cool! Question: Does this take into account memory bandwidth and caches between cores? Because getting them wrong can easily make parallel programs slower than sequential ones.
Erlang-like actor models would be well suited, so yeah, you could use it for web servers (assuming they are able to finish the language). It's a general-purpose, high-level programming language.
What's going on with the super-linear speedup going from one thread to all 16? 210 seconds (3.5 minutes) to 10.5 seconds is a 20x speedup, which isn't really expected. |
I appreciate this is early days, but it's hard to get excited about what seems to be incredibly slow performance on the really simple example you give. If the simple stuff is slow, what does that mean for the complicated stuff?
If I get a chance tonight, I'll re-run it with `-s` argument, see if I get anything helpful.
Under pypy3 it executes in 0m4.478s, single-threaded. Under Python 3.12, it executes in 1m42.148s, again single-threaded. I mention that because you include benchmark information: the Bend single-threaded version has been running for 42 minutes on my laptop, is consuming 6 GB of memory, and still hasn't finished (12th Gen Intel(R) Core(TM) i7-1270P, Ubuntu 24.04). That seems to be an incredibly slow interpreter. Has this been tested or developed on anything other than Macs / aarch64?