(Comments)

Original link: https://news.ycombinator.com/item?id=39812969

Your analysis seems spot-on. While fibers and async/await offer similar functionality at a high level, the difference lies in their implementation details. Async/await relies on compile-time transformation to generate lightweight state machines, whereas fibers rely on runtime primitives and manual control-flow manipulation. Both have advantages and drawbacks, and the choice ultimately depends on the use case and the properties you need.

An interesting observation is that, despite the potential advantages of low-overhead, manually controlled fibers, they still seem largely absent from the industry. Beyond the technical challenges of implementing fibers, are there other reasons holding back their wider adoption? Or has the industry become so dominated by managed environments that fine-grained, low-overhead concurrency matters less?

Regarding the comparison between fibers and async/await in the Rust context, Rust's offering in this space (namely async Rust) appears to borrow elements of both models. By providing first-class support for fiber-style coroutines and async/await, Rust gives developers both flexibility and simplicity, letting them pick whichever model best fits their needs in a given context. Whether this hybrid approach becomes the de facto standard, or one approach eventually wins out, remains to be seen.

Finally, it is worth noting that regardless of which particular fiber or async/await implementation you choose, carefully weighing the trade-off between throughput and latency remains essential when designing high-performance concurrent systems. The ultimate goal is to minimize waste, ensure resources are used effectively throughout the computation, and keep the system as a whole at the best balance of speed and responsiveness.

Original
Why choose async/await over threads? (notgull.net)
407 points by thunderbong 1 day ago | 400 comments

Async/await with one thread is simple and well-understood. That's the Javascript model. Threads let you get all those CPUs working on the problem, and Rust helps you manage the locking. Plus, you can have threads at different priorities, which may be necessary if you're compute-bound.

Multi-threaded async/await gets ugly. If you have serious compute-bound sections, the model tends to break down, because you're effectively blocking a thread that you share with others.

Compute-bound multi-threaded does not work as well in Rust as it should. Problems include:

- Futex congestion collapse. This tends to be a problem with some storage allocators. Many threads are hitting the same locks. In particular, growing a buffer can get very expensive in allocators where the recopying takes place with the entire storage allocator locked. I've mentioned before that Wine's library allocator, in a .DLL that's emulating a Microsoft library, is badly prone to this problem. Performance drops by two orders of magnitude with all the CPU time going into spinlocks. Microsoft's own implementation does not have this problem.

- Starvation of unfair mutexes. Both the standard Mutex and crossbeam-channel channels are unfair. If you have multiple threads locking a resource, doing something, unlocking the resource, and repeating that cycle, one thread will win repeatedly and the others will get locked out.[1] If you need fair mutexes, there's "parking-lot". But you don't get the poisoning safety on thread panic that the standard mutexes give you.
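A minimal sketch of the lock/unlock/repeat pattern described in the starvation point above, using the standard (unfair) `Mutex`; the thread count and the one-second deadline are arbitrary, and the per-thread acquisition counts you observe will depend on platform and mutex implementation:

    use std::sync::{Arc, Mutex};
    use std::thread;
    use std::time::{Duration, Instant};

    fn main() {
        let shared = Arc::new(Mutex::new(0u64));
        let deadline = Instant::now() + Duration::from_secs(1);
        let mut handles = Vec::new();

        for id in 0..4 {
            let shared = Arc::clone(&shared);
            handles.push(thread::spawn(move || {
                let mut acquired = 0u64;
                while Instant::now() < deadline {
                    // Lock, do a little work, unlock, immediately retry: with an
                    // unfair mutex the thread that just released the lock tends
                    // to win it again, and the other threads can be starved.
                    let mut guard = shared.lock().unwrap();
                    *guard += 1;
                    drop(guard);
                    acquired += 1;
                }
                println!("thread {id} acquired the lock {acquired} times");
            }));
        }

        for h in handles {
            h.join().unwrap();
        }
    }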

If you're not I/O bound, this gets much more complicated.

[1] https://users.rust-lang.org/t/mutex-starvation/89080



Yes, 100%.

I've mostly only dealt with IO-bound computations, but the contention issues arise there as well. What's the point of having a million coroutines when IO throughput is the bound anyway? How will coroutines save me when I immediately exhaust my size-10 DB connection pool? They won't; they just make debugging and working around the issues harder and more difficult to reason about.



The debugging issue is bigger than what it seems.

Use of async/await model in particular ends up with random hanged micro tasks in some random place in code that are very hard to trace back to the cause because they're dispersed potentially anywhere.

Concurrency is also rather undefined, as are priorities most of the time.

This can be partly fixed by labelling, which adds more complexity, but at least is explicit. Then the programmer needs to know what to label... Which they won't do, and Rust has no training wheels to help with concurrency.

Threads, well you have well defined ingress and egress. Priorities are handled by the OS and to some degree fairness is usually ensured.



Just morning bathroom musings based on your posts (yep /g), and this got me thinking: maybe the robust solution (once and for all, for all languages) requires a rethink at the hardware level. The CPU-bound issue comes down to systemic interrupt/resume, I think; if this can be done fairly for n WIP threads-of-execution with efficient queued context swaps (say, a CPU with n WIP contexts), then the problem becomes a resource allocation issue. Your thoughts?


> a cpu with N wip contexts

That's what "hyper-threading" is. There's enough duplicated hardware that beyond 2 hyper-threads, it seems to be more effective to add another CPU. If anybody ever built a 4-hyperthread CPU, it didn't become a major product.

It's been tried a few times in the past, back when CPUs were slow relative to memory. There was a National Semiconductor microprocessor where the state of the CPU was stored in main memory, and, by changing one register, control switched to another thread. Going way back, the CDC 6600, which was said to have 10 peripheral processors for I/O, really had only one, with ten copies of the state hardware.

Today, memory is more of a bottleneck than the CPU, so this is not a win.



The UltraSPARC T1 had 4-way SMT, and its successors bumped that to 8-way. Modern GPU compute is also highly based on hardware multi-threading as a way of compensating for memory latency, while also having wide execution units that can extract fine-grained parallelism within individual threads.


Also, IBM POWER have SMT at levels above 2; at least POWER 7 had 4-way SMT ("hyperthreading").


Missed that. That's part of IBM mainframe technology, where you can have "logical partitions", a cluster on a chip, and assign various resources to each. IBM POWER10 apparently allows up to 8-way hyperthreading if configured that way.


Thanks, very informative.


What you said sounded in my head more like you’re describing a cooperatively scheduled OS rather than a novel hardware architecture.


(This has been a very low priority background thread in my head this morning so cut me some slack on hand waving.)

Historically, the H/W folks addressed (pi) memory related architectural changes, such as when multicore came around and we got level caches. Imagine if we had to deal at software level with memory coherence in different cores [down to the fundamental level of invalidating Lx bytes]. There would be NUMA like libraries and various hacks to make it happen.

Arguably you could say "all that is in principle OS responsibility even memory coherence across cores" and we're done. Or you would agree that "thank God the H/W people took care of this" and ask can they do the same for processing?

The CPU model afaik hasn't changed that much in terms of granularity of execution steps whereas the H/W people could realize that d'oh an execution granularity in conjunction with hot context switching mechanism, could really help the poor unwashed coders in efficiently executing multiple competing sequences of code (which is all they know about at H/W level).

If your CPU's architecture specs n±e clock ticks per context iteration, then you compile for that and you design languages around that. CPU-bound now becomes heavy CPU usage but is not a disaster for any other process sharing the machine with you. It becomes a matter of provisioning instead of ad-hoc programming around provisioning...



If our implementations are bad because of preemption, then I’m not sure why the natural conclusion isn’t “maybe there should be less preemption” instead of “[even] more of the operating system should be moved into the hardware”.


If you have fewer threads ready to run than CPU cores, you never have any good reason to interrupt one of them.


I don't know how the challenges with cooperative (non-preemptive) multitasking keep needing to get rediscovered. Even golang, which I consider a very responsibly designed language, went with cooperative at first, until they were forced to switch to preemptive. Not saying cooperative multitasking doesn't have its place, just that it's gotta have a warning sticker, or even better, statically disallow certain types of code from executing.

Also great time to plug a related post, What color is your function:

https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...



Also there is the extra software development and maintenance cost due to coloured functions that async/await causes

https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...

Unless you are doing high scalability software, async might not be worth of the trade offs.



If you expect to color your functions async by default, it's really easy to turn a sync function into a near-zero-cost async one: a Future that has already been resolved at construction time by calling the sync function.
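The comment is about JS/TS (where this wrapping is essentially `Promise.resolve(syncFn())`); as a rough Rust analog of the same idea, with `parse_config` being a made-up stand-in for the sync function:

    use std::future::{ready, Future};

    // Hypothetical synchronous function.
    fn parse_config(s: &str) -> usize {
        s.len()
    }

    // Async-flavored wrapper: the future is already resolved at construction
    // time, so awaiting it is near zero-cost.
    fn parse_config_async(s: &str) -> impl Future<Output = usize> {
        ready(parse_config(s))
    }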

This way, JS / TS becomes pretty comfortable. Except for stacktraces, of course.



Useful stacktraces are one of the reasons why I use Go instead of JS/TS on the server side.


Except when your sync function calls a function that makes the async runtime deadlock. No, there's no compiler error or lint for that. Good luck!


> Async/await with one thread is simple and well-understood. That's the Javascript model.

Bit of nuance (I'm not an authority of this and I don't know the current up-to-date answer across 'all' of the Javascript runtimes these days):

Isn't it technically "the developer is exposed to the concept of just one thread, without having to worry about order of execution (other than callbacks/that sort of pre-emption)", but under the hood it can actually be using as many threads as it wants (and often does)? It's just "abstracted" away from the user.



I don't think so. Two threads updating the same variable would be observable in code. There's no way around it.


I'm referring to something like this: https://stackoverflow.com/questions/7018093/is-nodejs-really...

It's like a pedantic technical behind the scenes point I think, just trying to learn "what's true"



Looks like the spec refers to the thing-that-has-the-thread as an "agent". https://tc39.es/ecma262/#sec-agents

I don't know the details about how any implementation of a javascript execution environment allows for the creation of new agents.



I mean, yeah, if you go deep enough your OS may decide to schedule your browser thread to a different core as well. I don’t think it has any relevance here - semantically, it is executed on a single thread, which is very different from multi-threading.


It is executed by your runtime which may or may not behind the scenes be using a single thread for your execution and/or the underlying I/O/eventing going on underneath, no?


Most JS runtimes are multi-threaded behind the scenes. If you start a node process:

  node -e "setTimeout(()=>{}, 10_000)" &
Then wait a second and run:

  ps -o thcount $!
Or on macOS:

  ps -M $!
You'll see that there are multiple threads running, but like others have said, it's completely opaque to you as a programmer. It's basically just an implementation detail.


I'm not sure what you mean by "multi-threaded async/await"... Isn't the article considering async/await as an alternative to threads (i.e. coroutines vs threads)?

I'm a C++ programmer, and still using C++17 at work, so no coroutines, but don't futures provide a similar API? Useful for writing async code in serialized fashion that may be easier (vs threads) to think about and debug.

Of course there are still all the potential pitfalls that you enumerate, so it's no magic bullet for sure, but still a useful style of programming on occasion.



They mean async/await running over multiple OS threads compared to over one OS thread.

You can also have threads running on one OS thread (Python) or running on multiple OS threads (everything else).

Every language’s concurrency model is determined by both a concurrency interface (callbacks, promises, async await, threads, etc), and an implementation (single-threaded, multiple OS threads, multiple OS processes).



async/await tasks can be run in parallel on multiple threads, usually no more threads than there are hardware threads. This allows using the full capabilities of the machine, not just one core's worth. In a server environment with languages that support async/await but don't have the ability to execute on multiple cores like Node.js and Python, this is usually done by spawning many duplicate processes and distributing incoming connections round-robin between them.
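As a sketch of the first half of that (an async runtime running tasks in parallel on roughly one worker thread per hardware thread), using Tokio as one concrete runtime; the task bodies are just placeholders:

    use std::thread::available_parallelism;

    fn main() {
        // One worker thread per hardware thread (Tokio's multi_thread runtime
        // defaults to this anyway; shown explicitly for clarity).
        let workers = available_parallelism().map(|n| n.get()).unwrap_or(1);

        let rt = tokio::runtime::Builder::new_multi_thread()
            .worker_threads(workers)
            .enable_all()
            .build()
            .unwrap();

        rt.block_on(async {
            let mut handles = Vec::new();
            for i in 0..1_000 {
                // Tasks are cheap; they are scheduled across the worker threads.
                handles.push(tokio::spawn(async move { i * 2 }));
            }
            let mut sum = 0u64;
            for h in handles {
                sum += h.await.unwrap() as u64;
            }
            println!("sum = {sum}");
        });
    }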


Jemalloc can use separate arenas for different threads which I imagine mostly solves the futex congestion issue. Perhaps it introduces new ones?


IIRC glibc default malloc doesn't use per-thread arenas as they would waste too much memory on programs spawning tens of thousands of threads, and glibc can't really make too many workload assumptions. Instead I think it uses a fixed pool of arenas and tries to minimize contention.

These days on Linux, with restartable sequences, you can have true per-CPU arenas with zero contention. Not sure which allocators use them, though.



> As pressure from thread collisions increases, additional arenas are created via mmap to relieve the pressure. The number of arenas is capped at eight times the number of CPUs in the system (unless the user specifies otherwise, see mallopt), which means a heavily threaded application will still see some contention, but the trade-off is that there will be less fragmentation.

https://sourceware.org/glibc/wiki/MallocInternals

So glibc's malloc will use up to 8x #CPUs arenas. If you have 10_000 threads, there is likely to be contention.



https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree...

Thank you, I didn't know about this one. An allocator that seems to use it at https://google.github.io/tcmalloc/rseq.html



Not only does tcmalloc use rseq, the feature was contributed to Linux by the tcmalloc authors, for this purpose, among other purposes.


I keep having to repeat it on this unfortunate website: it's an implementation detail, a multi-threaded executor of async/await can cope with starvation perfectly well as demonstrated by .NET's implementation that can shrug off some really badly written code that interleaves blocking calls with asynchronous.

https://news.ycombinator.com/item?id=39530435

https://news.ycombinator.com/item?id=39786142

https://news.ycombinator.com/item?id=39721626



Said badly written code will still execute with poor performance, and the mix can be actively hard to spot.


Badly written threaded code will have the same problem, unfortunately


You would be surprised. It ultimately regresses to "thread per request, but with extra steps". I remember truly atrocious codebases that were spamming task.Result everywhere, and yet they were performing tolerably even back on .NET Framework 4.6.1. Performance has since improved (often quite literally) ten-fold, with the threadpool being rewritten, its hill-climbing algorithm receiving further tuning, and it getting proactive blocked-worker detection that can inject threads immediately without going through hill-climbing.


I find this has worked well for me when I can easily state what thread pool work gets executed on.


I think it's Kotlin's model. The language doesn't have a default coroutine executor set; you need to provide your own or spawn one with the stdlib. It can be thread-pooled, single-threaded, or a custom one, but there isn't a default one set.

If you use a single-threaded executor, then race conditions won't happen. If you choose a pooled one, then you obviously should realize there can be race conditions. It's all about the choice you made.



You focus on rust rather than generalizing...

If you are IO bound, consider threads. This is almost the same as async / await.

What was missing above, and the problem with how most compute education is these days: if you are compute bound, you need to think about processes.

If you were dealing with Python's concurrent.futures, you would need to consider ProcessPoolExecutor vs. ThreadPoolExecutor.

ThreadPoolExecutor gives you the same as the above.

With ProcessPoolExecutor, you will have multiple processes executing independently, but you have to copy a memory space, which people don't consider. In Python DS work, multiprocessing workloads need to take memory-space considerations into account.

It's kinda f'd up how JS doesn't have engineers think about their workloads.



I think you are coming at this from a particular Python mindset, driven by the limitations imposed on Python threading by the GIL. This is a peculiarity specific to Python rather than a general purpose concept about threads vs processes.


> [...] if you are compute bound you need to think about processes.

How would that help? Running several processes instead of several threads will not speed anything up [1] and might actually slow you down because of additional inter-process communication overhead.

[1] Unless we are talking about running processes across multiple machines to make use of additional processors.



I think you need to clarify what you mean by "thread". For example they are different things when we compare Python and Java Threads. Or OS threads and green threads. I think the GP was relating to OS threads.


I was also referring to kernel threads. If we are talking about non-kernel threads, then sure, a given implementation might have limitations and there might be something to be gained by running several processes, but that would be a workaround for those limitations. But for kernel threads there will generally be no gain by spreading them across several processes.


Right; a process is just a thread (or set of threads) and some associated resources - like file descriptors and virtual memory allocations. As I understand it, the scheduler doesn’t really care if you’re running 1000 processes with 1 thread each or 1 process with 1000 threads.

But I suspect it’s faster to swap threads within a process than swap processes, because it avoids expensive TLB flushes. And of course, that way there’s no need for IPC.

All things being equal, you should get more performance out of a single process with a lot of threads than a lot of individual processes.



> a process is just a thread (or set of threads) and some associated resources - like file descriptors and virtual memory allocations

Or rather the reverse, in Linux terminology. Only processes exist, some just happen to share the same virtual address space.

> the scheduler doesn’t really care if you’re running 1000 processes with 1 thread each or 1 process with 1000 threads

Not just the scheduler, the whole kernel really. The concept of thread vs process is mainly a userspace detail for Linux. We arbitrarily decided that the set of clone() parameters from fork() create a process, while the set of clone() parameters through pthread_create() create a thread. If you start tweaking the clone() parameters yourself, then the two become indistinguishable.

> it’s faster to swap threads within a process than swap processes, because it avoids expensive TLB flushes

Right, though this is more of a theoretical concern than a practical one. If you are sensitive to a marginal TLB flush, then you may as well "isolcpu" and set affinities to avoid any context switch at all.

> that way there’s no need for IPC

If you have your processes mmap a shared memory, you effectively share address space between processes just like threads share their address space.

For most intents and purposes, really, I do find multiprocessing just better than multithreading. The two are pretty much indistinguishable, but separate processes give you the flexibility of being able to arbitrarily spawn new workers just like any other process, while with multithreading you need to bake in some form of pool manager and hope you got it right.
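A rough sketch of the "processes mmap a shared memory" point above (Linux/Unix only, using the `libc` crate; error handling and proper synchronization are omitted, and a real program would use atomics or a lock in the shared region):

    use std::ptr;

    fn main() {
        unsafe {
            let len = std::mem::size_of::<u32>();
            // An anonymous MAP_SHARED mapping survives fork(), so parent and
            // child see the same memory, much like two threads sharing an
            // address space.
            let mem = libc::mmap(
                ptr::null_mut(),
                len,
                libc::PROT_READ | libc::PROT_WRITE,
                libc::MAP_SHARED | libc::MAP_ANONYMOUS,
                -1,
                0,
            ) as *mut u32;

            *mem = 0;
            let pid = libc::fork();
            if pid == 0 {
                // Child: write through the shared mapping and exit.
                *mem = 42;
                libc::_exit(0);
            }
            libc::waitpid(pid, ptr::null_mut(), 0);
            println!("parent sees the child's write: {}", *mem); // prints 42
            libc::munmap(mem as *mut libc::c_void, len);
        }
    }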



> The concept of thread vs process is mainly a userspace detail for Linux. We arbitrarily decided that the set of clone() parameters from fork() create a process, while the set of clone() parameters through pthread_create() create a thread. If you start tweaking the clone() parameters yourself, then the two become indistinguishable.

That's from Plan 9. There, you can fork, with various calls, sharing or not sharing code, data, stack, environment variables, and file descriptors.[1] Now that's in Linux. It leads to a model where programs are divided into connected processes with some shared memory. Android does things that way, I think.

[1] https://news.ycombinator.com/item?id=863939



> Or rather the reverse, in Linux terminology. Only processes exist, some just happen to share the same virtual address space.

Threads and processes are semantically quite different in standard computer science terminology. A thread has an execution state, i.e. its set of processor register values. A process, on the other hand, is a management and isolation unit for resources like memory and handles.



> If you are IO bound, consider threads. This is almost the same as async / await.

Only in Python.

> if you are compute bound you need to think about processes.

Also only in Python.



If you're using threads then consider not using Python. Or, just consider not using Python.


Backend JS just spins up another container and/or lambda and if it's too slow and requires multiple CPUs in a single deployment, oh well, too bad.


That is of course a huge overhead, compared to how other languages solve the problem.


The complaints around async/await vs threads to my mind have not been that one is more or less complex than the other. It is that it bifurcates the ecosystem and one of them ends up being a second class citizen causing friction when you choose the wrong one for your project.

While you can mix and match them, it's hacky and inefficient when you need to. As it stands now, the Rust ecosystem has decided that if you want to do anything involving IO, you are stuck with an all async/await ecosystem. Since nearly everything you might want to do in Rust probably involves IO, with very few exceptions, for the most part you should probably ignore non-async libraries regardless of whether the rest of your application wants to be async or not.

There is a hypothetical world where Rust used abstractions that are even more composable than async/await, whose composability really wants everything else to be async/await too. If that had happened then I think most of the complaints would have disappeared.



I agree with your diagnosis. It's what I concluded in my own Rust async blog post [0] (which are surely mandatory now). It's even worse than bifurcating the ecosystem, because even within async code it's almost always closely tied to the executor, usually Tokio. I talk about this as an extension to function colouring, adopting withoutboats's three-colour proposition with blue (non-IO), green (blocking IO), and red (async IO). In the extended model it's really blue, green, red (Tokio), purple (async-std), orange (smol), etc.

I find that the sans-IO pattern is the best solution to this problem. Under this pattern you isolate all blue code and use inversion of control for I/O and time. This way you end up with the core protocol logic being unaware of IO and it becomes simple to wrap it in various forms of IO.

0: https://hugotunius.se/2024/03/08/on-async-rust.html
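For illustration only (not code from the linked post): a toy line-based protocol written sans-IO. The state machine never touches a socket; the caller, whether blocking or async, reads bytes however it likes and feeds them in:

    // Toy sans-IO "protocol": it only consumes bytes and emits events.
    pub struct LineProtocol {
        buf: Vec<u8>,
    }

    pub enum Event {
        Line(String),
    }

    impl LineProtocol {
        pub fn new() -> Self {
            Self { buf: Vec::new() }
        }

        // Feed bytes that the caller read from *somewhere*: a blocking socket,
        // an async socket, or a test vector.
        pub fn handle_input(&mut self, data: &[u8]) {
            self.buf.extend_from_slice(data);
        }

        // Drain a complete line, if any, as an event for the caller to act on.
        pub fn poll_event(&mut self) -> Option<Event> {
            let pos = self.buf.iter().position(|&b| b == b'\n')?;
            let line: Vec<u8> = self.buf.drain(..=pos).collect();
            Some(Event::Line(
                String::from_utf8_lossy(&line).trim_end().to_string(),
            ))
        }
    }

A blocking wrapper is then just a read loop calling `handle_input`/`poll_event`, and a Tokio (or smol, or async-std) wrapper is the same loop with `.await` on the read; the protocol logic itself never changes.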



I love the fact that people outside the Python ecosystem are spreading the word about sans-IO. I think it should be the next iteration in coding using futures-based concurrency. I only wish it were more popular in the Python land as well.


The Haskell folks got there first.


This is a cool pattern, thanks for the share.


No problem, here are some examples of it:

* Quinn (a QUIC implementation; in particular `quinn-proto` is sans-IO, whereas the outer crate, `quinn`, is Tokio-based) [0]

* str0m(A WebRTC implementation that I work on, it's an alternative to `webrtc-rs`. We don't have any IO-aware wrappers, the user is expected to provide that themselves atm)[1]

0: https://github.com/quinn-rs/quinn

1: https://github.com/algesten/str0m/



> Since nearly everything you might want to do in Rust probably involves IO with very few exceptions that means for the most part you should probably ignore non-async libraries regardless of whether the rest of your application wants it to be async or not.

Only if you have two libraries to choose from and they are otherwise identical, which is rare. Using blocking code in async applications is not as seamless as it should be, but it's not hard. Instead of writing `foo()` you write `tokio::task::spawn_blocking(foo).await`. It will run the blocking code on a separate thread and return a future that resolves once that thread is done.
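A minimal sketch of that pattern, Tokio-specific; `read_report_blocking` is a made-up placeholder for whatever blocking call you need:

    // Hypothetical blocking call, e.g. a synchronous library or heavy file IO.
    fn read_report_blocking(path: &str) -> std::io::Result<String> {
        std::fs::read_to_string(path)
    }

    async fn handler() -> std::io::Result<String> {
        // Move the blocking work onto Tokio's dedicated blocking thread pool,
        // so the async worker threads stay free to run other tasks.
        let contents = tokio::task::spawn_blocking(|| read_report_blocking("report.txt"))
            .await
            .expect("blocking task panicked")?;
        Ok(contents)
    }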



That assumes you are using Tokio. As another poster said not only does the ecosystem fragment along the async/non async lines but along the runtime lines. Async is an extremely leaky abstraction. You are in a way making my point for me. If you want to avoid painful refactoring you should basically always start out tokio-async and shim in non async code as needed because going the other way is going to hurt.


`spawn_blocking` is a single function and it is not complicated, it shouldn't be too hard for other runtimes to do the same.

The end application does have to choose a runtime anyway and will have to stick with it because this area isn't standardized yet. This problem mostly affects the part of the ecosystem that wants to put complicated concurrency logic into libraries.



`spawn_blocking` should be part of a core executor interface that all executors must provide.


And of course most libraries don't even need IO because the application can do it for them, so it only makes sense for them to be async if they're computationally heavy enough to cause problems for the runtime.


>As it stands now the Rust ecosystem has decided that if you want to do anything involving IO you are stuck with an all async/await ecosystem.

I mean, C# works basically the same way: even though there are non-async options for IO, using the async options basically forces you to be async all the way back to Main(). There are ways to safely call async methods from sync methods, but they make debugging infinitely harder.



Well, yes. That doesn't mean it's not annoying though. It happens in every language that provides syntactic support for the distinction between async/await and non async. It's, I think, core to the syntactic and semantic abstractions that were popularized by Javascript.


Issues with the article:

1. Only one example is given (web server), solved incorrectly for threads. I will elaborate below.

2. The question is framed as if people specifically want OS threads instead of async/await .

But programmers want threads conceptually, semantically. Write sequential logic and don't use strange annotations like "async". In other words, if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just write normal function calls? Then you will suddenly be programming in threads.

OS threads are expensive due to the statically allocated stack, and we don't want that. We want cheap threads that can be run in the millions on a single CPU, but without the clumsy "async/await" words. (The `wait` word remains in its classic sense: when you wait for an event, for another thread to complete, etc. - a blocking operation of waiting. But we don't want it for function invocations.)

Back to #1 - the web server example.

When timeout is implemented in async/await variant of the solution, using the `driver.race(timeout).await`, what happens to the client socket after the `race` signals the timeout error? Does socket remain open, remains connected to the client - essentially leaked?

The timeout solution for the threaded version may look almost the same as it looks for async/await: `threaded_race(client_thread, timeout).wait`. This threaded_race function uses a timer to track a timeout in parallel with the thread, and when the timeout is reached it calls `client_thread.interrupt()` - the Java way. (`Thread.interrupt()`, if the thread is not blocked, simply sets a flag; and if the thread is blocked in an IO call, this call throws an InterruptedException. That's a checked exception, so the compiler forces the programmer to wrap the `client.read_to_end(&mut data)` into try / catch or declare the exception in `handle_client`. So the programmer will not forget to close the client socket.)



> When timeout is implemented in async/await variant of the solution, using the `driver.race(timeout).await`, what happens to the client socket after the `race` signals the timeout error?

Any internal race() values will be `Drop`ed and driver itself will remain (although rust will complain you are not handling the Result if you type it 'as is'), if a new socket was created local to the future it will be cleaned up.

The niceness of futures (in Rust) is that all the behavior around them can be defined. While "all functions are blocking", as you state in a sibling comment, Rust allows you to specify when to defer execution to the next task in the task queue, meaning it will poll tasks arbitrarily quickly with an explicitly held state (the Future struct). This makes it both very fast (compared to threads, which need to sleep() in order to defer) and easy to reason about.
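A sketch of that cleanup-on-timeout behavior using `tokio::time::timeout` in place of the article's `race()`: if the deadline fires first, the inner future is dropped, and the `TcpStream` it owns is closed by its `Drop` impl, so nothing leaks. Names and the two-second deadline are arbitrary:

    use std::time::Duration;
    use tokio::io::AsyncReadExt;
    use tokio::net::TcpStream;
    use tokio::time::timeout;

    async fn fetch_with_deadline(addr: &str) -> Option<Vec<u8>> {
        // The whole connect+read future is raced against a 2s deadline.
        let result = timeout(Duration::from_secs(2), async {
            let mut stream = TcpStream::connect(addr).await.ok()?;
            let mut data = Vec::new();
            stream.read_to_end(&mut data).await.ok()?;
            Some(data)
        })
        .await;

        match result {
            Ok(inner) => inner,     // connect/read finished (successfully or not)
            Err(_elapsed) => None,  // timed out; the inner future was dropped
        }
    }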

Java's Thread.interrupt is also just a sleep loop, which is fine for most applications to be fair. Rust is a system language, you can't have that in embedded systems, and it's not desirable for kernels or low-latency applications.



> Java's Thread.interrupt is also just a sleep loop

You probably mean that Java's socket reading under the hood may start a non-blocking IO operation on the socket, and then run a loop, which can react on Thread.interrupt() (which, in turn, will basically be setting a flag).

But that's an implementation detail, and it does not need to be implemented that way.

It can be implemented the same way as async/await. When a thread calls socket reading, the runtime system will take the current threads continuation off the execution, and use CPU to execute the next task in the queue. (That's how Java's new virtual threads are implemented).

Threads and async/await are basically the same thing.

So why not drop this special word `async`?



> So why not drop this special word `async`?

You can drop the special word in Rust; it's just sugar for 'returns a poll-able function with state'. However, threads and async/await are not the same.

You can implement concurrency any way you like; you can run it in separate processes or separate nodes if you are willing to put in the work. That does not mean they are equivalent for most purposes.

Threads are almost always implemented preemptively while async is typically cooperative. Threads are heavy/costly in time and memory, while async is almost zero-cost. Threads are handed over to the kernel scheduler, while async is entirely controlled by the program('s executor).

Purely from a merit perspective threads are simply a different trade-off. Just like multi-processing and distributed actor model is.



> Threads are almost always implemented preemptively while async is typically cooperative. Threads are heavy/costly in time and memory, while async is almost zero-cost. Threads are handed over to the kernel scheduler, while async is entirely controlled by the program('s executor).

Keyword here being almost. See Project Loom.



@f_devd, cooperative vs preemptive is a good point.

(That threads are heavy or should be scheduled by OS is not required by the nature of the threads).

But preemptive is strictly better (safer at least) than cooperative, right? Otherwise, one accidental endless loop, and this code occupies the executor, depriving all other futures from execution.

@gpderetta, I think Project Loom will need to become preemptive, otherwise the virtual threads can not be used as a drop-in replacement for native threads - we will have deadlocks in virtual threads where they don't happen in native threads.
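On the accidental-endless-loop point, a tiny sketch of cooperative starvation on a single-threaded executor (Tokio's current_thread flavor is used here only as a concrete example):

    #[tokio::main(flavor = "current_thread")]
    async fn main() {
        let _task = tokio::spawn(async {
            // Never runs: the main task below never yields back to the executor.
            println!("you will not see this");
        });

        // No .await inside the loop => no yield point => cooperative starvation.
        loop {
            std::hint::spin_loop();
        }
    }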



Preemptive is safer for liveness since it avoids starvation (one task's poll taking too long); however, in practice it is almost always more expensive in memory and time due to the implicit state.

In async, only the values required to do a poll need to be held (often only references), while for threads the entire stack and registers need to be stored at all times, since at any moment a thread could be interrupted and will need to know where to continue from. And since all registers need to be saved/overwritten at each context switch (plus scheduler/kernel handling), it takes more time overall.
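To make the "only the values required to do a poll" point concrete, here is a hand-written future whose entire saved per-task state is one small struct, in contrast to a thread, which would keep a whole stack alive. This is a toy, not how futures are normally written (you usually let async/await generate the state machine):

    use std::future::Future;
    use std::pin::Pin;
    use std::task::{Context, Poll};

    // The whole saved "context" of this task is one u32: no stack, no registers.
    struct Countdown {
        remaining: u32,
    }

    impl Future for Countdown {
        type Output = ();

        fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
            if self.remaining == 0 {
                Poll::Ready(())
            } else {
                self.remaining -= 1;
                // A real future would register the waker with an event source
                // (timer, socket readiness, ...); here we just ask to be polled again.
                cx.waker().wake_by_ref();
                Poll::Pending
            }
        }
    }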

In general threads are a good option if you can afford the overhead, but assuming threads as a default can significantly hinder performance (or make near impossible to even run) where Rust needs to.



Java can afford that. M:N threads come with a heavy runtime. Java already has a heavy runtime, so what is a smidgen more flab?

Source: https://github.com/rust-lang/rfcs/blob/master/text/0230-remo...



So it seems that the biggest issue was having a single IO interface, forcing overhead on both green and native threads and forcing runtime dispatching.

It seems to me that the best would have been to have the two libraries evolve separately and capture the common subset in a trait (possibly using dynamic impl when type erasure is tolerable), so that you can write generic code that can work with both or specialized code to take advantage of specific features.

As it stands now, sync and async are effectively separated anyway, and it is currently impossible to write generic code that handles both.



> In other words, if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just write normal function calls? Then you will suddenly be programming in threads.

It has been tried various times in the last decades. You want to search for "RPC". All attempts at trying to unify sync and async have failed, because there is a big semantical difference between running code within a thread or between threads or even between computers. Trying to abstract over that will eventually be insufficient. So better learn how to do it properly from the beginning.



I think you've got some of this in your own reply, but... I feel like Erlang has gone all in on "if async is good, why not make everything async". "Everything" in Erlang is built on top of async message passing, or the appearance thereof. Erlang hasn't taken over the world, but I think it's still successful; chat services descended from ejabberd have taken over the world, and RabbitMQ seems pretty popular too. OTOH, the system as a whole only works because Erlang can be effectively preemptive in green threads because of the nature of the language. Another thing to note is that you can build the feeling of synchronous calling by sending a request and immediately waiting for a response, but it's very hard to go the other way. If you build your RPC system on the basis of synchronous calls, it's going to be painful: sometimes you want to start many calls and then wait for the responses together, and that gets real messy if you have to spawn threads/tasks every time.


I'm not very familiar with Erlang, but from my understanding, Erlang actually does have this very distinction: you either run local code or you interact with other actors. And here the big distinction gets quite clear: once you shoot a message out, you don't know what will happen afterwards. Both you and the other actor might crash and/or send other messages, etc.

So Erlang does not try to hide it, instead, it asks the developer to embrace it and it's one of its strength.

That being said, I think that actors are a great way to model a system from a bird's-eye perspective, but they're not so great for handling concurrency within a single actor. I wish Erlang would improve here.



Actors are a building block of concurrency. IMHO, it doesn't make sense to have concurrency within an actor, other than maybe instruction level concurrency. But that's very out of scope of Erlang, BEAM code does compile (JIT) to native code on amd64 and arm64, but the JIT is optimized for speed, since it happens at code load time, it's not an profiling/optimizing JIT like Java's hotspot. There's no register scheduler like you'd need to achieve concurrency, all the beam ops end up using the same registers (more or less), although your processor may be able to do magic with register renaming and out of order operations in general.

If you want instruction level concurrency, you should probably be looking into writing your compute heavy code sections as Native Implemented Functions (NIFs). Let Erlang wrangle your data across the wire, and then manipulate it as you need in C or Rust or assembly.



> IMHO, it doesn't make sense to have concurrency within an actor, other than maybe instruction level concurrency

I think it makes sense to have that, including managing the communication with other actors. Things like "I'll send the message, and if I don't hear back within x minutes, I'll send this other message".

Actors are very powerful and a great tool to have at your disposal, but often they are too powerful for the job and then it can be better to fall back to a more "low level" or "local" type of concurrency management.

At least that's how I feel. In my opinion you need both, and while you can get the job done with just one of them (or even none), it's far from being optimal.

Also, what you mention about NIFs is good for a very specific usecase (high performance / parallelism) but concurrency has a broader scope.



> Things like "I'll send the message, and if I don't hear back within x minutes, I'll send this other message".

I assume you don't want to wait with a x minute timeout (and meantime not do anything). You can manage this in three ways really:

a) you could spawn an actor to send the message and wait for a response and then take the fallback action.

b) you could keep a list (or other structure, whatever) of outstanding messages and timeouts, and prune the list if you get a response, or otherwise periodically check if there's a timeout to process.

c) set a timer and do the thing when you get the timer expiration message, or cancel the timer if you get a response. (which is conceptually done by sending a message to the timer server actor, which will send you a timer handle immediately and a timer expired message later; there is a timer server you can use through the timer module, but erlang:send_after/[2,3] or erlang:start_timer/[3,4] are more efficient, because the runtime provides a lot of native timer functionality as needed for timeouts and what not anyway)

Setting up something to 'automatically' do something later means asking the question of how the Actor's state is managed concurrently, and the thing that makes Actors simple is being able to answer that the Actor always does exactly one thing at a time, and that the Actor cannot be interrupted, although it can be killed in an orderly fashion at any time, at least in theory. Sometimes the requirement for an orderly death means an operation in progress must finish before the process can be killed.



Exactly. Now imagine: a) is unnecessarily powerful. I don't want to manage my own list as in b), but other than that, b) sounds fine, and c) is also fine, though: does it need an actor in the background? No.

In other words, having a well built concept for these cases is important. At least that's my take. You might say "I'll just use actors and be fine", but for me it's not sufficient.



Oh, and just to add onto it: I think async/await is not really the best solution to tackle these semantic differences. I prefer the green-thread IO approach, which feels a bit heavier but leads to a true understanding of how to combine and control logic in a concurrent/parallel setting. Async/await is great to add to languages that already have something like promises and want to improve syntax in an easy way, so it has its place, but I think it was not the best choice for Rust.


There's also writing your code with poll() and select(), which is its own thing.


well that's the great thing with async rust: you write with poll and select without writing poll and select. let the computer and the compiler get this detail out of my way (seriously I don't want to do the fd interest list myself).

and I can still write conceptually similar select code using the, well, select! macro provided by most async runtimes to do the same on a select list of futures. better separation, easier to read, and overall it boils down to the same thing.



> OS threads are expensive due to statically allocated stack, and we don't want that. We want cheap threads, that can be run in millions on a single CPU. But without the clumsy "async/await" words.

Green threads ("cheap threads") are still expensive if you end up spreading a lot of per-client state on the stack. That's because with async/await and CPS you end up compressing the per-client state into a per-client data structure, and you end up having very few function call activation frames on the stack, all of which unwind before blocking in the executor.



> In other words, if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just write normal function calls?

IIRC withoutboats said in one of the posts that the true answer is compatibility with C.



> if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just write normal function calls?

You mean like Haskell?

The answer is that you need an incredibly good compiler to make this behave adequately, and even then, every once in a while you'll get the wrong behavior and need to rewrite your code in a weird way.



> But programmers want threads conceptually, semantically. Write sequential logic and don't use strange annotations like "async". In other words, if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just use normal function calls? Then you will suddenly be programming in threads.

Some programmers do, but many want exactly the opposite as well. Most of the time I don't care if it's an OS blocking syscall or a non-blocking one, but I do care about understanding the control flow of the program I'm reading and see where there's waiting time and how to make them run concurrently.

In fact, I'd kill to have a blocking/block keyword pair whenever I'm working with blocking functions, because they can surreptitiously slow down everything without you paying attention (I can't count how many pieces of software I've seen with blocking syscalls in the UI thread, leading to frustratingly slow apps!).



This is a really common comment to see on HN threads about async/await vs fibers/virtual threads.

What you're asking for is performance to be represented statically in the type system. "Blocking" is not a useful concept for this. As avodonosov points out, nothing stops a syscall from being incredibly fast, or a regular function that doesn't talk to the kernel at all from being incredibly slow. The former won't matter for UI responsiveness; the latter will.

This isn't a theoretical concern. Historically a slow class of functions involved reading/writing to the file system, but in some cases now you'll find that doing so is basically free and you'll struggle to keep the storage device saturated without a lot of work on multi-threading. Fast NVMe SSDs like found in enterprise storage products or MacBooks are a good example of this.

There are no languages that reify performance in the type system, partly because it would mean that optimizing a function might break the callers, which doesn't make sense, and partly because the performance of a function can vary wildly depending on the parameters it's given.

Async/await is therefore basically a giant hack and confuses people about performance. It also makes maintenance difficult. If you start with a function that isn't async and suddenly realize that you need to read something from disk during its execution, even if it only happens sometimes or when the user passes in a specific option, you are forced by the runtime to mark the function async which is then viral throughout the codebase, making it a breaking change - and for what? The performance impact of reading the file could easily be lost in the noise on modern hardware and operating systems.

The right way to handle this is the Java approach (by pron, who is posting in this thread). You give the developer threads and make it cheap to have lots of them. Now break down tasks into these cheap threads and let the runtime/OS figure out if it's profitable to release the thread stack or not. They're the best placed to do it because it's a totally dynamic decision that can vary on a case-by-case basis.



> It also makes maintenance difficult. If you start with a function that isn't async and suddenly realize that you need to read something from disk during its execution, even if it only happens sometimes or when the user passes in a specific option, you are forced by the runtime to mark the function async which is then viral throughout the codebase, making it a breaking change - and for what? The performance impact of reading the file could easily be lost in the noise on modern hardware and operating systems.

You'll typically have an idea of whether or not a function performs IO from the start. Changing that after the fact violates the users' conceptual model and expectation of it, even if all existing code happens to keep working.



If you want to go full Haskell on the problem for purity-related reasons, by all means be my guest. I strongly approve.

However, unless you're in such a language, warping my entire architecture around that objection does not provide a good cost-benefit tradeoff. I've got a lot of fish to fry, and a lot of them are bigger than this in practice. Heck, there are still plenty of programmers who consider it an unambiguous feature that they can add IO to anything they want or need to, and consider it a huge negative when they can't, and a lot of programmers who don't practice IO isolation and don't even conceive of "this function is guaranteed to not do any IO/be impure" as a property a function can have.



In any system that uses mmap or swap it's a meaningless distinction anyway (which is obviously nearly all of them outside of embedded RTOS). Accessing even something like the stack can trigger implicit/automatic IO of arbitrary complexity, so the concept of a function that doesn't do IO is meaningless to begin with. Async/await isn't justified by any kind of interesting type theory, it exists to work around limitations in runtimes and language designs.


> You'll typically have an idea of whether or not a function performs IO from the start.

I think GP's point is: why does that matter? Much writing on Async/Await roughly correlates IO with "slow". GP rightly points out that "slow" is imprecise, changes, means different things to different people and/or use cases.

I completely get the intuition: "there's lag in the [UI|server|...], what's slowing it down?". But the reality is that trying to formalise "slow" in the type system is nigh on impossible - because "slow" for one use case is perfectly acceptable for another.



Even if you ignore performance completely, IO is unreliable. IO is unpredictable. IO should be scrutinized.


While slowness in absolute terms depends on lots of factors, the relative slowness of things doesn't so much. Whatever the app or the device, accessing a register is always going to be faster than accessing random places in RAM, which is always going to be faster than fetching something from disk, and even more so if we talk about fetching stuff over the network. No matter how hardware progresses, the latency hierarchy is doomed to stay.

That doesn't mean it's the only factor of slowness, and that async/await solves all issues, but it's a tool that helps, a lot, to fight against very common sources of performance bugs (like how the borrow checker is useful when it protects against the nastiest class of memory vulnerabilities, even if it cannot solve all security issues).

Because the situation where “my program is stupidly waiting for some IO even though I don't even need the result right now and I could do something in the meantime” is something that happens a lot.
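A small sketch of that "do something in the meantime" case; the two fetch functions are made-up stand-ins that just sleep, and the point is that awaiting them together costs roughly max(a, b) instead of a + b:

    use std::time::Duration;

    // Stand-in "slow IO" operations for illustration.
    async fn fetch_user() -> String {
        tokio::time::sleep(Duration::from_millis(300)).await;
        "user".to_string()
    }

    async fn fetch_posts() -> String {
        tokio::time::sleep(Duration::from_millis(300)).await;
        "posts".to_string()
    }

    async fn load_page() -> (String, String) {
        // Sequentially awaiting each would take ~600ms; awaiting them together
        // takes ~300ms, because neither future sits idle while the other waits.
        tokio::join!(fetch_user(), fetch_posts())
    }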



> Whatever the app or the device, accessing a register is always going to be faster than random places in RAM, which is always going to be faster than fetching something on disk and even moreso if we talk about fetching stuff over the network.

The network is special: the time it takes to fetch something over the network can be arbitrarily large, or even infinite (this can also apply to disk when running over networked filesystems), while for registers/RAM/disk (as long as it's a local disk which is not failing) the time it takes is bounded. That's the reason why async/await is so popular when dealing with the network.



PCIe is a network. USB is a network. There is no such thing as a resource with a guaranteed response time.


There are plenty of language ecosystems where there's no particular expectations up front about whether or when a library will do IO. Consider any library that introduces some sort of config file or registry keys, or an OS where a function that was once purely in-process is extracted to a sandboxed extra process.


> There are plenty of language ecosystems where there's no particular expectations up front about whether or when a library will do IO.

There are languages that don't enforce the expectation on a type level, but that doesn't mean that people don't have expectations.

> Consider any library that introduces some sort of config file or registry keys

Yeah, please don't do this behind my back. Load during init, and ask for permission first (by making me call something like Config::load() if I want to respect it).

> or an OS where a function that was once purely in-process is extracted to a sandboxed extra process.

Slightly more reasonable, but this still introduces a lot of considerations that the application developer needs to be aware of (how should the library find its helper binary? what if the sandboxing mechanism fails or isn't available?).



For the sandbox example I was thinking of desktop operating systems where things like file IO can become brokered without apps being aware of it. So the API doesn't change, but the implementation introduces IPC where previously there wasn't any. In practice it works fine.


> There are no languages that reify performance in the type system,

Async/await is a way of partially doing... just that, but without having to indicate what is "blocking", and if an async function blocks, well, you'll be unhappy, so don't do that. For a great deal of things this is plenty good enough, but from a computer science perspective it's deeply unsatisfying, because one would want the type system to prevent making such mistakes.

At least with async/await an executor could start more threads when an async thread makes a known-blocking call, thus putting a band-aid on the problem.

Perhaps the compiler could reason about the complexity of code (e.g., recursion and nested loops w/ large or unclear bounds -> accidentally quadratic -> slow -> "blocking") and decide if a pure function is "blocking" by dint of being slow. File I/O libraries could check if the underlying devices are local and fast vs. remote and slow, and then file I/O could always be async but completing before returning when the I/O is thought to be fast. This all feels likely to cause more problems than it solves.

If green threads turn out not to be good enough then it's easier to accept the async/await compromise and reason correctly about blocking vs. not.



You can't encode everything about performance in the type system, but that doesn't mean you cannot do it at all: having a type system that allows you to control memory layout and allocation is what makes C++ and Rust faster than most languages. And regarding what you say about storage access: storage bandwidth is now high, but latency when accessing an SSD is still much higher than accessing RAM, and network is even worse. And it will always be the case no matter what progress hardware makes, because of the speed of light.

Saying that async/await doesn't help with all performance issues is like saying Rust doesn't prevent all bugs: the statement is technically correct, but that doesn't make it interesting.

> Async/await is therefore basically a giant hack and confuses people about performance. It also makes maintenance difficult.

Many developers have embraced the async/await model with delight, because it instead makes maintenance easier by making the intent of the code more explicit.

It's been trendy on HN to bash async/await, but you are missing the most crucial point about software engineering: code is written for humans and is read much more than it is written. Async/await may be slightly more tedious to write (it's highly context dependent, though: when you have concurrent tasks to execute or need cancellation, it becomes much easier with futures).

> The right way to handle this is the Java approach (by pron, who is posting in this thread)

No it's not, and Mr Pressler has repeatedly shown that he misses the social and communication aspects, so it's not entirely surprising.



But all functions are blocking.

   fn foo() { bar(1, 2); }
   fn bar(a: i32, b: i32) -> i32 { a + b }
Here bar is a blocking function.


The difference is in quantities: bar blocks for nanoseconds, while the blocking the GP talks about affects the end user, which means it's in seconds.


No they aren't, and that's exactly my point.

Most functions aren't doing any syscall at all, and as such they are neither blocking nor non-blocking.

Now, because of path dependency and because we've been using blocking functions like regular functions, we're accustomed to thinking that blocking is "normal", but that's actually a source of bugs, as I mentioned before. In reality, async functions are more "normal" than regular functions: they don't do anything fancy, they just return a value when you call them, and what they return is a future/promise. In fact you don't even need any async annotation for a function to be async in Rust; this is an async function:

    use std::future::Future;

    fn toto() -> impl Future<Output = ()> {
        std::future::ready(())
    }

The async keyword exists simply so that the compiler knows it has to desugar the awaits inside the function into a state machine. But since Rust has async blocks, it doesn't even need async on functions at all; the information you need comes from the type of the return value, that is, a future.
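For illustration, the two spellings below are roughly equivalent from the caller's point of view; the `async` keyword on the first just asks the compiler to do the desugaring for you:

    use std::future::Future;

    async fn answer_async() -> u32 {
        41 + 1
    }

    // No `async` on the function: the return type alone says "this is a
    // future", and the async *block* provides the generated state machine.
    fn answer_desugared() -> impl Future<Output = u32> {
        async { 41 + 1 }
    }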

Blocking functions, on the contrary, are utterly bizarre. In fact, you cannot make one yourself, you must either call another blocking function[1] or do a system call on your own using inline assembly. Blocking functions are the anomaly, but many people miss that because they've lived with them long enough to accept them as normal.

[1] because blockingness is contagious, unlike asynchronousness which must be propagated manually, yes ironically people criticizing async/await get this one backward too



"makes certain syscalls" is a highly unconventional definition of "blocking" that excludes functions that spin wait until they can pop a message from a queue.

If your upcoming systems language uses a capabilities system to prevent the user from inadvertently doing things that may block for a long time like calling open(2) or accessing any memory that is not statically proven to not cause a page fault, I look forward to using it. I hope that these capabilities are designed so that the resulting code is more composable than Rust code. For example it would be nice to be able to use the Reader trait with implementations that source their bytes in various different ways, just as you cannot in Rust.



Blocking syscalls are a well defined and well scoped class of problems, sure there are other situations where the flow stops and a keyword can't save you from everything.

Your reasoning is exactly like that of the folks who say “Rust doesn't solve all bugs” because it “just” solves the memory safety ones.



I may be more serious than you think. Having worked on applications in which blocking for multiple seconds on a "non-blocking syscall" or page fault is not okay, I think it would really be nice to be able to statically ensure that doesn't happen.


I'm not disputing that; in the general case I suspect this is going to be undecidable, and you'd need careful design to carve out a subset of the problem that is statically addressable (akin to what Rust did for memory safety, by restricting the expressiveness of the safe subset of the language).

For blocking syscalls alone there's not that much PL research to do though and we could get the improvement practically for free, that's why I consider them to be different problems (also because I suspect they are much more prevalent given how much I've encountered them, but it could be a bias on my side).



Any function can block if memory it accesses is swapped out.


bar blocks waiting for the CPU to add the numbers.


Nope, it doesn't: in the final binary the bar function doesn't even exist anymore, as the optimizer inlined it, and CPUs have been using pipelining and speculative execution for decades now; they don't block on a single instruction. That's the problem with abstractions designed in the 70s: they don't map well onto the actual hardware we have 50 years later…


Sure, unless it is the first time you are executing that line of code and you have to wait for the OS to slowly fault it in across a networked filesystem.


Make `a + b` `A * B` then, multiplication of two potentially huge matrices. Same argument still holds, but now it's blocking (still just performing addition, only an enormous number of times).


It's not blocking, it's doing actual work.

Blocking is how the old programming paradigm deals with asynchronous actions, and it works by behaving the same way as when the computer is actually computing something, which is where the confusion comes from. But the two situations are conceptually very different: in one case we are idle (but don't see it), in the other we're busy doing actual work. Maybe in case 2 we could optimize the algorithm so that we spend less time, but that's not guaranteed, whereas in case 1 there's something obvious to do to speed things up: do something else at the same time instead of waiting mindlessly. Having a function marked async gives you a hint that you can actually run it concurrently with something else and expect a speedup, whereas with a blocking syscall there's no indication in the code that those two functions you're calling next to each other, with no data dependency between them, would gain a lot from being run concurrently by spawning two threads.

BTW, if you want something that's more akin to blocking, but at a lower level, it's when the CPU has to load data from RAM: it's really blocked doing nothing useful. Unfortunately that's not something you can make explicit in high-level languages (or at least, the design space hasn't been explored), so when this kind of behavior matters to you, that's when you drop down to assembly.
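To make that first point concrete, a minimal sketch (assuming a tokio runtime; fetch_a/fetch_b are invented, independent async calls): the types make it obvious the two can be joined, whereas two blocking calls written next to each other give no such hint.

    // Two hypothetical, independent async operations (stand-ins for real I/O).
    async fn fetch_a() -> u32 { 1 }
    async fn fetch_b() -> u32 { 2 }

    async fn run_both() -> (u32, u32) {
        // Drives both futures concurrently within the same task.
        tokio::join!(fetch_a(), fetch_b())
    }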



A "non-blocking function" always meant "this function will return before its work is done, and will finish that work in the background through threads/other processes/etc". All other functions are blocking by default, including that simple addition "bar" function above.


Your definition is at odds with (for instance) JavaScript's implementation of a non-blocking function though, as it will perform computation up to the first await point before returning (unlike Rust futures, which are lazy, that is, they do no work before they are awaited).

As I said before, most of what you call a “blocking function” is actually a “no opinion” function, but since, in the idiosyncrasy of most programming languages, blocking functions are called like “no opinion” ones, you are mixing them up. It's not a fundamental rule, though. You could imagine a language where blocking functions (those which contain an underlying blocking syscall) are called with a block keyword and where regular functions are just called like functions. There's no relation between regular functions and blocking functions except the path dependency that led to the particular idiosyncrasy we live in; it is entirely contingent.



> Your definition is at odds with (for instance) JavaScript's implementation of a non-blocking function though, as it will perform computation until the first await point before returning

Yes, that's syntactic sugar for returning a promise. This pattern is something we've long called a non-blocking function in Javascript. The first part that's not in the promise is for setting it up.



If you define a non-blocking function to be what you decide is non-blocking, that's a bit of cheating don't you think? ;)

How about this function:

    async fn toto(input: u8) -> bool {
      if input % 2 == 0 {
        true
      } else {
        false
      }
    }
Is it a non-blocking one or not according to your criteria?


We were just talking about JavaScript and now you're back to Rust? That same function with that same keyword acts differently in the two languages.

My best guess is you're defining it (mostly?) by the syntax while I'm defining it by how the function acts. By what I'm talking about, that's a non-blocking function in Rust, but written the exact same way in Javascript it's a blocking function.



> We were just talking about JavaScript and now you're back to Rust? That same function with that same keyword acts differently in the two languages.

Yes, that's the point: in both cases they are called non-blocking functions, despite behaving differently.

> My best guess is you're defining it (mostly?) by the syntax while I'm defining it by how the function acts

I'm defining them by how they are commonly referred to, whereas you're using an arbitrary criterion that is not even consistent, as you'll see below.

> By what I'm talking about, that's a non-blocking function in Rust, but written the exact same way in JavaScript it's a blocking function.

Gotcha!

If we get back to your definition above

> A "non-blocking function" always meant "this function will return before its work is done, and will finish that work in the background through threads/other processes/etc".

Then it should be a “blocking function” because what this function returns is basically a private enum with two variants and a poll method to unwrap them. There's nothing running in the background ever.

In fact, in Rust an async function never runs anything in the background: all it does is return a Future, which is a passive state machine; there's no magic in there, and it's not very different from a closure or an Option actually (in fact, in this example, it's practically the same as an Option). Then you can send this state machine to an executor that will do the actual work, polling it to completion. But this process doesn't necessarily happen in the background (you can even block_on the future to make the execution synchronous).

So in reality there are two kinds of functions: the ones that return immediately (async and regular functions) and the ones that block execution and return later on (the “blocking” functions). And among the functions that do not block, there are also two flavors: the ones that return results that are immediately ready, and the ones that return results that won't be ready until some time has passed.
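A quick sketch of those two non-blocking flavors (assuming tokio for the timer; the function names are invented): both return immediately, but one hands back a result that is already ready and the other a result that won't be ready until later.

    use std::future::{ready, Future};
    use std::time::Duration;

    // Returns a future that resolves on the first poll.
    fn already_ready() -> impl Future<Output = u32> {
        ready(42)
    }

    // Returns a future that won't be ready for another 100ms.
    fn ready_later() -> impl Future<Output = u32> {
        async {
            tokio::time::sleep(Duration::from_millis(100)).await;
            42
        }
    }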



I don't know what to tell you, but that is how sequential code works. Sure you can find some instruction level parallelism in the code and your optimizer may be able to do it across function boundaries, but that is mostly a happy accident. Meanwhile HDLs are the exact opposite. Parallel by default and you have to build sequential execution yourself. What is needed for both HLS and parallel programming is a parallel by default hybrid language that makes it easy to write both sequential and parallel code.


Except, unless you're using atomics or volatiles, you have no guarantees that the code you're writing sequentially is going to be executed this way…


>In other words, if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just write normal function calls? Then you will suddenly be programming in threads.

Why not go one step further and invent "Parallel Rust"? And by parallel I mean it. Just a nice little keyword "parallel {}" where every statement inside the parallel block is executed in parallel, the same way it is done in HDLs. Rust's borrow checker should be able to ensure parallel code is safe. Of course one problem with this strategy is that we don't exactly have processors that are designed to spawn and process micro-threads. You would need to go back all the way to Sun's SPARC architecture for that and then extend it with the concept of a tree based stack so that multiple threads can share the same stack.



> Just a nice little keyword "parallel {}" where every statement inside the parallel block is executed in parallel, the same way it is done in HDLs. Rust's borrow checker should be able to ensure parallel code is safe.

The rayon crate lets you do something quite similar.
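For example, a minimal sketch of how that can look with rayon (assuming it as a dependency; the numbers are arbitrary): join runs two closures potentially in parallel, and par_iter parallelizes an ordinary iterator chain, with the borrow checker still enforcing that the closures don't race on shared data.

    use rayon::prelude::*;

    fn main() {
        // Run two independent computations, possibly on different threads.
        let (sum, max) = rayon::join(
            || (1..=1_000_000u64).sum::<u64>(),
            || (1..=1_000_000u64).max(),
        );
        println!("sum = {sum}, max = {max:?}");

        // Data-parallel iteration, using the same iterator adapters as usual.
        let input: Vec<u64> = (1..=10).collect();
        let squares: Vec<u64> = input.par_iter().map(|x| x * x).collect();
        println!("{squares:?}");
    }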



I believe the answer is "that implies a runtime", and Rust as a whole is not willing to pull that up into a language requirement.

This is in contrast to Haskell, Go, dynamic scripting languages, and, frankly, nearly every other language on the market. Almost everything has a runtime nowadays, and while each individually may be fine they don't always play well together. It is important that as C rides into the sunset (optimistic and aspirational, sure, but I hope and believe also true) and C++ becomes an ever more complex choice to make for various reasons that we have a high-power very systems-oriented programming language that will make that choice, because someone needs to.



That would be a good step forward, I support it :)

BTW, do we need the `parallel` keyword, or better to simply let all code be parallel by default?



Haskell has entered the chat…

However, almost all of the most popular programming languages are imperative. I assume most programmers prefer to think of our programs as a series of steps which execute in sequence.

Mind you, Excel is arguably the most popular programming language in use today, and it has exactly this execution model.



You do not need to spawn threads/tasks eagerly; you can do it lazily via work-stealing. See Cilk++.


Doesn't rayon have a syntax like that?


There are a lot of moments not covered. For example:

- async/await runs in the context of one thread, so there is no need for locks or synchronization. Unless one runs async/await across multiple threads to actually utilize CPU cores; then locks and synchronization are necessary again. This complexity may be hidden in some external code. For example, instead of synchronizing access to a single database connection it is much easier to open one database connection per async task. However, such an approach may affect performance, especially with sqlite and postgres.

- error propagation in async/await is not obvious. Especially when one tries to group up async tasks. Happy eyeballs are a classic example.

- since network I/O was mentioned, backpressure should also be mentioned. CPython implementation of async/await notoriously lacks network backpressure causing some problems.



Async/await, just like threads, is a concurrency mechanism, and it still requires locks when accessing shared memory. Where does your statement come from?


If you perform single threaded async in Rust, you can drop down to the cheap single threaded RefCell rather than the expensive multithreaded Mutex/RwLock
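A minimal sketch of that (assuming a single-threaded, non-Send executor such as tokio's LocalSet; the function is made up): shared state can live behind Rc<RefCell<..>> instead of Arc<Mutex<..>>, because no other OS thread will ever touch it.

    use std::cell::RefCell;
    use std::rc::Rc;

    // Rc and RefCell are !Send, so a future holding them can only be spawned
    // on a single-threaded executor (e.g. with spawn_local).
    async fn bump(counter: Rc<RefCell<u64>>) {
        // Cheap, runtime-checked borrow; no atomics, no OS lock.
        *counter.borrow_mut() += 1;
    }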


That's one example of a lock you might eliminate, but there are plenty of other cases where it's impossible to eliminate locks even while single threaded.

Consider, for example, something like this (not real rust, I'm rusty there)

    lock {
      a = foo();
      b = io(a).await;
      c = bar(b);
    }
Eliminating this lock is unsafe because a, b, and c are expected to be updated in tandem. If you remove the lock, then by the time you reach c, a and b may have changed under your feet in an unexpected way because of that await.
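For the curious, a hedged sketch of roughly what that looks like in real Rust (assuming tokio; foo/io/bar are made-up stand-ins): tokio's async-aware Mutex can be held across an .await, so a, b and c are only ever observed as a unit by other tasks taking the same lock.

    use std::sync::Arc;
    use tokio::sync::Mutex;

    fn foo() -> u32 { 1 }
    async fn io(a: u32) -> u32 { a + 1 } // stand-in for real I/O
    fn bar(b: u32) -> u32 { b * 2 }

    async fn update(state: Arc<Mutex<(u32, u32, u32)>>) {
        let mut guard = state.lock().await; // other tasks wait here
        let a = foo();
        let b = io(a).await;                // the guard stays held across the await
        let c = bar(b);
        *guard = (a, b, c);                 // never observed half-updated
    }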


Yeah but this problem goes away entirely if you just don’t await within a critical region like that.

I’ve been using nodejs for a decade or so now. Nodejs can also suffer from exactly this problem. In all that time, I think I’ve only reached for a JS locking primitive once.



There is no problem here with the critical region. The problem would be removing the critical region because "there's just one thread".

This is incorrect code

      a = foo();
      b = io(a).await;
      c = bar(b);
Without the lock, `a` can mutate before `b` is done executing which can mess with whether or not `c` is correct. The problem is if you have 2 independent variables that need to be updated in tandem.

Where this might show up. Imagine you have 2 elements on the screen, a span which indicates the contents and a div with the contents.

If your code looks like this

    mySpan.innerText = `Loading ${foo}`;
    myDiv.innerText = await load(foo);
    mySpan.innerText = "";
You now have incorrect code if 2 concurrent loads happen. It could be the original foo, it could be a second foo. There's no way to correctly determine what the content of `myDiv` is from an end user perspective as it depends entirely on what finished last and when. You don't even know if loading is still happening.


I absolutely agree that that code looks buggy. Of course it is - if you just blindly mix view and model logic like that, you’re going to have a bad day. How many different states can the system be in? If multiple concurrent loads can be in progress at the same time, the answer is lots.

But personally I wouldn’t solve it with a lock. I’d solve it by making the state machine more explicit and giving it a little bit of distance from the view logic. If you don’t want multiple loads to happen at once, add an is_loading variable or something to track the loading state. When in the loading state, ignore subsequent load operations.



> add an is_loading variable or something to track the loading state.

Which is definitionally a mutex AKA a lock. However, it's not a lock you are blocking on but rather one that you are trying and leaving.

I know it doesn't look like a traditional lock, but in a language like JavaScript or Python it's a valid locking mechanism. For JavaScript that's because of the single-threaded execution model: a boolean variable is guaranteed to be set and observed consistently across multiple concurrent actions.

That is to say, you are thinking about concurrency issues, you just aren't thinking about them in concurrency terms.

Here's the Java equivalent to that concept

https://docs.oracle.com/javase/8/docs/api/java/util/concurre...



Yeah I agree. The one time I wrote a lock in javascript it worked like you were hinting at. You could await() the lock's release, and if multiple bits of code were all waiting for the lock, they would acquire it in turn.

But again, I really think in UI code it makes a lot more sense to be clear about what the state is, model it explicitly and make the view a "pure" expression of that state. In the code above:

- The state is 0 or more promises loading data.

- The state is implicit. Ie, the code doesn't list the set of loading promises which are being awaited at any point in time. Its not obvious that there is a collection going on.

- The state is probably wrong. The developer probably wants either 0 or 1 loading states. (Or maybe a queue of them). Because the state hasn't been modelled explicitly, it probably hasn't been considered enough

- The view is updated incorrectly based on the state. If 2 loads happen at the same time, then 1 finishes, the UI removes the "loading..." indicator from the UI. Correct view logic should ensure that the UI is deterministic based on the internal state. 1 in-progress load should result in the UI saying "loading...".

Its a great example. With code like this I think you should always carefully and explicitly consider all of the states of your system, and how the state should change based on user action. Then all UI code can flow naturally from that.

A lock might be a good tool. But without thinking about how you want the program to behave, we have no way to tell. And once you know how you want your program to behave, I find locks to be usually unnecessary.



I think a lot of this type of problem goes away with immutable data and being more careful with side effects (for example, firing them all at once at the end rather than dispersed through the calculation)


> Where does your statement come from?

This is how async/await works in Node (which is single-threaded) so most developers think this is how it works in every technology.



Even in Node, if you perform asynchronous operations on a shared resource, you need synchronization mechanisms to prevent interleaving of async functions.

There has been more than one occasion when I "fixed" a system in NodeJS just by wrapping some complex async function up in a mutex.



This lacks quite a bit of nuance. In node you are guaranteed that synchronous code between two awaits will run to completion before another task (that could access your state) from the event loop gets a turn; with multi-threaded concurrency you could be preempted between any two machine instructions. So while you _do_ have to serialize access to shared IO resources, you do _not_ have to serialize access to memory (just add the connection to the hashset, no locks).

What you usually see with JS for concurrency of shared IO resources in practice is that they are "owned" by the closure of a flow of async execution and rarely available to other flows. This architecture often obviates the need to lock on the shared resource at all as the natural serialization orchestrated by the string of state machines already naturally accomplishes this. This pattern was even quite common in the CPS style before async/await.

For example, one of the first things an app needs to do before talking to a DB is to get a connection, which is often retrieved by pulling from a pool; acquiring the reservation requires no lock, and by virtue of the connection being exclusively closed over in the async query code, it also needs no locking. When the query is done, the connection can be returned to the pool sans locking.

The place where I found synchronization most useful was in acquiring resources that are unavailable. Interestingly, an async flow waiting on a signal for a shared resource resembles a channel in golang in how it shifts the state and execution to the other flow when a pooled resource is available.

All this to say, yeah I'm one of the huge fans of node that finds rust's take on default concurrency painfully over complicated. I really wish there was an event-loop async/await that was able to eschew most of the sync, send, lifetime insanity. While I am very comfortable with locks-required multithreaded concurrency as well, I honestly find little use for it and would much prefer to scale by process than thread to preserve the simplicity of single-threaded IO-bound concurrency.



> So while you _do_ have to serialize access to shared IO resources, you do _not_ have to serialize access to memory(just add the connection to the hashset, no locks).

No, this can still be required. Nothing stops a developer setting up a partially completed data structure and then suspending in the middle, allowing arbitrary re-entrancy that will then see the half-finished change exposed in the heap.

This sort of bug is especially nasty exactly because developers often think it can't happen and don't plan ahead for it. Then one day someone comes along and decides they need to do an async call in the middle of code that was previously entirely synchronous, adds it and suddenly you've lost data integrity guarantees without realizing it. Race conditions appear and devs don't understand it because they've been taught that it can't happen if you don't have threads!



> So while you _do_ have to serialize access to shared IO resources, you do _not_ have to serialize access to memory

Yes, in Node you don't get the usual data races like in C++, but data-structure races can be just as dangerous. E.g. modifying the same array/object from two interleaved async functions was a common source of bugs in the systems I've referred to.

Of course, you can always rely on your code being synchronous and thus not needing a lock, but if you're doing anything asynchronous and you want a guarantee that your data will not be mutated from another async function, you need a lock, just like in ordinary threads.

One thing I deeply dislike about Node is how it convinces programmers that async/await is special, different from threading, and doesn't need any synchronisation mechanisms because of some Node-specific implementation details. This is fundamentally wrong and teaches wrong practices when it comes to concurrency.



But single-threaded async/await _is_ special and different from multi-threaded concurrency. Placing it in the same basket and prescribing the same method of use is fundamentally wrong and fails to teach the magic of idiomatic lock free async javascript.

I'm honestly having a difficult time creating a steel man js sample that exhibits data races unless I write weird C-like constructs and ignore closures and async flows to pass and mutate multi-element variables by reference deep into the call stack. This just isn't how js is written.

When you think about async/await in terms of shepherding data flows it becomes pretty easy to do lock free async/await with guaranteed serialization sans locks.



> I'm honestly having a difficult time creating a steel man js sample that exhibits data races

I can give you a real-life example I've encountered:

    const CACHE_EXPIRY = 1000; // Cache expiry time in milliseconds

    let cache = {}; // Shared cache object

    function getFromCache(key) {
      const cachedData = cache[key];
      if (cachedData && Date.now() - cachedData.timestamp < CACHE_EXPIRY) {
        return cachedData.data;
      }
      return null;
    }

    function updateCache(key, data) {
      cache[key] = { data, timestamp: Date.now() };
    }

    let mockFetchCount = 0;

    async function mockFetch(url) {
      await new Promise(resolve => setTimeout(resolve, 100));
      mockFetchCount += 1;
      return `result from ${url}`;
    }

    async function fetchDataAndUpdateCache(key) {
      const cachedData = getFromCache(key);
      if (cachedData) {
        return cachedData;
      }

      // Simulate fetching data from an external source
      const newData = await mockFetch(`https://example.com/data/${key}`); // Placeholder fetch

      updateCache(key, newData);
      return newData;
    }

    // Race condition:
    (async () => {
      const key = 'myData';

      // Fetch data twice in a sequence - OK
      await fetchDataAndUpdateCache(key);
      await fetchDataAndUpdateCache(key);
      console.log('mockFetchCount should be 1:', mockFetchCount);

      // Reset counter and wait cache expiry
      mockFetchCount = 0;
      await new Promise(resolve => setTimeout(resolve, CACHE_EXPIRY));

      // Fetch data twice concurrently - we executed fetch twice!
      await Promise.all([fetchDataAndUpdateCache(key), fetchDataAndUpdateCache(key)]);
      console.log('mockFetchCount should be 1:', mockFetchCount);
    })();

This is what happens when you convince programmers that concurrency is not a problem in JavaScript. Even though this cache works for sequential fetching and will pass trivial testing, as soon as you have concurrent fetching, the program will execute multiple fetches in parallel. If server implements some rate-limiting, or is simply not capable of handling too many parallel connections, you're going to have a really bad time.

Now, out of curiosity, how would you implement this kind of cache in idiomatic, lock-free javascript?



> how would you implement this kind of cache in idiomatic, lock-free javascript?

The simplest way is to cache the Promise instead of waiting until you have the data:

    -async function fetchDataAndUpdateCache(key: string) {
    +function fetchDataAndUpdateCache(key: string) {
       const cachedData = getFromCache(key);
       if (cachedData) {
         return cachedData;
       }

       // Simulate fetching data from an external source
     -const newData = await mockFetch(`https://example.com/data/${key}`); // Placeholder fetch
     +const newData = mockFetch(`https://example.com/data/${key}`); // Placeholder fetch

       updateCache(key, newData);
       return newData;
     }
From this the correct behavior flows naturally; the API of fetchDataAndUpdateCache() is exactly the same (it still returns a Promise), but it’s not itself async so you can tell at a glance that its internal operation is atomic. (This does mildly change the behavior in that the expiry is now from the start of the request instead of the end; if this is critical to you, you can put some code in `updateCache()` like `data.then(() => cache[key].timestamp = Date.now()).catch(() => delete cache[key])` or whatever the exact behavior you want is.)

I‘m not even sure what it would mean to “add a lock” to this code; I guess you could add another map of promises that you’ll resolve when the data is fetched and await on those before updating the cache, but unless you’re really exposing the guts of the cache to your callers that’d achieve exactly the same effect but with a lot more code.



Ok, that's pretty neat. Using Promises themselves in the cache instead of values to share the source of data itself.

While that approach has a limitation that you cannot read the data from inside the fetchDataAndUpdateCache (e.g. to perform caching by some property of the data), that goes beyond the scope of my example.

> I‘m not even sure what it would mean to “add a lock” to this code

It means the same as in any other language, just with a different implementation:

    class Mutex {
        locked = false
        next = []

        async lock() {
            if (this.locked) {
                await new Promise(resolve => this.next.push(resolve));
            } else {
                this.locked = true;
            }
        }

        unlock() {
            if (this.next.length > 0) {
                this.next.shift()();
            } else {
                this.locked = false;
            }
        }
    }
I'd have a separate map of keys-to-locks that I'd use to lock the whole fetchDataAndUpdateCache function on each particular key.


Don't forget to fung futures that are fungible for the same key.

ETA: I appreciate the time you took to make the example, also I changed the extension to `mjs` so the async IIFE isn't needed.

  const CACHE_EXPIRY = 1000; // Cache expiry time in milliseconds
  
  let cache = {}; // Shared cache object
  let futurecache = {}; // Shared cache of future values
  
  function getFromCache(key) {
    const cachedData = cache[key];
    if (cachedData && Date.now() - cachedData.timestamp < CACHE_EXPIRY) {
      return cachedData.data;
    }
    return null;
  }

  function updateCache(key, data) {
    cache[key] = { data, timestamp: Date.now() };
  }

  let mockFetchCount = 0;

  async function mockFetch(url) {
    await new Promise(resolve => setTimeout(resolve, 100));
    mockFetchCount += 1;
    return `result from ${url}`;
  }
  
  async function fetchDataAndUpdateCache(key) {
    // maybe its value is cached already
    const cachedData = getFromCache(key);
    if (cachedData) {
      return cachedData;
    }
  
    // maybe its value is already being fetched
    const future = futurecache[key];
    if(future) {
      return future;
    }
  
    // Simulate fetching data from an external source
    const futureData = mockFetch(`https://example.com/data/${key}`); // Placeholder fetch
    futurecache[key] = futureData;
  
    const newData = await futureData;
    delete futurecache[key];
  
    updateCache(key, newData);
    return newData;
  }
  
  const key = 'myData';
  
  // Fetch data twice in a sequence - OK
  await fetchDataAndUpdateCache(key);
  await fetchDataAndUpdateCache(key);
  console.log('mockFetchCount should be 1:', mockFetchCount);
  
  // Reset counter and wait cache expiry
  mockFetchCount = 0;
  await new Promise(resolve => setTimeout(resolve, CACHE_EXPIRY));
  
  // Fetch 100 times concurrently - the fetch should now run only once
  await Promise.all([...Array(100)].map(() => fetchDataAndUpdateCache(key)));
  console.log('mockFetchCount should be 1:', mockFetchCount);


I see, this piece of code seems to be crucial:

    // maybe its value is already being fetched
    const future = futurecache[key];
    if(future) {
      return future;
    }
It indeed fixes the problem in a JS lock-free way.

Note that, as wolfgang42 has shown in a sibling comment, the original cache map isn't necessary if you're using a future map, since the futures already contain the result:

    async function fetchDataAndUpdateCache(key) {
        // maybe its value is cached already
        const cachedData = getFromCache(key);
        if (cachedData) {
          return cachedData;
        }

        // Simulate fetching data from an external source
        const newDataFuture = mockFetch(`https://example.com/data/${key}`); // Placeholder fetch

        updateCache(key, newDataFuture);
        return newDataFuture;
    }
---

But note that this kind of problem is much easier to fix than to actually diagnose.

My hypothesis is that the lax attitude of Node programmers towards concurrency is what causes subtle bugs like these to happen in the first place.

Python, for example, also has single-threaded async concurrency like Node, but unlike Node it also has all the standard synchronization primitives also implemented in asyncio: https://docs.python.org/3/library/asyncio-sync.html



Wolfgang's optimization is very nice. I also found it interesting that he uses a non-async function returning a promise as a signal that it is "atomic". I don't particularly like typed JS, so it would be less visible to me.

Absolutely agree on the observability of such things. One area I think shows some promise, though the tooling lags a bit, is in async context[0] flow analysis.

One area where I have actually used it so far is tracking down code that starves the event loop with too much sync work, but I think some visualization/diagnostics around this data would be awesome.

If we view Promises/Futures as just ends of a string of continued computation, whose resumption is gated by some piece of information, then the points where you can weave these ends together are where the async context tracking happens, letting you follow a whole "thread" of state machines that make up the flow.

Thinking of it this way, I think, also makes it more obvious how data between these flows is partitioned in a way that it can be manipulated without locking.

As for the node devs' lax attitude, I would probably be more aggressive and say it's an overall lack of formal knowledge of how computing and data flow work. As an SE in DevOps, a lot of my job is to make software work for people that don't know how computers, let alone platforms, work.

[0]: https://nodejs.org/api/async_context.html



async can be scarier for locks since a block of code might depend on having exclusive access, and since there wasn't an await, it got it. Once you add an await in the middle, the code breaks. Threading at least makes you codify what actually needs exclusive access.

async also signs you up for managing your own thread scheduling. If you have a lot of IO and short CPU-bound code, this can be OK. If you have (or occasionally have) CPU-bound code, you'll find yourself playing scheduler.
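A common escape hatch for the CPU-bound case, sketched here with tokio as one example (the checksum function is invented): push the heavy work onto a blocking-friendly pool so you aren't hand-rolling a scheduler on the async worker threads.

    // Hypothetical CPU-heavy helper, moved off the async worker threads.
    async fn checksum(data: Vec<u8>) -> u64 {
        tokio::task::spawn_blocking(move || {
            data.iter()
                .fold(0u64, |acc, b| acc.wrapping_mul(31).wrapping_add(*b as u64))
        })
        .await
        .expect("blocking task panicked")
    }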



Yeah once your app gets to be sufficiently complex you will find yourself needing mutexes after all. Async/await makes the easy parts of concurrency easy but the hard parts are still hard.


I have lots of issues with async/await, but this is my primary beef with async/await:

Remember the Gang of Four book "Design Patterns"? It was basically a cookbook on how to work around the deficiencies of (mostly) C++. Yet everybody applied those patterns inside languages that didn't have those deficiencies.

Rust can run multiple threads just fine--it's not Javascript. As such, it didn't have to use async/await. It could have tried any of a bunch of different solutions. Rust is a systems language, after all.

However, async/await was necessary in order to shove Rust down the throats of the Javascript programmers who didn't know anything else. Quoting without.boats:

https://without.boats/blog/why-async-rust/

> I drove at async/await with the diligent fervor of the assumption that Rust’s survival depended on this feature.

Whether async/await was even a good fit for Rust technically was of no consequence. Javascript programmers were used to async/await so Rust was going to have async/await so Rust could be jammed down the throats of the Javascript network services programmers--technical consequences be damned.



Async/await was invented for C#, another multithreaded language. It was not designed to work around a lack of true parallelism. It is instead designed to make it easier to interact with async IO without having to resort to manually managed thread pools. It basically codifies at the language level a very common pattern for writing concurrent code.

It is true, though, that async/await has a significant advantage over fibers related to single-threaded code: it makes it very easy to add good concurrency support on a single thread, especially in languages which support both. In C#, it was particularly useful for executing concurrent operations from the single GUI thread of WPF or WinForms, or from parts of the app which interact with COM. This relies on the single-threaded SynchronizationContext, which schedules continuations back on the current thread, so it's safe to run GUI updates or COM interactions from a Task while also using any other async/await code, since await captures and restores the current context by default.



Yeah, Microsoft looked at callback hell, realized that they had seen this one before, dipped into the design docs for F# and lifted out the syntactic sugar of monads. And it worked fine. But really, async/await is literally callbacks. The keyword await just wraps the rest of the function in a lambda and stuffs it in a callback. It's fully just syntactic sugar. It's a great way of simplifying how callback hell is written, but it's still callback hell in the end. Where having everything run in callbacks makes sense, it makes sense. Where it doesn't it doesn't. At some point you will start using threads, because your use case calls for threads instead of callbacks.


Most compilers don't just wrap the rest of the function into a lambda but build a finite state machine with each await point being a state transition. It's a little bit more than just "syntactic sugar" for "callbacks". In most compilers it is most directly like the "generator" approach to building iterators (*function/yield is ancient async/await).

I think the iterator pattern in general is a really useful reference to keep in mind. Of course async/await doesn't replace threads just like iterators don't replace lists/arrays. There are some algorithms you can more efficiently write as iterators rather than sequences of lists/arrays. There are some algorithms you can more efficiently write as direct list/array manipulation and avoid the overhead of starting iterator finite state machines. Iterator methods are generally deeply composable and direct list/array manipulation requires a lot more coordination to compose. All of those things work together to build the whole data pipeline you need for your app. So too, async/await makes it really easy to write some algorithms in a complex concurrent environment. That async/await runs in threads and runs with threads. It doesn't eliminate all thinking about threads. async/await is generally deeply composable and direct thread manipulation needs more work to coordinate. In large systems you probably still need to think about both how you are composing your async/await "pipelines" and also how your threads are coordinated. The benefits of composition such as race/await-all/schedulers/and more are generally worth the extra complexity and overhead (mental and computation space/time), which is why the pattern has become so common so quickly. Just like you can win big with nicely composed stacks of iterator functions. (Or RegEx or Observables or any of the other many cases where designing complex state machines both complicates how the system works and eventually simplifies developer experience with added composability.)



I generally don't agree with the direction withoutboats went with asynchronicity, but you are reading a whole lot more into that sentence than is really there. It is very clear (based on his writing, in this and other articles) that he went with the solution because he thinks it is the right one, on a technical level.

I don't agree, but making it sound like it was about marketing the language to JavaScript people is just wrong.



> was about marketing the language to JavaScript people is just wrong.

No, it seems very right to me. Rust, despite being a "systems language", was not satisfied with the market size of systems programming, and it really needed all those millions of JS programmers to make the language a big success.



Threads have a cost. Context switching between them at the kernel level has a cost. There are some workloads that gain performance by multiplexing requests on a thread. Java virtual threads, golang goroutines, and dotnet async/await (which is multi threaded like Rust+tokio) all moved this way for _performance_ reasons not for ergonomic or political ones.

It's also worth pointing out that async/await was not originally a JavaScript thing. It's in many languages now but was first introduced in C#. So by your logic Rust introduced it so it could be "jammed down the throats" of all the dotnet devs..



> So by your logic Rust introduced it so it could be "jammed down the throats" of all the dotnet devs..

You're missing his point. His point is that the most popular language, which has the most number of programmers forced the hand of Rust devs.

His point is not that the first language had this feature, it's that the most programmers used this feature, and that was due to the most popular programming language having this feature.



That Rust needed async/await to be palatable to JS devs would only be a problem if we think async/await is not needed in Rust, because it is only useful to work around limitations of JS (single-threaded execution, in this case). If instead async/await is a good feature in its own right (even if not critical), then JS forcing Rust's hand would be at best an annoyance.

And the idea that async/await was only added to JS to work around its limitations is simply wrong. So the OP is overall wrong: async/await is not an example of someone taking something that only makes sense in one language and using it another language for familiarity.



> So the OP is overall wrong: async/await is not an example of someone taking something that only makes sense in one language and using it another language for familiarity.

I don't really understand the counter argument here.

My reading of the argument[1] is that "popularity amongst developers forced the Rust devs' hands in adding async". If this is the argument, then a counter-argument of "it never (or only) made sense in the popular language (either)" is a non sequitur.

IOW, if it wasn't added due to technical reasons (which is the original argument, IIRC), then explaining technical reasons for/against isn't a counter argument.

[1] i.e. Maybe I am reading it wrong?



You are not reading it wrong, and your statements are accurate.

My broader point is that the possibility of there being a "technically better" construct was simply not in scope for Rust. In order for Rust to capture Javascript programmers, async/await was the only construct that could possibly be considered.

And, to be fair, it worked. Rust's growth has been almost completely on the back of network services programming.



> all moved this way for _performance_ reasons

They did NOT.

Async performance is quite often (I would even go so far as to say "generally") worse than single threaded performance in both latency AND throughput under most loads that programmers ever see.

Most of the complications of async are much like C#:

1) Async allows a more ergonomic way to deal with a prima donna GUI that must be the main thread and that you must not block. This has nothing to do with "performance"--it is a limitation of the GUI toolkit/Javascript VM/etc..

2) Async adds unavoidable latency overhead and everybody hits this issue.

3) Async nominally allows throughput scaling. Most programmers never gain enough throughput to offset the lost latency performance.



1) it offers a more ergonomic way for concurrency in general. `await Task.WhenAll(tasks);` is (in my opinion) more ergonomic than spinning up a thread pool in any language that supports both.

2) yes, there is a small performance overhead for continuations. Everything is a tradeoff. Nobody is advocating for using async/await for HFT, or in low-level languages like C or Zig. We're talking nanoseconds here; for a typical web API request that's in the tens of ms, that's a drop in the ocean.

3) I wouldn't say it's nominal! I'd argue most non-trivial web workloads would benefit from this increase in throughput. Pre-fork webservers like gunicorn can consume considerably more resources to serve the same traffic than an async stack such as uvicorn+FastAPI (to use Python as an example).

> Most of the complications of async are much like C#

Not sure where you're going with this analogy but as someone who's written back-end web services in basically every language (other than lisp, no hate though), C#/dotnet core is a pretty great stack. If you haven't tried it in a while you should give it a shot.



Eh. Async and to a lesser extent green threads are the only solutions to slowloris HTTP attacks. I suppose your other option is to use a thread pool in your server - but then you need to hide your web server behind nginx to keep it safe. (And nginx is safe because it internally uses async IO).

Async is also usually wildly faster for networked services than blocking IO + thread pools. Look at some of the winners of the techempower benchmarks. All of the top results use some form of non blocking IO. (Though a few honourable mentions use go - with presumably a green thread per request):

https://www.techempower.com/benchmarks/

I’ve also never seen Python or Ruby get anywhere near the performance of nodejs (or C#) as a web server. A lot of the difference is probably how well tuned v8 and .net are, but I’m sure the async-everywhere nature of javascript makes a huge difference.



Async's perfect use case is proxies though- get a request, go through a small decision tree, dispatch the I/O to the kernel. You don't want proxies doing complex logic or computation, the stuff that creates bottlenecks in the cooperative multithreading.


Most API's (rest, graphql or otherwise) are effectively a proxy. Like you say, if you don't have complex logic and you're effectively mapping an HTTP request to a query, then your API code is just juggling incoming and outgoing responses and this evented/cooperative approach is very effective.


Where does the unavoidable latency overhead come from?

Do you have some benchmarks available?



The comment you are responding to is not wrong about higher async overhead, but it is wrong about everything else, either out of lack of experience with the language or out of confusion about what it is that Task and ValueTask solve.

All asynchronous methods (as in, the ones that have the async keyword prefixed to them) are turned into state machines, where, to live across an await, the method's variables that persist across it need to be lifted into a state machine struct, which then often (but not always) needs to be boxed, aka heap allocated. All this makes the cost of what would otherwise have been just a couple of method calls way more significant - a single await like this can cost 50ns vs 2ns spent on calling methods.

There is also the matter of heap allocations for state machine boxes - C# is generally good when it comes to avoiding them for (value)tasks that complete synchronously and for hot async paths that complete asynchronously by pooling them, but badly written code can incur unwanted overhead by spamming async methods with await points where it could have just forwarded a task instead. Years of bad practices arising from low-skill enterprise dev fields do not help either, with only the switch to OSS and a more recent culture shift, aided by better out-of-the-box analyzers, somewhat turning the tide.

This, however, does not stop C#'s task system from being extremely useful for achieving lowest ceremony concurrency across all programming languages (yes, it is less effort than whatever Go or Elixir zealots would have you believe) where you can interleave, compose and aggregate task-returning methods to trivially parallelize/fork/join parts of existing logic leading to massive code productivity improvement. Want to fire off request and do something else? Call .GetStringAsync but don't await it and go back to it later with await when you do need the result - the request will be likely done by then. Instant parallelism.

With that said, Rust's approach to futures and async is a bit different: whereas in C# each async method is its own task, in Rust the entire call graph is a single task with many nested futures, where the size of the sum of all stack frames is known statically. Hence you can't perform recursive calls within async there - you can only create a new (usually heap-allocated) boxed future, which gives you what effectively looks like a linked list of task nodes, as there is no infinite recursion in calculating their sizes. This generally has lower overhead and works extremely well even in no-std, no-alloc scenarios where cooperative multi-tasking is realized through a single bare-metal executor, which is a massive user experience upgrade in embedded land. .NET, OTOH, is working on its own project to massively reduce async overhead, and once the finished experiment sees integration in dotnet/runtime itself, you can expect more posts on this orange site about it.
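To make the recursion point concrete, a minimal sketch: the async state machine must have a statically known size, so a recursive call has to go through a heap-allocated, dynamically sized future.

    use std::future::Future;
    use std::pin::Pin;

    fn countdown(n: u32) -> Pin<Box<dyn Future<Output = ()> + Send>> {
        Box::pin(async move {
            if n > 0 {
                countdown(n - 1).await; // the Box breaks the infinite size calculation
            }
        })
    }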



> .NET OTOH is working on its own project to massively reduce async overhead

Where can I read more about that?



Initial experiment issue: https://github.com/dotnet/runtime/issues/94620

Experiment results write-up: https://github.com/dotnet/runtimelab/blob/e69dda51c7d796b812...

TLDR: The green threads experiment was a failure as it found (expected and obvious) issues that the Java applications are now getting to enjoy, joining their Go colleagues, while also requiring breaking changes and offering few advantages over existing model. It, however, gave inspiration to subsequent re-examination of current async/await implementation and whether it can be improved by moving state machine generation and execution away from IL completely to runtime. It was a massive success as evidenced by preliminary overhead estimations in the results.



The tl;dr that I got when I read these a few months ago was that C# relies on too much FFI which makes implementing green threads hard and on top of that would require a huge effort to rewrite a lot of stuff to fit the green thread model. Java and Go don’t have these challenges since Go shipped with a huge standard library and Java’s ecosystem is all written in Java since it never had good ffi until recently.


Surely you're not claiming that .NET's standard library is not extensive and not written in C#.

If you do, consider giving .NET a try and reading the linked content if you're interested - it might sway your opinion towards more positive outlook :)



> Surely you're not claiming that .NET's standard library is not extensive and not written in C#.

I’m claiming that MSFT seems to care really about P/Invoke and FFI performance and it was one of the leading reasons for them not to choose green threads. So there has to be something in .NET or C# or win forms or whatever that is influencing the decision.

I’m also claiming that this isn’t a concern for Java. 99.9% of the time you don’t go over FFI and it’s what lead the OpenJdk team to choose virtual threads.

> If you do, consider giving .NET a try

I’d love to, but dealing with async/await is a pain :)



You’ve never used it, so how can you know?


I would damn this, if Async/Await wasn't a good enough (TM) solution for certain problems where Threads are NOT good enough.

Remember: there is a reason why Async/Await was created B E F O R E JavaScript was used for more than sprinkling a few fancy effects on some otherwise static webpages



> Rust can run multiple threads just fine

Rust is also used in environments which don't support threads. Embedded, bare metal, etc.



Strong disagree.

> Rust can run multiple threads just fine--it's not Javascript. As such, it didn't have to use async/await. It could have tried any of a bunch of different solutions. Rust is a systems language, after all.

it allows you to have semantic concurrency where there are no threads available. Like, you know, on microcontrollers without an (RT)OS, where such a systems programming language is a godsend.

seriously, using async/await on embedded makes so much sense.



async/await is just a different concurrency paradigm with different strengths and weaknesses than threads. Rust has support for threaded concurrency as well though the ecosystem for it is a lot less mature.


Threads are much much slower than async/await.


> backpressure should also be mentioned

I ran into this when I joined a team using nodejs. Misc services would just ABEND. Coming from Java, I was surprised by this oversight. It was tough explaining my fix to the team. (They had other great skills, which I didn't have.)

> error propagation in async/await is not obvious

I'll never use async/await by choice. Solo project, ...maybe. But working with others, using libraries, trying to get everyone on the same page? No way.

--

I haven't used (language level) structured concurrency in anger yet, but I'm placing my bets on Java's Loom Project. Best as I can tell, it'll moot the debate.



> async/await runs in context of one thread,

Not in Rust.



There is a single thread executor crate you can use for that case if it’s what you desire, FWIW.


Yes of course, but the async/await semantics are not designed only to be single threaded. Typically promises can be resumed on any executor thread, and the language is designed to reflect that.


This is completely wrong. You gotta learn about Send and Sync in Rust before you speak.

Rust makes no assumptions and is explicitly designed to support both single and multi threaded executors. You can have non-Send Futures.



I'm fully aware of this, thanks @iknowstuff.

>>>>> Typically promises are designed...

I'm merely saying Rust async is not restricted to single threaded like many other languages design their async to be, because most people coming from Node are going to assume async is always single threaded.

Most people who write their promise implementations make them Send so they work with Tokio or Async-Std.

Relax, my guy. The shitty tone isn't necessary.

EDIT: Ah, your entire history is you just arguing with people. Got it.



It's interesting to see an almost marketing-like campaign to save face for async/await. It is very clear from my experience that it was not only a technical mistake, it also cost the community dearly. Instead of focusing on language features that are actually useful, the Rust effort has been sidetracked by this mess.

I'm still very hopeful for the language though, and it is the best thing we've got at the moment. I'm just worried that this whole fight will drag on forever.

P.S. The AsyncWrite/AsyncRead example looks reasonable, but in fact you can do the same thing with threads/fds as long as you restrict yourself to *nix.


I've used async in firmware before. It was a lifesaver. The generalizations you make are unfounded and are clearly biased toward a certain workload.


I'd love the detail on this, what did it save you from and how did you ensure your firmware does not, say, hang?


Having to implement my own scheduler for otherwise synchronous network, OLED, serial and USB drivers on the same device, as well as getting automatic power state management when the executor ran out of ready futures to poll.

And a watchdog timer, like always. There's no amount of careful code that absolves you from using a watchdog timer.

For anyone curious, Embassy is the runtime/framework I used. Really well built.



That sounds kind of amazing. Working low level without an OS sounds like exactly the kind of place that Rust's concurrency primitives and tight checking would really be handy. Doing it in straight up C is complicated, and becomes increasingly so with every asynchronous device you have to deal with. Add another developer or two into the mix, and it can turn into a buggy mess rather quickly.

Unless you pull in an embedded OS, one usually ends up with a poor man's scheduler being run out of the main loop. Being able to do that with the Rust compiler looking over your shoulder sounds like it could be a rather massive benefit.



The way to do it in C isn't all that different, is it? You just have explicit state machines for each thing. Yes you have to call thing_process() in the main loop at regular intervals (and probably have each return an am_busy state to determine if you should sleep or not). It's more code but it's easy enough to reason about and probably easier to inspect in a debugger.


Yep, the underlying mechanics have to do the same thing - just swept under a different rug. I imagine the (potential) advantage as being similar to when we had to do the same thing in JavaScript before promises came along: you would make async calls that used callbacks for re-entry, and then you would need to pull context out from someplace and run your state machine.

Being able to write chains of asynchronous logic linearly is rather nice, especially if it's complicated. The tradeoff is that your main loop and re-entry code is now sitting behind some async scheduler, and - as you mention - will be more opaque and potentially harder to debug.



You're making a lot of assumptions. Debugging this thing was easy. Try it before you knock it.


thanks. looked that up. for the curious: https://embassy.dev/


I personally agree that it is great that Rust as a language is able to function in an embedded environment. Someone needs to grasp that nettle. I started by writing "concede" in my first sentence there but it's not a concession. It's great, I celebrate it, and Rust is a fairly good choice for at least programmers working in that space. (Whether it's good for electrical engineers is another question, but that's a debate for another day.)

However, the entire Rust library ecosystem shouldn't bend itself around what is ultimately still a niche use case. Embedded uses are still a small fraction of Rust programs and that is unlikely to change anytime soon. I am confident the vast bulk of Rust programs and programmers, even including some people vigorously defending async/await, would find they are actually happier and more productive with threads, and that they would be completely incapable of finding any real, perceptible performance difference. Such exceptions as there may be, which I not only "don't deny" exist but insist exist, are welcome to pay the price of async/await and I celebrate that they have that choice.

But as it stands now, async/await may be the biggest current premature optimization in common use. Though "I am writing a web site that will experience upwards of several dozen hits per minute, should I use the web framework that benches at 1,000,000 reqs/sec or 2,000,000 reqs/sec?" is stiff competition.



> But as it stands now, async/await may be the biggest current premature optimization in common use.

To be fair, isn't the entire point of OP's essay that async/await is useful specifically for reasons that aren't performance? Rather, it is that async/await is arguably more expressive, composable, and correct than threads for certain workloads.

And I have to say I agree with OP here, given what I've experienced in codebases at work: doing what we currently do with async instead with threads would result in not-insubstantial pain.



I disagree with the original essay comprehensively. async/await is less composable (threads automatically compose by their nature; it is so easy it is almost invisible), a tie on expressiveness (both thread advocates and async advocates play the same game here, where they'll use a nice high-level abstraction on "their" side and compare it to raw use of the primitives on the other; both are perfectly capable of high-level libraries making the various use cases easy), and I would say async being more "correct" is generally not a claim that makes much sense to me either way. The correctness/incorrectness comes from things other than the runtime model.

Basically, async/await for historical reasons grew a lot of propaganda around how they "solved" problems with threads. But that is an accidental later post-hoc rationalization of their utility, which I consider for the most part just wrong. The real reason async-await took off is that it was the only solution for certain runtimes that couldn't handle threading. That's fine on its own terms, but is probably the definitive example of the dangers of taking a cost/benefit balance from one language and applying it to other languages without updating the costs and the benefits (gotta do both!). If I already have threads, the solution to their problems is to use the actor model, never take more than one mutex at a time, share memory by communicating instead of communicating by sharing memory, and on the higher effort end, Rust's lifetime annotations or immutable data. I would never have dreamed of trying to solve the problems with async/await.
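For what it's worth, a minimal sketch of that "share memory by communicating" style using only the standard library: worker threads send results over a channel instead of mutating shared state, and composition falls out of the ownership rules.

    use std::sync::mpsc;
    use std::thread;

    fn main() {
        let (tx, rx) = mpsc::channel();
        for id in 0..4 {
            let tx = tx.clone();
            thread::spawn(move || {
                tx.send(format!("worker {id} done")).unwrap();
            });
        }
        drop(tx); // drop the original sender so the channel closes when the workers finish
        for msg in rx {
            println!("{msg}");
        }
    }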



Honestly that's a fair reply, thanks for the additional nuance to your argument. Upvoted.


> Instead of focusing on language features that are actually useful, the Rust effort has been sidetracked by this mess.

I don't know if you are correct or not (I am not very familiar with Rust) but empirically 9/10 Rust discussions I see nowadays on HN/reddit do revolve around async. It kinda sucks for me because I don't care about async at all, and I am interested in reading stuff about Rust.



> empirically 9/10 Rust discussions I see nowadays on HN/reddit do revolve around async. It kinda sucks for me because I don't care about async at all, and I am interested in reading stuff about Rust.

100% yes. I really feel bad precisely for people in your situation. But, there’s a good reason why you see this async spam on HN:

> (I am not very familiar with Rust)

Once you get past the initial basics (which are amazing, with pattern matching, sum types, traits, RAII, even borrowing can be quite fun), you’ll want to do something “real world”, which typically involves some form of networked IO. That’s when the nightmare begins.

So there’s traditional threaded IO (mentioned in the post), which lets you defer the problem a bit (maybe you can stay on the main thread if eg you’re building a CLI). But every crate that does IO needs to pick a side (async or sync). And so the lib you want to use may require it, which means you need to use async too. There are two ways of doing the same thing - which are API incompatible - meaning if you start with sync and need async later - get ready for a refactor.

Now, you (and the crates you’re using) also have to pick a faction within async, ie which executor to use. I’ve been OOTL a while but I think it’s mostly settled on Tokio these days(?), which probably is for the best. There are even sub-choices about Send-ness (for single-vs-multithreaded executors) and such that also impact API-compatibility.

In either case, a helluva lot of people with simple use-cases absolutely need to worry about async. This is problematic for three main reasons: (1) these choices are front-loaded and hard to reverse, (2) async is more complex to use, debug, and understand, and (3) in practice it constrains the use of Rust best-practice features like static borrowing.

I don’t think anyone questions the strive for insane performance that asynchronous IO (io_uring, epoll etc) can unlock together with low-allocation runtimes. However, async was supposed to be an ergonomic way to deliver that, ideally without splitting the ecosystem into camps. Otherwise, perhaps just do manual event looping with custom state machines, which doesn’t need any special language features.



Thank you for posting this. Async vs sync is a tough design decision, and the ecosystem of crates in async gets further partitioned by the runtime. Tokio seems to be the leader, but making the async runtime a second-class consideration has just made the refactor risk much higher.

I like async, and think it makes a lot of sense in many use cases, but it also feels like a poisoned pill, where it’s all or nothing.



Maybe it's my lack of experience, but I find it much easier to wrap my head around threads than async/await. Yes, with threads there is more "infrastructure" required, but it's straightforward and easy to reason about (for me). With async/await I really don't fully understand what's going on behind the scenes.

Granted, in my job the needs for concurrency\parallelism tend to be very simple and limited.



Imo this is because threads are a good abstraction. Not perfect, and quite inflexible, but powerful and simple. I would argue green threads are equally simple, too.

Async is implemented differently in different languages. Eg in JS it’s a decent lightweight sugar over callbacks, ie you can pretty much “manually” lower your code from async => promises => callbacks. It also helps that JS is single threaded.

Doing the same in Rust would require you to implement your own scheduler/runtime, and self-referencing “stackless” state machines. It’s orders of magnitude more complex.
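
To make the "stackless state machine" point concrete, here is a minimal hand-written Future (purely illustrative; real compiler-generated state machines also have to handle borrows and self-references across await points): poll() advances explicit state until it returns Ready, which is roughly the shape async/await desugars to.

    use std::future::Future;
    use std::pin::Pin;
    use std::task::{Context, Poll};

    // A tiny leaf future: returns Pending a couple of times, then completes.
    struct YieldTwice {
        polls_remaining: u8,
    }

    impl Future for YieldTwice {
        type Output = &'static str;

        fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
            if self.polls_remaining == 0 {
                Poll::Ready("done")
            } else {
                self.polls_remaining -= 1;
                // Ask to be polled again. A real leaf future would instead hand
                // the waker to an I/O reactor and only wake when data is ready.
                cx.waker().wake_by_ref();
                Poll::Pending
            }
        }
    }

    fn main() {
        // Drive it with any executor; here, the `futures` crate's simple block_on.
        let msg = futures::executor::block_on(YieldTwice { polls_remaining: 2 });
        println!("{msg}");
    }

Writing this by hand for every await point is exactly the work the compiler's lowering does for you.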



Honestly: just ignore them. Just start using Rust! It's a lovely, useful language. HN/reddit are not representative most of the people out there in the real world writing Rust code that solves problems. I am not saying their concerns are invalid, but there is a tendency on these forums to form a self-reinforcing collective opinion that is overfit to the type of people who like to spend time on these forums. Reality is almost always richer and more complicated.


It's true for general-programming-focused subreddits, but eh, those will always try to poke fun at languages from the extreme perspective. E.g. the things they complain about are a problem for people new to a language but not for others who have gone through learning the language extensively.

Remember, there are only two types of languages - languages no one uses and languages people complain about. When was the last time you heard Brainfuck doesn't give you good tools to access files?

On /r/rust Reddit, most talk is about slow compile/build times.



If you think that threads are faster than poll(), I would like to know in what use case that happens, because I have never once encountered this in my life.


Do you have any evidence to back up the claim that the async efforts have taken away from other useful async features?

Also, lots of major rust projects depend on async for their design characteristics, not just for the significant performance improvements over the thread-based alternatives. These benefits are easy to see in just about any major IO-bound workload. I think the widespread adoption of async in major crates (by smart people solving real world problems) is a strong indicator that async is a language feature that is "actually useful".

The fight is mostly on hackernews and reddit, and mostly in the form of people who don't need async being upset it exists, because all the crates they use for IO want async now. I understand that it isn't fun when that happens, and there are clearly some real problems with async that they are still solving. It isn't perfect. But it feels like the split over async that is apparent in forum discussions just isn't nearly as wide or dramatic in actual projects.



I have to respectfully disagree. People are allowed to like stuff, it doesn't make it a marketing campaign or a conspiracy. This negativity on async, as if it's just a settled debate that async is a failure and therefor Rust is a failure, feels self-reinforcing.

I've written a fair amount of Rust both with threads and with async and you know what? Async is useful, very often for exactly the reasons the OP mentions, not necessarily for performance. I don't like async for theoretical reasons. I like it because it works. We use async extensively at my job for a lot of lower-level messaging and machine-control code and it works really well. Having well-defined interfaces for Future and Stream that you can pass around is nice. Having a robust mechanism for timers is nice. Cancellation is nice. Tokio is actually really solid (I can't speak to the other executors). I see this "async sucks" meme all over the various programming forums and it feels like a bunch of people going "this thing, that isn't perfect, is therefor trash". Are we not better than this?

This is not to say that async doesn't have issues. Of course it does. I'm not going to enumerate them here, but they exist and we should work to fix them. There are plenty of legitimate criticisms that have been made and can continue to be made about Rust's concurrency story, but async hasn't "cost the community dearly". People who wouldn't use Rust otherwise are using async every single day to solve real problems for real users with concurrent code, which is quite a bit more than can be said for all kinds of other theoretical different implementations Rust could have gone with, but didn't.



It's not a technical mistake, it's a brilliant solution for when you need ultra-low-latency async code. The mistake is pushing it for the vast majority of use-cases where this isn't needed.


The reason it's pushed everywhere is because it only works well if the ecosystem uses it. If the larger rust ecosystem used sync code in most places, async await in rust would be unusable by large swaths of the community.


I think it wouldn't be so painful even as it's pervasive if the ergonomics were far better. Unfortunately, things are still far off on that front. Static dispatch for async functions recently landed in stable, though without a good way to bound the return type. Things like a better pinning API, async drop, interoperability and standardization, dynamic dispatch for async functions, and structured concurrency are open problems, some moving along right now. It'll be a process spanning years.


It's not pushed, it's pulled by the hype. The Rust community got onboard that train and started to async all the things without regards to actual need or consequences.


> It's interesting to see an almost marketing-like campaign to save face for async/await.

It's not, it's just Rust people want to explain choices that led them here, and HN/Reddit crowd is all about async controversy, so more Rust people make blog about it, so more HN/Reddit focus on it. Like any good controversy, it's a self-reinforcing cycle.

> it also cost the community dearly

Citation needed. Having async/await opened a door to new contributors as well, it was a REALLY requested feature, and it made stuff like Embassy possible.

And it made stuff like effects and/or keyword generics a more requested feature.



The big one not mentioned is cancellation. It's very easy to cancel any future. OTOH cancellation with threads is a messy whack-a-mole problem, and a forced thread abort can't be reliable due to a risk of leaving locks locked.

In Rust's async model, it's possible to add timeouts to all futures externally. You don't need every leaf I/O function to support a timeout option, and you don't need to pass that timeout through the entire call stack.

Combined with use of Drop guards (which is Rust's best practice for managing in-progress state), it makes cancellation of even large and complex operations easy and reliable.
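
A minimal sketch of what that looks like in practice, assuming the Tokio runtime (with its timer enabled) and a made-up fetch_user function: the deadline is applied from the outside, and nothing below fetch_user knows it exists.

    use std::time::Duration;
    use tokio::time::timeout;

    // Stand-in for a deep chain of async calls that never heard of our deadline.
    async fn fetch_user(id: u64) -> Result<String, std::io::Error> {
        Ok(format!("user-{id}"))
    }

    #[tokio::main]
    async fn main() {
        match timeout(Duration::from_secs(2), fetch_user(42)).await {
            Ok(Ok(user)) => println!("got {user}"),
            Ok(Err(e)) => eprintln!("I/O error: {e}"),
            // On timeout the inner future is no longer polled and gets dropped,
            // cancelling the whole operation.
            Err(_elapsed) => eprintln!("timed out"),
        }
    }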



It's not easy to cancel any future. It's easy to *pretend* to cancel any future. E.g. if you cancel (drop) anything that uses spawn_blocking, it will just continue to run in the background without you being aware of it. If you cancel any async fs operation that is implemented in terms of a threadpool, it will also continue to run.

This all can lead to very hard to understand bugs - e.g. "why does my service fail because a file is still in use, while I'm sure nothing uses the file anymore"
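
A sketch of that pitfall, assuming Tokio: dropping the handle returned by spawn_blocking detaches the blocking closure rather than stopping it, so the "cancelled" work keeps running (and keeps its resources busy) in the background.

    use std::time::Duration;

    #[tokio::main]
    async fn main() {
        let handle = tokio::task::spawn_blocking(|| {
            // Stand-in for long blocking work holding a resource (e.g. a file).
            std::thread::sleep(Duration::from_secs(10));
            println!("blocking work finished anyway");
        });

        // "Cancel" by dropping the handle; the closure above is unaffected.
        drop(handle);

        // Wait long enough to see the detached work complete regardless.
        tokio::time::sleep(Duration::from_secs(11)).await;
    }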



Yes, if you have a blocking thread running then you have to use the classic threaded methods for cancelling it, like periodically checking a boolean. This can compose nicely with Futures if they flip the boolean on Drop.

I’ve also used custom executors that can tolerate long-blocking code in async, and then an occasional yield.await can cancel compute-bound code.
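
A minimal sketch of that composition (all names here are illustrative, not a standard API): a guard sets a shared flag in its Drop impl, so dropping the future for any reason requests cancellation of the blocking loop.

    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Arc;

    // Requests cancellation of the background work when dropped.
    struct CancelOnDrop(Arc<AtomicBool>);

    impl Drop for CancelOnDrop {
        fn drop(&mut self) {
            self.0.store(true, Ordering::Relaxed);
        }
    }

    async fn supervised_blocking_work() {
        let cancelled = Arc::new(AtomicBool::new(false));
        let _guard = CancelOnDrop(cancelled.clone());

        std::thread::spawn(move || {
            while !cancelled.load(Ordering::Relaxed) {
                // ... one bounded chunk of blocking work, then re-check the flag ...
            }
        });

        // ... await the rest of the operation here. If this future is dropped
        // at any await point (timeout, select, abort), _guard's Drop runs and
        // the worker loop exits on its next flag check.
    }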



If you implemented async futures, you could have also instead implemented cancelable threads. The problem is fairly isomorphic. System calls are hard, but if you make an identical system call in a thread or an async future, then you have exactly the same cancellation problem.


I don't get your distinction. Async/await is just a syntax sugar on top of standard syscalls and design patterns, so of course it's possible to reimplement it without the syntax sugar.

But when you have a standard futures API and a run-time, you don't have to reinvent it yourself, plus you get a standard interface for composing tasks, instead of each project and library handling completion, cancellation, and timeouts in its own way.



I don't follow how threads are hard to cancel.

Set some state (like a flag) that all threads have access to.

In their work loop, they check this flag. If it's false, they return instead and the thread is joined. Done.
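
For what it's worth, that pattern in plain std threads looks something like this (the flag here means "keep running", so clearing it requests exit):

    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Arc;
    use std::thread;
    use std::time::Duration;

    fn main() {
        let keep_running = Arc::new(AtomicBool::new(true));

        let workers: Vec<_> = (0..4)
            .map(|i| {
                let flag = keep_running.clone();
                thread::spawn(move || {
                    while flag.load(Ordering::Relaxed) {
                        // one unit of work per loop iteration
                        thread::sleep(Duration::from_millis(50));
                    }
                    println!("worker {i} exiting");
                })
            })
            .collect();

        thread::sleep(Duration::from_millis(200));
        keep_running.store(false, Ordering::Relaxed); // request cancellation
        for w in workers {
            w.join().unwrap(); // returns once each loop observes the flag
        }
    }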



Cancellation is not worth worrying over in my experience. If an op is no longer useful, then it is good enough if that information eventually becomes visible to whatever function is invoked on behalf of the op, but don't bother checking for it unless you are about to do something very expensive, like start an RPC.


It's been incredibly important for me in both high-traffic network back-ends, as well as in GUI apps.

When writing complex servers it's very problematic to have requests piling up waiting on something unresponsive (some other API, microservice, borked DNS, database having a bad time, etc.). Sometimes clients can be stuck waiting forever, eventually causing the server to run out of file descriptors or RAM. Everything needs timeouts and circuit breakers.

Poorly implemented cancellation that leaves some work running can create pathological situations that eat all CPU and RAM. If some data takes too long to retrieve, and you time out the request without stopping processing, the client will retry, asking for that huge slow thing again, piling up another and another and another huge task that doesn't get cancelled, making the problem worse with each retry.

Often threading is mixed with callbacks for returning results. The un-cancelled callbacks firing after the other part of the application aborted an operation can cause race conditions, by messing up some state or being misattributed to another operation.



Right, this is compatible with what I said and meant. Timeouts that fire while the op is asleep, waiting on something: good, practical to implement. Cancellations that try to stop an op that's running on the CPU: hard, not useful.


I think a better question is "why choose async/await over fibers?". Yes, I know that Rust had green threads in the pre-1.0 days and it was intentionally removed, but there are different approaches for implementing fiber-based concurrency, including those which do not require a fat runtime built-in into the language.

If I understand the article correctly, it mostly lauds the ability to drop futures at any moment. Yes, you can not do a similar thing with threads for obvious reasons (well, technically, you can, but it's extremely unsafe). But this ability comes at a HUGE cost. Not only can you not use stack-based arrays with completion-based executors like io-uring or execute sub-tasks on different executor threads, but it also introduces certain subtle footguns and reliability issues (e.g. see [0]), which become very unpleasant surprises after writing sync Rust.

My opinion is that cancellation of tasks fundamentally should be cooperative and uncooperative cancellation is more of a misfeature, which is convenient at the surface level, but has deep issues underneath.

Also, praising composability of async/await sounds... strange. Its viral nature makes it anything but composable (with the current version of Rust without a proper effect system). For example, try to use async closure with map methods from std. What about using the standard io::Read/Write traits?

[0]: https://smallcultfollowing.com/babysteps/blog/2022/06/13/asy...
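
A small illustration of the map friction mentioned above (it doesn't cover the io::Read/Write case): handing an async computation to a std combinator like Iterator::map compiles, but it only produces unawaited futures, and you cannot .await inside the synchronous closure itself, so the awaiting still has to happen by hand (or via the futures crate's stream combinators, outside std).

    async fn double(x: u32) -> u32 {
        x * 2
    }

    async fn run() -> Vec<u32> {
        let inputs = vec![1u32, 2, 3];

        // Compiles, but `futs` is an iterator of futures, not of results.
        let futs = inputs.iter().map(|&x| double(x));

        // The awaiting has to happen outside the closure, one by one.
        let mut out = Vec::new();
        for f in futs {
            out.push(f.await);
        }
        out
    }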



For rust, fibers (as a user-space, cooperative concurrency abstraction) would mandate a lot of design choices, such as whether stacks should be implemented using spaghetti stacks or require some sort of process-level memory mapping library, or even if they were just limited to a fixed size stack.

All three of these approaches would cause issues when interacting with code in another language with a different ABI. It can get really complicated, for example, when C code gets called from one fiber and wants to then resume another.

One of the benefits of async/await is the 'await' keyword itself. The explicit wait-points give you the ability to actually reason about the interactions of a concurrent program.

Yielding fibers are a bit like the 'goto' of the concurrency world - whenever you call a method, you don't know if as a side effect it may cause your processing to pause, and if when it continues the state of the world has changed. The need to be defensive when interfacing with the outside world means fibers tend to be better for tasks which run in isolation and communicate by completion.

Green threads, fibers and coroutines all share the same set of problems here, but really user space cooperative concurrency is just shuffling papers on a desk in terms of solving the hard parts of concurrency. Rust async/await leaves things more explicit, but as a result doesn't hide certain side effects other mechanisms do.



> you don't know if as a side effect it may cause your processing to pause, and if when it continues the state of the world has changed

This may be true in JS (or Haskell), but not in Rust, where you already have multithreading (and unrestricted side-effects), and so other code may always be interleaved. So this argument is irrelevant in languages that offer both async/await and threads.

Furthermore, the argument is weak to begin with because the difference is merely in the default choice. With threads, the default is that interleaving may happen anywhere unless excluded, while with async/await it's the other way around. The threading approach is more composable, maintainable, and safer, because the important property is that of non-interference, which threads state explicitly. Any subroutine specifies where it does not tolerate contention regardless of which other subroutines it calls. In the async/await model, adding a yield point to a previously non-yielding subroutine requires examining the assumptions of its callers. Threads' default -- requiring an explicit statement of the desired property of non-interference -- is the better one.



I never fully understood the FFI issue. When calling an FFI function that is not known to be coroutine-safe, you would switch from the coroutine stack to the original thread stack and back. This need not be more expensive than a couple of instructions on the way in and out.

Interestingly, reenabling frame pointers was in the news recently, which would add a similar amount of overhead to every function call. That was considered a more than acceptable tradeoff.



The issue is that the FFI might use C thread-local storage and end up assuming 1-1 threading between the language and C, but if you're using green threads then that won't be the case.


> For rust, fibers (as a user-space, cooperative concurrency abstraction) would mandate a lot of design choices, such as whether stacks should be implemented using spaghetti stacks or require some sort of process-level memory mapping library, or even if they were just limited to a fixed size stack.

> All three of these approaches would cause issues when interacting with code in another language with a different ABI. It can get really complicated, for example, when C code gets called from one fiber and wants to then resume another.

Java (in OpenJDK 21) is doing it. To be fair, Java had no other choice because there is so much sequential code written in Java, but also the Java language and bytecode compilation make it easy to implement spaghetti stacks transparently. Given those two things it's obviously a good idea to go with green threads. The same might not apply to other languages.

My personal preference is for async/await, but it's true that its ecosystem bifurcating virality is a bit of a problem.



Java also has the benefit that most of the Java ecosystem is in Java. This makes it easy to avoid the ffi problem since you never leave the VM.


In my opinion, by default fibers should use "full" stacks, i.e. a reasonable amount of unpopulated memory pages (e.g. 2 MiB) with guard page. Effectively, the same stack which we use for threads. It should eliminate all issues about interfacing with external code. But it obviously has performance implications, especially for very small tasks.

Further, on top of this we can then develop spawning tasks which would use parent's stack. It would require certain language development to allow computing maximum stack usage bound of functions. Obviously, such computation would mean that programmers have to take additional restrictions on their code (such as disallowing recursion, alloca, and calling external functions without attributing stack usage), but compilers already routinely compute stack usage of functions, so for pure Rust code it should be doable.

>It can get really complicated, for example, when C code gets called from one fiber and wants to then resume another.

It's a weird example. How would a C library know about fiber runtime used in Rust?

>Yielding fibers are a bit like the 'goto' of the concurrency world - whenever you call a method, you don't know if as a side effect it may cause your processing to pause, and if when it continues the state of the world has changed.

I find this argument funny. Why don't you have the same issue with preemptive multitasking? We live with exactly this "issue" in the threading world just fine. Even worse, we can not even rely on "critical sections": a thread's execution can be preempted at ANY moment.

As for `await` keyword, in almost all cases I find it nothing more than a visual noise. It does not provide any practically useful information for programmer. How often did you wonder when writing threading-based code about whether function does any IO or not?



> It’s a weird example. How would a C library know about the fiber runtime used in Rust.

Well, if your green thread were launched on an arbitrary free OS thread (work stealing), then for example your TLS variables would be very wrong when you resume execution. Does it break all FFI? No. But it can cause issues for some FFI in a way that async/await cannot.

> I find this argument funny. Why don't you have the same issue with preemptive multitasking? We live with exactly this "issue" in the threading world just fine. Even worse, we can not even rely on "critical sections", thread's execution can be preempted at ANY moment.

It’s not about critical sections as much. Since the author referenced go to, I think the point is that it gets harder to reason about control flow within your own code. Whether or not that’s true is debatable since there’s not really any implementation of green threads for Rust. It does seem to work well enough for Go but it has a required dedicated keyword to create that green thread to ease readability.

> As for `await` keyword, in almost all cases I find it nothing more than a visual noise. It does not provide any practically useful information for programmer. How often did you wonder when writing threading-based code about whether function does any IO or not?

Agree to disagree. It provides very clear demarcation of which lines are possible suspension points which is important when trying to figure out where “non interruptible” operations need to be written for things to work as intended.



Obviously you would not use operating system TLS variables when your code does not correspond to operating system threads.

They're just globals, anyway - why are we on Hacker News discussing the best kind of globals? Avoid them and things will go better.



Not sure why you’re starting a totally unrelated debate. If you’re pulling in a library via FFI, you have no control over what that library has done. You’d have to audit the source code to figure out if they’ve done anything that would be incompatible with fibers. And TLS is but one example. You’d have to audit for all kinds of OS thread usage (e.g. if it uses the current thread ID as an index into a hashmap or something). It may not be common, but Go’s experience is that there’s some issue and the external ecosystem isn’t going to bend itself over backwards to support fibers. And that’s assuming that these are solved problems within your own language ecosystem which may not be the case either when you’re supporting multiple paradigms.


You've obviously never worked with fibers if you think these are obvious. These problems are well documented and observed empirically in the field.


> In my opinion, by default fibers should use "full" stacks, i.e. a reasonable amount of unpopulated memory pages (e.g. 2 MiB) with guard page.

That's a disaster. If you're writing a server that needs to serve 10K clients concurrently then that's 20GiB of RAM just for the stacks, plus you'll probably want guard pages, and so MMU games every time you set up or tear down a fiber.

The problem with threads is the stacks, the context switches, the cache pressure, the total memory footprint. A fiber that has all those problems but just doesn't have an OS schedulable entity to back it barely improves the situation relative to threads.

Dealing with slow I/O is a spectrum. On one end you have threads, and on the other end you have continuation passing style. In the middle you have fibers/green threads (closer to threads) and async/await (closer to CPS). If you want to get closer to the middle than threads then you want spaghetti stack green threads.



How do fibers solve your cancellation problem? Aren't they more or less equivalent?

(I find fiber-based code hard to follow because you're effectively forced to reason operationally. Keeping track of in-progress threads in your head is much harder than keeping track of to-be-completed values, at least for me)



With fibers you send a cancellation signal to a task; then on the next IO operation (or, more generally, yield) it will get a cancellation error code, with the ability to get the true result of the IO operation, if there is any. Note that it does not mean that the task will sleep until the IO operation gets completed; the cancellation signal causes any ongoing IO to "complete" immediately if that is possible (e.g. IIRC disk IO can not be cancelled).

It then becomes the responsibility of the task to handle this signal. It may either finish immediately (e.g. by bubbling the "cancellation" error), finish some critical section before that and do some cleanup IO, or it may even outright ignore the signal.

With futures you just drop the task's future (i.e. its persistent stack) maybe with some synchronous cleanup and that's it, you don't give the task a chance to say a word in its cancellation. Hypothetical async Drop could help here (though you would have to rely on async drop guards extensively instead of processing "cancellation errors"), but adding it to Rust is far from easy and AFAIK there are certain fundamental issues with it.

With io-uring sending cancellation signals is quite straightforward (though you need to account for different possibilities, such as task being currently executed on a separate executor thread, or its CQE being already in completion queue), but with epoll, unfortunately, it's... less pleasant.



> Hypothetical async Drop could help here (though you would have to rely on async drop guards extensively instead of processing "cancellation errors"), but adding it to Rust is far from easy and AFAIK there are certain fundamental issues with it.

Wouldn't fiber cancellation be equivalent and have equivalent implementation difficulties? You say you just send a signal to the task, but in practice picking up and running the task to trigger its cancellation error handling is going to look the same as running a future's async drop, isn't it?



Firstly, Rust does not have async Drop and it's unlikely to be added in the foreseeable future. Secondly, cancellation signals are a more general technique than async Drop, i.e. you can implement the latter on top of the former, but not the other way around. For example, with async Drop you can not ignore a cancellation event (unless you copy the code of your whole task into the Drop impl). Some may say that's a good thing, but it's an obvious example of cancellation signals being more powerful than hypothetical async Drop.

As for implementation difficulties, I don't think so. For async Drop you need to mess with some fundamental parts of the Rust language (since Futures are "just types"), while fiber-based concurrency, in a certain sense, is transparent for compiler and implementation complexity is moved to executors.

If you are asking about how it would look in user code, then, yes, they would be somewhat similar. With cancellation signals you would call something like `let res = task_handle.cancel_join();`, while with async Drop you would use `drop(task_future)`. Note that the former also allows getting a result from a cancelled task, another example of greater flexibility.



> Keeping track of in-progress threads in your head is much harder than keeping track of to-be-completed values, at least for me

I think that's true for everybody. Our minds barely handle state for sequential code - the explosion of complexity of multiple state-modifying threads is almost impossible to follow.

There are ways to convert "keeping track of in-progress threads" to "keeping track of to-be-completed values" - in particular, Go uses channels as a communication mechanism which explicitly does the latter, while abstracting away the former.
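
The same idea can be sketched in Rust with std channels (a rough analogue of the Go pattern, not a claim about Go's implementation): the consumer only ever reasons about the values arriving on the channel, never about which worker thread is where.

    use std::sync::mpsc;
    use std::thread;

    fn main() {
        let (tx, rx) = mpsc::channel();

        for job in 0..8u32 {
            let tx = tx.clone();
            thread::spawn(move || {
                let result = job * job; // stand-in for real work
                tx.send((job, result)).expect("receiver still alive");
            });
        }
        drop(tx); // channel closes once every worker's sender is gone

        // The receiving side tracks to-be-completed values, not thread state.
        for (job, result) in rx {
            println!("job {job} -> {result}");
        }
    }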



> There are ways to convert "keeping track of in-progress threads" to "keeping track or to-be-completed" values - in particular, Go uses channels as a communication mechanism which explicitly does the latter, while abstracting away the former.

I find the Go style pretty impossible to follow - you have to keep track of which in-progress threads are waiting on which lines, because what will happen when you send to a given channel depends on what was waiting to receive from that channel, no? The only way of doing this stuff that I've ever found comprehensible is iteratees, where you reify the continuation step as a regular value that runs when you call it explicitly.



Because stackful fibers suck for low-level code. See Gor Nishanov's review for the C++ committee http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p136... (linked from https://devblogs.microsoft.com/oldnewthing/20191011-00/?p=10... ). It even sums things up nicely: DO NOT USE FIBERS!


To be clear, not everybody agrees with Gor and although they still don't have much traction, stackful coroutines are still being proposed.

Most (but admittedly not all) of Gor's issues with stackful coroutines are due to current implementations being purely a library feature with no compiler support. Most of the issues (context switch overhead, thread-local issues) can be solved with compiler support.

The complaint about lack of support in some existing libraries is unfair: async/await doesn't have that support either, and in fact it would be significantly harder to add there, as it would require a full rewrite. The takeaway here is not to try to make M:N fully transparent, not to avoid it completely.

The issue with stack usage is real though; split stacks are not a panacea, and OS support would be required.

edit: for the other side of the coin, also from Microsoft, about issues with the async/await model in Midori (a high-performance single-address-space OS): https://joeduffyblog.com/2015/11/19/asynchronous-everything/

They ended up implementing the equivalent of stackful coroutines specifically for performance.



> The take away here is not to try to make M:N fully transparent

But the whole point of the 'Virtual processors' or 'Light-weight process' patterns as popularized by Golang and such is to try and make M:N transparent to user code.



I mean fully transparent to pre-existing code. Go had the benefit of having stackful coroutines from day one, but when retrofitting to an existing language, expecting the whole library ecosystem to work out of the box without changes is a very high bar.


That paper has some significant flaws:

In section 2.3.1, "Dangers of N:M model", the "danger" lies in using thread-local storage that was built for OS threads, without modification, for stackful fibers. The bottom line here should have been "don't do that", not that stackful fibers are dangerous. Obviously, any library that interacts with the implementation mechanism for concurrency must be built for the mechanism actually used, not a different one.

Section 2.3.2, "Hazards of 1:N model", again points out the dangers of making such blind assumptions, but its main point is that blocking APIs must block on the fiber level, not on the host thread level -- that is, it even proposes the solution for the problem mentioned, then totally ignores that solution. Java's approach (even though, IIRC, N:M) does exactly that: "rewrite it in Java". This is one place where I hope "rewrite it in Rust" becomes more than a meme in the future and actually becomes a foundation for stackful fibers.

Then there are a bunch of case studies that show that you cannot just sprinkle some stackful fibers over an existing codebase and hope for it to work. Surprise: You can't do that with async/await either. "What color is my function" etc.

I'm still hoping for a complete, clearly presented argument for why Java can do it and Rust cannot. (Just for example: "it needs a GC" could be the heart of such an argument).



Somewhere there's a blog post from the old Sun about how M:N threading was a disaster in Solaris.


> Just for example: "it needs a GC" could be the heart of such an argument

Rust can actually support high-performance concurrent GC, see https://github.com/chc4/samsara for an experimental implementation. But unlike other languages it gives you the option of not using it.



> My opinion is that cancellation of tasks fundamentally should be cooperative and uncooperative cancellation is more of a misfeature, which is convenient at the surface level, but has deep issues underneath.

Cooperative cancellation can be pretty annoying with mathematical problems. You can have optimization algorithms calling rootfinding routines calling ODE integrators, any of which can spin for a very long time, so you need to thread cancellation tokens through everywhere, and numerical frameworks generally don't support it. You can and should use iteration counts in all the algorithms, but once you're dealing with nested algorithms those can only guarantee that your problem will stop sometime this year, not that it stops within 5 seconds. With these problems I can promise that I'm just doing: lots of math, allocations with their associated page faults, no I/O, and writing strings to a standard library Queue object for logging that are handled back in the main thread, which I'm never going to cancel (and whatever other features you think I might need -- which I haven't needed for years now -- I'd be happy to ship that information back to the main thread on a Queue). It feels like that problem should be solvable in the 21st century without making me thread cancellation tokens everywhere and defensively code against spinning without checking the token (where I can make mistakes and cause bugs, which I guess you'll just blame me for).



> I think a better question is "why choose async/await over fibers?"

> there are different approaches for implementing fiber-based concurrency, including those which do not require a fat runtime built-in into the language

This keynote below is Ruby so the thread/GVL situation is different than for Rust, but is that the kind of thing you mean?

https://m.youtube.com/watch?v=qKQcUDEo-ZI

I think it makes a good case that async/await is infectious and awkward, and fibers (at least as implemented in Ruby) is quite simply a better paradigm.



> I think it makes a good case that async/await is infectious and awkward, and fibers (at least as implemented in Ruby) is quite simply a better paradigm.

As someone who has done fibers development in Ruby, I disagree.

CRuby has the disadvantage of a global interpreter lock. This means that parallelism can only be achieved via multiple processes. This is not the case in Rust, where you have access to true parallelism in a single process.

Second, this talk is not arguing for the use of fibers as much as it is arguing for using fibers to rig up a bespoke green-threads-like system for a specific web application server, and advocating for Ruby runtime features to make the code burden of doing this lighter.

Ruby has a global interpreter lock, so even though it uses native threads only one of them can be executing ruby code at a time. Fibers have native stacks, so they have all the resource requirements of a thread sans context switching - but the limitations from the GIL actually mean you aren't _saving_ context switching by structuring your code to use fibers in typical (non "hello world" web server) usage.



> CRuby has the disadvantage of a global interpreter lock. This means that parallelism can only be achieved via multiple processes.

Even for Ruby code (and not native code running in a Ruby app) this is false since the introduction of Ractors, which are units within a single process that communicate without shared mutable (user) state, as the GVL is only “global” to a Ractor, not a process.

(Also worth noting, given the broader topic here, that Ruby has an available async implementation built on top of fibers which doesn't rely on special syntax and allows code to be blocking or non-blocking depending on context.)



I'm sorry but you seem to have missed the point I'm referring to, even though you have directly quoted it.

None of your paragraphs are relevant to async await vs fiber: async await requires you to put keywords in all sorts of unexpected places, fibers do not.

> CRuby has the disadvantage of a global interpreter lock. This means that parallelism can only be achieved via multiple processes.

I am very well cognisant of this fact, but this bears absolutely no relationship with async await vs fibers: one is sweeping callbacks on an event loop under leaky syntactic sugar, the other is cooperative coroutines; they're all on the same Ruby thread, the GVL simply does not intervene.

> this talk is not arguing for use of fibers as much as it is arguing as using fibers to rig up a bespoke green threads-like system for a specific web application server, and advocating for ruby runtime features to make the code burden on them of doing this lighter.

That sounds like a very cynical view of it. I believe there are arguments made in this talk that are entirely orthogonal to how Falcon benefits from fibers.

> Fibers have native stacks, so they have all the resource requirements of a thread

For starters fibers don't have the resource requirements of a thread because they're not backed by native OS threads, while CRuby creates an OS thread for each Ruby VM thread (until MaNy lands, but even with M:N they'd still be subject to the GVL).

Are you arguing for a stackless design like Stackless Python? Goroutines are stack-based too and the benefit of coroutines (and a M:N design) is very apparent. Anyway async await is stack-based too so I don't see how this is relevant: if you have 1M requests served at the same time truly concurrently you're going to have either 1M event callbacks or 1M fibers; that's 1 million stacks either way.

I seem to gather through what I read as a bitter tone that your experience with fibers was not that good. I appreciate another data point but I can't seem to reconcile it vs async await.

But really, the only reason I brought up threads is because the GVL makes threads less useful in CRuby than in Rust (or Go since I mentioned it).



> I am very well cognisant of this fact, but this bears absolutely no relationship with async await vs fibers

The ruby environment does not have async/await, and has a desire for experimentation with user mode concurrency systems due to the impact of GIL on the utility of native threads. Running under an interpreter with primarily heap-allocated objects and a patchable runtime also means that you have less impact from switching to such a model as an application developer.

There were discussions that this somehow pertained to a wider discussion on async/await vs fibers, specifically in rust which does have proper utilization of threads and does have async/await support. Since rust is also operating based on native code, it would need to have changes such as putting fiber suspension logic in all I/O calls between itself and the underlying OS to support developer-provided concurrency.



I only scrolled the video, but it sounds similar, yes. Though, implementation details would probably vary significantly, since Rust is a lower-level language.


Could you share an example of a fiber implementation not relying on a fat runtime built into the language?


https://github.com/Xudong-Huang/may

The project has some serious restrictions and unsound footguns (e.g. around TLS), but otherwise it's usable enough. There are also a number of C/C++ libraries, but I can not comment on those.



For example https://github.com/creationix/libco

The required thing is mostly just to dump your registers on the stack and jump.



I do think implementations like that are not particularly useful though.

You want a runtime to handle and multiplex blocking calls - otherwise if you perform any blocking calls (mostly I/O) in one fiber, you block everything - so what use are those fibers ?



The answer is the same as in async Rust, right? "Don't do that."

If you wanted to use this for managing a bunch of I/O contexts per OS thread then you would need to bring an event loop and a bunch of functions that hook up your event loop to whatever asynchronous I/O facilities your OS provides. Sort of like in async Rust.



The sole advantage of async/await over fibers is the possibility to achieve ultra-low-latency via the compiler converting the async/await into a state machine. This is important for Rust as a systems language, but if you don't need ultra-low-latency then something with a CSP model built on fibers, like Goroutines or the new Java coroutines, is much easier to reason about.


Fibers and async/await are backed by the same OS APIs, so they can achieve more or less the same latency. The main advantage of async/await (or, to be more precise, stackless coroutines) is that they require less memory for task stacks, since tasks can reuse the executor's stack for the non-persistent part of their own stack (i.e. stack variables which do not cross yield points). It has very little to do with latency. At most you can argue that the executor's stack stays in the CPU cache, which reduces the number of cache misses a bit.

Stackless coroutines also make it easier to use a parent's stack for its children's stacks. But IMO that is only because compilers currently do not have tools to communicate the maximum stack usage bound of functions to programming languages.



> communicate maximum stack usage bound of functions

This would be useful in all sorts of deeply embedded code (as well as more exotic things, such as coding for GPU compute). Unfortunately it turns out to be unfeasible when dealing with true reentrant functions (e.g. any kind of recursion) or any use of FFI, dynamic dispatch etc. etc. So it can only really be accomplished when dealing with near 'leaf' code, where stack usage is expected to be negligible anyway.



>Fibers and async/await are backed by the same OS APIs, they can achieve more or less the same latency.

The key requirement for ultra-low-latency software is minimising/eliminating dynamic memory allocation, and stackless coroutines allow avoiding memory allocation. For managed coroutines on the other hand (e.g. goroutines, Java coroutines) as far as I'm aware it's impossible to have an implementation that doesn't do any dynamic memory allocation, or at least there aren't any such implementations in practice.



Yes, it's what I wrote about in the last paragraph. If you can compute maximum stack size of a function, then you can avoid dynamic allocation with fibers as well (you also could provide stack size manually, but it would break horribly if the provided number is wrong). You are right that such implementations do not exist right now, but I think it's technically feasible, as demonstrated by tools such as https://github.com/japaric/cargo-call-stack The main stumbling block here is FFI, historically shared libraries do not have any annotations about stack usage, so functions with bounded stack usage would not be able to use even libc.


> Fibers and async/await are backed by the same OS APIs

async/await doesn't require any OS APIs, or even an OS at all.

You can write async rust that runs on a microcontroller and poll a future directly from an interrupt handler.

And there's a huge advantage to doing so, too: you can write out sequences of operations in a straightforward procedural form, and let the compiler do the work of turning that into a state machine with a minimal state representation, rather than doing that manually.
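
A minimal sketch of that (assumptions: a busy-poll loop with a hand-rolled no-op waker; everything used lives in core, so the same shape works in no_std, even though this demo prints from an ordinary main): the future is just a value you poll, with no runtime or OS services behind it.

    use core::future::Future;
    use core::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

    // A waker that does nothing when woken; fine for a busy-poll demonstration.
    fn noop_raw_waker() -> RawWaker {
        fn clone(_: *const ()) -> RawWaker {
            noop_raw_waker()
        }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        RawWaker::new(core::ptr::null(), &VTABLE)
    }

    async fn blink_twice() -> u32 {
        // On real hardware this would await "timer elapsed" futures, etc.
        1 + 1
    }

    fn main() {
        let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
        let mut cx = Context::from_waker(&waker);
        // Pin the future on the stack and poll the state machine directly.
        let mut fut = core::pin::pin!(blink_twice());
        loop {
            match fut.as_mut().poll(&mut cx) {
                Poll::Ready(v) => {
                    println!("done: {v}");
                    break;
                }
                // On a microcontroller: wait-for-interrupt, then poll again.
                Poll::Pending => {}
            }
        }
    }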



Sigh... It gets tiring to hear about embedded from async/await advocates as if it's a unique advantage of the model. Fibers and similar mechanisms are used routinely in embedded world as demonstrated by various RTOSes.

Fibers are built on yielding execution to someone else, which is implemented trivially on embedded targets. Arguably, in a certain sense, fibers can be even better suited for embedded, since they allow preemption of a task by interrupts at any moment, with the interrupt being processed by another task, while with async/await you have to put an event into a queue and continue execution of the previously executed future.



Exactly. Proof by implementation: asio is a very well known and well regarded C++ event loop library and can be transparently used with old-school hand-written continuations, more modern future/promise, language based async/await coroutines and stackful coroutines (of the boost variety).

The event loop and io libraries are in practice the same for any solution you decide, everything else is just sugar on top and in principle you can mix and match as needed.



> The key requirement for ultra-low-latency software is minimising/eliminating dynamic memory allocation

First, that is only true in languages -- like C++ or Rust -- where dynamic memory allocation (and deallocation) is relatively costly. In a language like Java, the cost of heap allocation is comparable to stack allocation (it's a pointer bump).

Second, in the most common case of writing high throughput servers, the performance comes from Little's law and depends on having a large number of threads/coroutines. That means that all the data required for the concurrent tasks cannot fit in the CPU cache, and so switching in a task incurs a cache-miss, and so cannot be too low-latency.
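
For reference, the Little's law relationship being invoked here, with an illustrative back-of-the-envelope calculation (the numbers are made up):

    % L: mean number of requests in flight
    % \lambda: arrival rate, W: mean time each request spends in the system
    L = \lambda \cdot W
    % e.g. sustaining 50\,000 requests/s at 20 ms mean latency needs
    % L = 50\,000 \times 0.02\,\text{s} = 1\,000 concurrent tasks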

The only use-cases where avoiding memory allocation could be useful and achieving very low latency is possible are when the number of threads/coroutines is very small, e.g. generators.

The questions, then, are which use-case you pick to guide the design, servers or generators, and what the costs of memory management are in your language.



>Second, in the most common case of writing high throughput servers,

High-throughput servers are not ultra-low-latency software; they prioritise throughput over latency. Ultra-low-latency software is stuff like audio processing, microcontrollers and HFT. There's a trade-off between throughput and latency.



Not here. You don't trade off latency, because you cannot reduce it below a cache-miss per context-switch anyway if your working set is not tiny. The point is that if you have lots of tasks then your latency has a lower bound (due to hardware limitations) regardless of the design.

In other words, if your server serves some amount of data that is larger than the CPU cache size and can be accessed at random, there is some latency that you have to pay, and so many micro-optimisations are simply ineffective even if you want to get the lowest latency possible. Incurring a cache miss and allocating memory (if your allocation is really fast) and even copying some data around isn't significantly slower than just incurring a cache miss and not doing those other things. They matter only when you don't incur a cache miss, and that happens when you have a very small number of tasks whose data fits in the cache (i.e. a generator use-case and not so much a server use-case).

Put in yet another way, some considerations only matter when the workload doesn't involve many cache misses, but a server workload virtually always incurs a cache-miss when serving a new request, even in servers that care mostly about latency. In general, in servers you're then working in the microsecond range, anyway, and so optimisations that operate at the nanosecond range are not useful.


