Filip Hráček / text /
The following article describes a Flutter use case that is so niche that I wouldn’t be surprised if I’m literally the only person who has it at this point.
Still, I personally love to read deeply technical articles even when their usefulness for me is unclear at best — so I decided to write this anyway.
Use case
My game is mostly 2D, but it includes a retro 3D renderer. From the start of working on the game, I wanted a specific look which I’m going to call “a combination of 1970s sci-fi aesthetic and modern military UI”.
For this aesthetic, a “normal”, modern 3D renderer just wouldn’t work. So I decided to create a software (non-GPU) 3D renderer mostly from scratch. This allowed me to have complete control over every aspect of the final look of the 3D objects, and since we live in the 21st century, our contemporary computers are more than capable of running the renderer at high framerates.
Single-threaded beginnings
As a rule, I try to build every prototype feature single-threaded first, before adding the complexity of concurrency.
Since we’re talking about 1970s graphics running on 2020s computers, I was able to keep the renderer single-threaded, on the main thread, for something like 2 years.
Then again, a 3d renderer is a 3d renderer, so I didn't completely ignore performance during that time.
One cool thing about Flutter is that you get access to some low level drawing APIs, including Canvas.drawVertices. This method allows you to send a list of triangles (with various colors and/or textures) basically straight to the GPU. Perfect for my use case.
Pretty soon after implementing the initial 3d renderer, I addressed the fact that I’m creating a new list of triangles every frame. That means a lot of memory allocation and garbage collection. So I went with the slightly more low-level method of creating lists in Dart: TypedData.
Dart classes like Float32List (which is a subclass of TypedData) allow you to allocate a contiguous block of memory. In this case, a contiguous array of 32-bit floating point numbers. In contrast to how you would do it normally (with List<double>), you and the compiler have an understanding that this is really a contiguous block of numbers: it can’t be split, it can’t be appended to (without copying), and it contains only plain values (no boxed values). Sometimes, this is exactly what you want: it’s fast, simple, and already in a format that other parts of the computer (e.g., the GPU) understand.
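To make the "allocate once, overwrite every frame" idea concrete, here is a minimal sketch in plain Dart. The `VertexBuffer` class and its `setVertex` method are hypothetical names made up for this illustration, not code from the game. The layout (two floats per vertex) is the one a positions buffer for `Vertices.raw` expects in Flutter, but the sketch itself doesn't depend on Flutter.

```dart
import 'dart:typed_data';

/// Hypothetical helper: x,y pairs for a fixed number of vertices,
/// allocated once and reused for every frame.
class VertexBuffer {
  VertexBuffer(int vertexCount) : positions = Float32List(vertexCount * 2);

  /// One contiguous block of 32-bit floats: [x0, y0, x1, y1, ...].
  final Float32List positions;

  void setVertex(int index, double x, double y) {
    positions[index * 2] = x;
    positions[index * 2 + 1] = y;
  }
}

void main() {
  final buffer = VertexBuffer(3); // a single triangle
  for (var frame = 0; frame < 2; frame++) {
    // Overwrite the same memory in place; no new list is created per frame.
    buffer.setVertex(0, 0, 0);
    buffer.setVertex(1, 10.0 + frame, 0);
    buffer.setVertex(2, 0, 10);
  }
  print(buffer.positions); // [0.0, 0.0, 11.0, 0.0, 0.0, 10.0]
}
```

In a Flutter painter, a buffer like `positions` could then be handed to `Vertices.raw` and drawn with `Canvas.drawVertices`, without rebuilding any Dart lists in between.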
So far, so good. You can hear me talk about drawVertices and typed data buffers in this talk:
So, now we have a software 3d renderer written in Dart (a high level GC'd language) that blits triangles to the screen while allocating almost no memory from frame to frame. For a modern PC, the performance is more than acceptable.
Road to multi-threading
But then I realize I would sometimes like to show many independent 3D models at once. And I’m also adding more and more game logic, including marching square computations for multiple heightmaps on every frame. And I never wanted the game to be exclusive to modern machines - I want the game to be playable on potatoes.
I do some profiling and, on Windows (which I expect to be my main audience), the renderers take something like 20-30% of CPU time. Even with the optimizations above. That’s a lot for something that’s an aesthetic choice and not crucial to the game’s simulation. And remember, this is all still happening on the main thread (Canvas.drawVertices can’t be called from anywhere else).
So, it’s time to move the 3D renderer to a separate thread.
Shared memory in Dart
Dart’s concurrency model is based on isolates. The following is a gross simplification, but Dart isolates are threads that don’t share any mutable memory (i.e., they’re isolated). The feature is inspired by Erlang’s processes and by the web platform’s Web Workers.
The isolation is a fantastic feature because shared mutable memory is a major source of really-hard-to-debug bugs. If you don’t believe me, the next time you’re with a C++ or Java programmer, casually mention the word thread or mutex and see their eyes twitch.
With isolate-based concurrency, you don’t need to deal with deadlocks and concurrent modification bugs because your “threads” (isolates) can only communicate by sending each other messages. These messages are always deeply copied, so you can’t accidentally trample on another isolate’s collection, for example. You get your own copy.
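A small sketch of what "you get your own copy" means in practice. The function name `appendInWorker` is made up for this illustration. The worker appends to the list it receives, and the list in the main isolate stays untouched.

```dart
import 'dart:isolate';

/// Runs in the worker isolate. `numbers` is a copy of the list that was sent.
void appendInWorker((SendPort, List<int>) args) {
  final (replyTo, numbers) = args;
  numbers.add(4); // mutates the worker's copy only
  replyTo.send(numbers.length);
}

Future<void> main() async {
  final numbers = [1, 2, 3];
  final port = ReceivePort();
  await Isolate.spawn(appendInWorker, (port.sendPort, numbers));
  final lengthInWorker = await port.first as int;
  print(lengthInWorker); // 4
  print(numbers.length); // 3
}
```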
This works great for many use cases.
- You want to compute something? Just send the input of the computation to an isolate: it will take it, run the computation, and send the result back when it’s done.
- Want to extract some information from a giant JSON file on disk? Just send the path to an isolate: it will read the bytes, decode them to a String, parse it, find what you need, and send the information back packaged as a nice Dart object.
- Want to download an image, make a non-trivial transformation to its bytes, and receive the result? No problem. Just send the URL to an isolate: it will download the bytes, do the transformation, and send you the result.
An astute reader might wonder: so the transformed data are copied from isolate to isolate? By default, yes. But you can use something called TransferableTypedData to basically transfer the ownership of the resulting bytes, with no copying involved.
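Here is a minimal sketch of that ownership transfer. The names `invertBytes`, `transformWorker` and `invertInWorker` are hypothetical, and byte inversion is only a stand-in for "a non-trivial transformation". As far as I know, `TransferableTypedData.fromList` copies the bytes into the transferable once, inside the worker; the send itself and the later `materialize()` call then hand over the memory without another copy.

```dart
import 'dart:isolate';
import 'dart:typed_data';

/// Hypothetical stand-in for a real transformation: inverts every byte.
Uint8List invertBytes(Uint8List input) {
  final output = Uint8List(input.length);
  for (var i = 0; i < input.length; i++) {
    output[i] = 255 - input[i];
  }
  return output;
}

/// Runs in the worker isolate and sends the result back as a transferable.
void transformWorker((SendPort, Uint8List) args) {
  final (replyTo, input) = args;
  final result = invertBytes(input);
  replyTo.send(TransferableTypedData.fromList([result]));
}

Future<Uint8List> invertInWorker(Uint8List input) async {
  final port = ReceivePort();
  await Isolate.spawn(transformWorker, (port.sendPort, input));
  final transferable = await port.first as TransferableTypedData;
  // materialize() can be called only once; the bytes now belong to this isolate.
  return transferable.materialize().asUint8List();
}

Future<void> main() async {
  final result = await invertInWorker(Uint8List.fromList([0, 1, 254, 255]));
  print(result); // [255, 254, 1, 0]
}
```

Note that the receiving side ends up with a fresh buffer every time, which is fine for one-off jobs but is exactly the per-frame allocation a renderer would like to avoid.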
But then there’s my use case. I could prepare the typed data buffers inside an isolate, but then by sending them to the main isolate, I either have to copy them, or I have to transfer their ownership. So I’d have to start anew for every new frame, allocating a new block of memory. That’s something I want to avoid.
FFI to the rescue
Thankfully, while Dart itself tries to keep you away from the foot-gun that is shared memory, it also has to talk to other systems that don’t have such luxury. Notably, there’s the C foreign function interface (FFI), a standard way by which different programming languages communicate with each other using the lowest common denominator of C function calling.
Without leaving Dart, you can allocate memory on the native heap and ask Dart to free it once you don’t need it anymore:
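import 'dart:ffi';
import 'dart:typed_data';
import 'package:ffi/ffi.dart';
const n = 40;
// Allocate room for n 64-bit integers on the native heap (outside the Dart heap).
final pointer = malloc.allocate<Int64>(n * Int64List.bytesPerElement);
// View the block as an Int64List; the finalizer frees it once `array` is unreachable.
final array = pointer.asTypedList(n, finalizer: malloc.nativeFree);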
I learned this API from mraleph (Slava Egorov) on the Flutter Forum. Forums, btw, are treasure troves of information and deep technical discussions. (Full disclosure: I’m a mod on the aforementioned forum and it was kind of my idea to start it. But I happily give all credit to the person who actually started it, Hillel Coren of It’s All Widgets fame, and to all the other people who are active there.) Seriously, if you want to keep your sanity, I can highly recommend limiting your time on social media and instead joining a focused forum.
Anyway, now we have a way to share mutable arrays across isolate boundaries:
- The array object above is of type Int64List. Except it’s backed by memory outside the Dart heap.
- As long as array is reachable, the memory will stay allocated. Afterwards, Dart calls malloc.nativeFree to free the block of memory. (This means that you probably want to assign the array to a field of some long-living object. Otherwise, the memory will get freed as soon as array is no longer in scope.)
- You can send the pointer to another isolate. The pointer is just a memory address, so copying it while sending is basically free.
- The other isolate can also call pointer.asTypedList(n) (without the finalizer) to get its own buffer, backed by the same block of memory.
- This also means you should send the n to the other isolate, so it knows how large the buffer is.
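The points above can be sketched as a tiny, self-contained program. A few assumptions: `fillSquares` is a made-up name, the sketch needs `package:ffi` as a dependency, and it sends `pointer.address` (a plain int) rather than the `Pointer` object, because as far as I know `Pointer` instances themselves are not accepted by `SendPort.send`. The worker rebuilds the pointer with `Pointer.fromAddress`.

```dart
import 'dart:ffi';
import 'dart:isolate';
import 'dart:typed_data';

import 'package:ffi/ffi.dart';

/// Runs in the worker isolate: rebuilds a view over the shared block
/// from its address and length, then writes into it.
void fillSquares((int, int) args) {
  final (address, n) = args;
  // No finalizer here: the main isolate owns the memory.
  final view = Pointer<Int64>.fromAddress(address).asTypedList(n);
  for (var i = 0; i < n; i++) {
    view[i] = i * i;
  }
}

Future<void> main() async {
  const n = 8;
  final pointer = malloc.allocate<Int64>(n * Int64List.bytesPerElement);
  final array = pointer.asTypedList(n, finalizer: malloc.nativeFree);

  // Wait until the worker isolate has exited before reading the buffer.
  final done = ReceivePort();
  await Isolate.spawn(fillSquares, (pointer.address, n), onExit: done.sendPort);
  await done.first;

  // The worker's writes are visible here: same block of native memory.
  print(array); // [0, 1, 4, 9, 16, 25, 36, 49]
}
```

Nothing in this sketch stops both isolates from touching the block at the same time. Here the main isolate simply waits for the worker to exit before reading.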
Putting shared mutable memory to work
So, now that we have a way to punch a hole (as Slava puts it) into Dart’s concurrency isolation model, how do we actually make it work?
Here’s how:
- When the widget that renders 3D (I call it Retro3D) is added to the Flutter widget tree, it first loads the bytes of the 3D file from assets. (AssetBundle is not available outside the main isolate, AFAIK.)
- Then it constructs an IsolateRenderer (a completely custom class) and asks it to initialize() with the file bytes and some initial, immutable information about the scene.
- The IsolateRenderer spawns the worker isolate and sends the data over with the initial message. It then starts waiting for messages from the worker isolate.
- Inside the worker isolate, the bytes of the 3D file are parsed into a 3D scene.
- The worker isolate counts the number of vertices and faces in the scene. This informs the size of arrays that will be needed for the draw calls. (For example, for n vertices, you need a Float32List of n*2 elements: an x and a y coordinate for each vertex. Similarly, for m triangles, you need an Int32List of m*3 elements: one ARGB color for each point of the triangle.)
- The worker isolate sends a _BufferAllocationRequest message to the main isolate with the sizes it needs.
- IsolateRenderer allocates the memory using malloc.allocate() and creates two sets of buffers, #1 and #2. It saves these buffers into a field (IsolateRenderer._renderBuffers) so that they aren’t freed prematurely. It then sends the pointers back to the worker isolate as a _UseTheseSharedBuffers message.
- The worker isolate creates its own TypedData objects using the pointers it received.
- Now we have shared mutable memory. It’s important that we avoid concurrent modification (for example, main isolate sending the vertex buffer to the raster thread while it’s being worked on by the worker thread). That’s why we have two buffers, #1 and #2.
- The worker isolate sends an _IsolateIsReady message to the main isolate.
- IsolateRenderer receives this message and completes its initialization.
- From now on, the Retro3D widget listens to changes to a ChangeNotifier called SceneViewConfig, and every time there’s any change (e.g., moved camera, changed zoom), it calls a method on IsolateRenderer called requestNextFrame().
- requestNextFrame() uses a CancellableOperation to debounce the calls, so that we don’t accidentally ask for five consecutive renders in quick succession (just because some other part of the code changed 5 different things about the SceneViewConfig, one after the other).
- requestNextFrame() sends a RenderConfig message to the worker isolate. The RenderConfig object includes things like the camera position, zoom, but also the current viewport size and the RenderConfig.id (which exists mostly for debugging, to link requests to later renders).
- The worker isolate receives the RenderConfig message.
- It switches to the next buffer (if it was on #1 before, it goes to #2, and vice versa). Thanks to the fact that the rendering is all synchronous, and messages in isolates are processed in order, we know that double buffering (i.e. two buffers) is enough. One buffer is always “done” and the other one is being worked on.
- The worker isolate does the rendering and fills the shared memory buffers with its output.
- When done, the worker isolate sends back a _RenderReady message. This message only contains the index of the used buffer (#1 or #2), which is a single integer. So there’s no copying of large amount of data, nor any kind of re-allocation. The message also contains some additional data, such as the current polygon count (some polygons might be hidden and therefore not present in the render mesh) and the projection matrix (so that the main thread can compute where to put labels on the render).
- IsolateRenderer receives the _RenderReady message and transforms it into a RenderResult which contains all the data needed for a Canvas.drawVertices() call. So, the index of the buffer received from the worker isolate is used to find the actual TypedData objects.
- The RenderResult is assigned as the new value of a ValueNotifier.
- This value notifier is used as a repaint listenable for the CustomPainter that actually paints the 3D render on the screen.
- Now we have a loop: any change to SceneViewConfig leads to a requestNextFrame() call, which in turn sends a RenderConfig to the worker isolate, which renders it into one of the shared buffers, then notifies the main isolate, which repaints using the new data.
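The core of the loop above (two shared buffers, a request message, and a reply that carries only a buffer index) can be sketched without Flutter. This is a heavily simplified illustration, not the game's actual code: `WorkerInit`, `renderWorker` and `renderFrames` are made-up names, a plain int stands in for RenderConfig, a record of two ints stands in for _RenderReady, and "rendering" just fills the buffer with the frame id. As an assumption on my part, the sketch passes memory addresses as ints and rebuilds the pointers in the worker.

```dart
import 'dart:async';
import 'dart:ffi';
import 'dart:isolate';
import 'dart:typed_data';

import 'package:ffi/ffi.dart';

/// Hypothetical init message: reply port, addresses of the two shared
/// blocks, and the number of floats in each block.
typedef WorkerInit = (SendPort replyTo, int address0, int address1, int length);

/// Worker entry point. Every int it receives stands in for a RenderConfig.
void renderWorker(WorkerInit init) {
  final (replyTo, address0, address1, length) = init;
  final buffers = [
    Pointer<Float>.fromAddress(address0).asTypedList(length),
    Pointer<Float>.fromAddress(address1).asTypedList(length),
  ];
  var current = 0;
  final requests = ReceivePort();
  replyTo.send(requests.sendPort);
  requests.listen((message) {
    if (message is! int) {
      requests.close(); // anything else means "shut down"
      return;
    }
    current = 1 - current; // switch to the buffer not used last time
    buffers[current].fillRange(0, length, message.toDouble()); // "render"
    replyTo.send((message, current)); // only a frame id and a buffer index
  });
}

/// Main-isolate side: owns the memory, requests frames, reads the results.
Future<List<(int, int, double)>> renderFrames(int frameCount) async {
  const length = 6; // e.g. 3 vertices x 2 coordinates
  final pointers = [
    for (var i = 0; i < 2; i++)
      malloc.allocate<Float>(length * Float32List.bytesPerElement),
  ];
  // Keep the views referenced for as long as the worker may write to them.
  final views = [
    for (final p in pointers)
      p.asTypedList(length, finalizer: malloc.nativeFree),
  ];

  final replies = ReceivePort();
  final inbox = StreamIterator<dynamic>(replies);
  await Isolate.spawn(renderWorker,
      (replies.sendPort, pointers[0].address, pointers[1].address, length));
  await inbox.moveNext();
  final toWorker = inbox.current as SendPort;

  final results = <(int, int, double)>[];
  for (var id = 1; id <= frameCount; id++) {
    toWorker.send(id);
    await inbox.moveNext();
    final (frameId, bufferIndex) = inbox.current as (int, int);
    results.add((frameId, bufferIndex, views[bufferIndex][0]));
  }
  toWorker.send(null); // ask the worker to shut down
  await inbox.cancel(); // also closes the receive port
  return results;
}

Future<void> main() async {
  print(await renderFrames(3)); // [(1, 1, 1.0), (2, 0, 2.0), (3, 1, 3.0)]
}
```

The sketch waits for each reply before sending the next request, so the buffer being read is never the one being written. The real renderer described above additionally debounces requests and passes richer messages.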
The result is noticeable, even when running on a very powerful device. On my M4 MacBook Pro, when showing three 3D renders at once, the average time taken by a frame on the main thread goes from 3.7 ms to 2.9 ms. That’s roughly a 20% improvement on a thread that currently does a lot of other things (from physics simulation through marching squares all the way to AI).
Basically, the main thread’s cost of displaying all these 3D models went from 20% of its overall CPU time to close to zero.
The future of mutable shared state in Dart
The Dart team is experimenting with a more direct support for shared mutable memory. The proposal is written by the aforementioned Slava Egorov (the Dart Tech Lead), and discusses things that go beyond simple arrays. For my current use cases, that would probably be overkill, but I’m very happy that the Dart team thinks about supporting advanced (and scary and unsafe) programming. It’s the only way Dart can truly become a powerhouse general-purpose programming language. Give people sane, safe tools to work with, but also allow them to bring out a chainsaw if they must.
Was AI useful here?
It’s late 2025, and so I feel the need to address the obvious question of “was any of this code written by an A.I.”? I initially hoped so, because it seemed like a good fit. The code is already there, just put it into an isolate and come up with a message passing scheme.
But the reality was quite dismal. Even when I asked it to simply implement some standard Isolate boilerplate based off the shared memory example I've already written, it introduced a bug. After that, I got so nervous (the bug was not easy to spot) that I just implemented the whole thing myself. It took me a day, and some of it was spent chasing my own bugs (freeing memory twice, a classic) but most of it was iteratively crafting the code to my liking.
As a rule, I’m trying to force myself to use A.I.: to get out of my comfort zone, and to reap the benefits of this new technology. But increasingly, I fear that my opinion on LLMs is starting to set, and I see them not as independent coding agents I can trust, but more like fast code generators. There’s a huge gap between what current LLMs can dependably do, and the CHOP vision of “I’ll be a tech lead and the AIs will be my team”. Maybe this works in some scenarios, but I just haven’t been able to make it work.
Where current AI does help is, in my opinion:
- write easily definable functions or methods (these can be advanced, but only if the problem at hand is common enough in the training data), or tests
- “fill in the gap” kind of work (e.g. look at this one class and create the other classes that are obviously related)
- kick-start — I need to do something but I’d normally postpone doing it because I can’t decide where to start. Instead of postponing, just ask the AI for the first step and it will make the initial decisions and the boring boilerplate work for you.
- cookie-cutter programming in languages you’re not an expert in (e.g. modify my website to have automatic syntax highlighting on <code> blocks)
This is already a huge help, don’t get me wrong. It’s just that I currently don’t see an easy path from here to “AI agent can do expert level programming on my behalf”.
Future work
I’m obviously not done with the game or its optimization. Choosing Flutter & Dart to implement a simulation-heavy real-time game has its upsides (like ease of development; portability) but also its downsides (something like C++ still slays in terms of performance, especially compared to GC'd languages like Dart; Unity & Godot come with so many more bells and whistles). I chose the trade-off knowingly: it makes perfect sense for me (a Flutter expert who actually enjoys building games solo from a text editor) and for the project at hand (a game full of complex interlocking systems and experimental UI). I’m far from suggesting that it makes sense for anyone else.
So this post isn’t some kind of a “here’s how you do it” explainer. It’s more of a “look at this obscure problem I had” kind of article.
If you do find any of the above useful or at least entertaining, I’m happy.
P.S.: The multi-threaded renderer update is now up on Steam if you want to playtest it.
— Filip Hráček
November 2025