This is a question I have asked myself many times throughout my career, from my first year as a programmer all the way through to the current day, and my thoughts on the matter have changed with experience. This post is an attempt to structure those thoughts.
Next to lists and maps/dictionaries, queues are one of the most widely used data structures out there. Most often they are used to share data between processes or threads. It was while setting up a queue between two threads that I had the idea of writing up this post. Ordinarily I would use my experience and intuition to set an appropriate size. But on this occasion I thought I would test my own understanding of the subject by jotting down my thoughts in a structured way, to see if they still hold up as sound reasoning.
What does a queue do?
As a junior developer, my understanding was that you used queues when you needed to hand off work to a slow thread. For example, when handing off requests to a database thread. If the database thread executes blocking calls, a queue is required so that one thread can handle the requests whilst the database thread is busy executing queries. If the queue overflows, you need a bigger queue. My current thinking is that queues don’t increase average throughput. Instead, they act as buffers that absorb short-term bursts and timing differences between senders and receivers, allowing messages to be passed reliably even when components operate at different or variable rates.
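That hand-off can be made concrete with a short sketch. This is a minimal illustration of the pattern, not a full database client: the bounded queue sits between a producer and a worker thread that executes blocking calls, and `handle_query` is a hypothetical stand-in for the actual blocking database work.

```python
import queue
import threading

def handle_query(sql: str) -> str:
    # Hypothetical stand-in for a blocking database call.
    return f"result of {sql}"

# A bounded queue decouples the fast request handler from the slow worker.
work_queue: "queue.Queue[str | None]" = queue.Queue(maxsize=8)
results: list[str] = []

def database_worker() -> None:
    while True:
        sql = work_queue.get()   # blocks until a request arrives
        if sql is None:          # sentinel value: no more work
            break
        results.append(handle_query(sql))

worker = threading.Thread(target=database_worker)
worker.start()

for i in range(3):
    work_queue.put(f"SELECT {i}")  # blocks if the queue is full
work_queue.put(None)               # tell the worker to stop
worker.join()
```

The producer keeps accepting requests while the worker is busy; the `maxsize` is what the rest of this post is about choosing.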
Let me unpack that last statement.
Queues do not increase throughput
Let’s imagine a system that can process 1 message every 2 seconds. This system is expected to run for a prolonged period of time. If messages arrive at exactly 1 message every 2 seconds, the system only needs a queue size of 1. If messages arrive at a greater rate than 1 message every 2 seconds, the system will eventually be overwhelmed, and increasing the queue size will not fix this.
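A back-of-the-envelope calculation makes this concrete. The rates below are illustrative: if messages arrive even slightly faster than they are processed, the backlog grows without bound, so no finite queue saves the system.

```python
service_rate = 30   # messages processed per minute (1 every 2 seconds)
arrival_rate = 36   # messages arriving per minute: just 20% over capacity

backlog = 0
for minute in range(10):
    # Each minute the backlog grows by the difference between the rates.
    backlog += arrival_rate - service_rate

# After only 10 minutes the backlog is 60 messages, and it keeps growing
# linearly for as long as the system runs.
```

No queue size fixes this; only reducing the arrival rate or increasing the service rate can.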
Queues allow us to handle bursts of traffic
In this system with a queue size of 1, whilst it can handle 1 message every 2 seconds, it can’t necessarily handle 30 messages every 60 seconds. This is because we don’t know what the distribution of messages over that 60 seconds looks like. If the system receives 30 messages in the 1st second and then nothing for the following 59 seconds, it will overflow the queue.
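This overflow is easy to demonstrate. The sketch below (using Python's standard `queue` module for illustration) sizes the queue for the average rate, then delivers the whole burst before anything is consumed:

```python
import queue

# A queue sized for the *average* rate: 1 slot.
q: "queue.Queue[int]" = queue.Queue(maxsize=1)

dropped = 0
for msg in range(30):          # the whole burst lands in the 1st second
    try:
        q.put_nowait(msg)      # non-blocking put: raises Full when full
    except queue.Full:
        dropped += 1

# Only the first message fits; the other 29 overflow.
```

Averaged over the minute the system has the capacity, but the queue does not have the depth to absorb the burst.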
Does this mean the queue size should be set to handle the largest burst of traffic we should expect?
For batch systems this can make sense. Let’s imagine our system was receiving rows in a CSV file. We can think of each row as a message. The maximum queue size we would need would be the number of rows in the CSV file. Though, given that reading rows from the CSV file takes some time and the system is processing messages throughout, we could optimistically use a smaller queue than the number of rows in the file.
For real time systems, the answer is a bit more complicated.
Latency limits
All real time systems have some kind of acceptable latency limit. However, for many systems this latency limit is so high that we don’t really think about it. Using a larger queue results in items further and further back in the queue taking longer and longer to process.
If we define what our latency limit is, we can size the queue appropriately. This not only reduces the space wasted storing messages, but also provides us with a means of reacting to messages that would overflow the latency limit. For example, if a producer is unable to write to a queue, it could drop those messages, or it could provide a back pressure signal. A back pressure signal indicates to the caller that the queue is full and passes responsibility for remediating to the caller. For example, if the thing adding messages to the queue is a CSV file reader, it may choose to hold the current message in memory, sleep for 10 milliseconds, and then try again to add to the queue.
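The CSV reader's reaction can be sketched as a small retry loop. This is one possible shape for the backpressure handling described above; the 10 ms sleep and the retry cap are illustrative values, and `put_with_backpressure` is a hypothetical helper name.

```python
import queue
import time

def put_with_backpressure(q: queue.Queue, row: str,
                          retry_sleep: float = 0.01,
                          max_retries: int = 100) -> bool:
    """Try to enqueue `row`, backing off while the queue is full."""
    for _ in range(max_retries):
        try:
            q.put_nowait(row)
            return True              # enqueued successfully
        except queue.Full:
            time.sleep(retry_sleep)  # hold the row in memory, back off
    return False                     # give up; caller decides what to do

q: "queue.Queue[str]" = queue.Queue(maxsize=2)
ok1 = put_with_backpressure(q, "row 1")
ok2 = put_with_backpressure(q, "row 2")
# With no consumer draining the queue, the third put exhausts its retries.
ok3 = put_with_backpressure(q, "row 3", retry_sleep=0.001, max_retries=3)
```

The key design point is that the producer, not the queue, decides what to do when the retries run out.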
As for how to derive the queue size from a latency limit, we can use Little’s Law. This states that:
“average number of items in a stable system” = “average arrival rate” x “average time an item spends in the system”
Or roughly speaking in our case: “size of queue” = “average arrival rate” * “latency limit”
15 = 0.5 messages per second * 30 seconds
If an item arrives and takes the 15th place in our queue, and the processing rate is 1 message every 2 seconds, that message will take 30 seconds to process.
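This sizing rule is simple enough to express as a one-line function. A sketch, assuming we round up since a fractional queue slot is not useful:

```python
import math

def queue_size_for(arrival_rate_per_sec: float,
                   latency_limit_sec: float) -> int:
    # Little's Law: items in system = arrival rate * time in system.
    # Round up, since a fractional queue slot is not useful.
    return math.ceil(arrival_rate_per_sec * latency_limit_sec)

# The worked example above: 1 message every 2 seconds (0.5/s) with a
# 30 second latency limit gives a queue size of 15.
size = queue_size_for(0.5, 30)
```

Note that the arrival rate here is an average; the next section covers what happens when reality diverges from it.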
Degenerate latency cases
Note that Little’s Law deals with average numbers. In the real world, these numbers are not so clear cut. Garbage collection cycles, CPU saturation, IO limits and network congestion are just some of the myriad factors that can throw off your well-configured queue size in production. Therefore it’s useful to have robust monitoring on important queues in your system.
One way to improve monitoring is to track queue depth and alert if it exceeds some limit. This can be useful in identifying bottlenecks.
Another helpful strategy is timestamping messages when they arrive in the queue and when they have finished being processed by the system. This will allow you to figure out the service response time. This is invaluable in systems with more rigid real time guarantees. Where the service response time breaches the desired limit, the message can be flagged and investigated.
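The timestamping approach can be sketched by attaching an arrival time to each message. The `Message` wrapper and the `LATENCY_LIMIT` threshold are hypothetical names for illustration; `time.monotonic` is used because it is unaffected by system clock adjustments.

```python
import queue
import time
from dataclasses import dataclass, field

LATENCY_LIMIT = 0.5  # seconds; an illustrative threshold for flagging

@dataclass
class Message:
    payload: str
    # Stamp each message with its arrival time when it is created.
    enqueued_at: float = field(default_factory=time.monotonic)

q: "queue.Queue[Message]" = queue.Queue()
q.put(Message("hello"))

msg = q.get()
# ... process the message here ...
# Service response time = queue wait + processing time.
response_time = time.monotonic() - msg.enqueued_at
flagged = response_time > LATENCY_LIMIT
```

Flagged messages can then be logged with their payload and timings, giving you something concrete to investigate when the latency limit is breached.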