This is a question I have asked myself many times throughout my career, from my first year as a programmer all the way through to the current day, and my thoughts on the matter have changed with experience. This post is an attempt to structure those thoughts.
Next to lists and maps/dictionaries, queues are one of the most widely used data structures out there. Most often they are used to share data between processes or threads. It was while setting up a queue between two threads that I had the idea of writing up this post. Ordinarily I would use my experience and intuition to set an appropriate size. But on this occasion I thought I would test my own understanding of the subject by jotting down my thoughts in a structured way, to see if they still hold up as sound reasoning.
What does a queue do?
As a junior developer, my understanding was that you used queues when you needed to hand off work to a slow thread. For example, when handing off requests to a database thread. If the database thread executes blocking calls, a queue is required so that one thread can handle the requests whilst the database thread is busy executing queries. If the queue overflows, you need a bigger queue. My current thinking is that queues don’t increase average throughput. Instead, they act as buffers that absorb short-term bursts and timing differences between senders and receivers, allowing messages to be passed reliably even when components operate at different or variable rates.
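That hand-off can be made concrete with a short sketch. This is a minimal illustration of the pattern, not a full database client: the bounded queue sits between a producer and a worker thread that executes blocking calls, and `handle_query` is a hypothetical stand-in for the actual blocking database work.

```python
import queue
import threading

def handle_query(sql: str) -> str:
    # Hypothetical stand-in for a blocking database call.
    return f"result of {sql}"

# A bounded queue decouples the fast request handler from the slow worker.
work_queue: "queue.Queue[str | None]" = queue.Queue(maxsize=8)
results: list[str] = []

def database_worker() -> None:
    while True:
        sql = work_queue.get()   # blocks until a request arrives
        if sql is None:          # sentinel value: no more work
            break
        results.append(handle_query(sql))

worker = threading.Thread(target=database_worker)
worker.start()

for i in range(3):
    work_queue.put(f"SELECT {i}")  # blocks if the queue is full
work_queue.put(None)               # tell the worker to stop
worker.join()
```

The producer keeps accepting requests while the worker is busy; the `maxsize` is what the rest of this post is about choosing.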
Let me unpack that last statement.
Queues do not increase throughput
Let’s imagine a system that can process 1 message every 2 seconds. This system is expected to run for a prolonged period of time. If messages arrive at exactly 1 message every 2 seconds, the system only needs a queue size of 1. If messages arrive at a greater rate than 1 message every 2 seconds, the system will eventually be overwhelmed, and increasing the queue size will not fix this.
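A back-of-the-envelope calculation makes this concrete. The rates below are illustrative: if messages arrive even slightly faster than they are processed, the backlog grows without bound, so no finite queue saves the system.

```python
service_rate = 30   # messages processed per minute (1 every 2 seconds)
arrival_rate = 36   # messages arriving per minute: just 20% over capacity

backlog = 0
for minute in range(10):
    # Each minute the backlog grows by the difference between the rates.
    backlog += arrival_rate - service_rate

# After only 10 minutes the backlog is 60 messages, and it keeps growing
# linearly for as long as the system runs.
```

No queue size fixes this; only reducing the arrival rate or increasing the service rate can.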
Queues allow us to handle bursts of traffic
In this system with a queue size of 1, whilst it can handle 1 message every 2 seconds, it can’t necessarily handle 30 messages every 60 seconds. This is because we don’t know what the distribution of messages over that 60 seconds looks like. If the system receives 30 messages in the 1st second and then nothing for the following 59 seconds, it will overflow the queue.
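This overflow is easy to demonstrate. The sketch below (using Python's standard `queue` module for illustration) sizes the queue for the average rate, then delivers the whole burst before anything is consumed:

```python
import queue

# A queue sized for the *average* rate: 1 slot.
q: "queue.Queue[int]" = queue.Queue(maxsize=1)

dropped = 0
for msg in range(30):          # the whole burst lands in the 1st second
    try:
        q.put_nowait(msg)      # non-blocking put: raises Full when full
    except queue.Full:
        dropped += 1

# Only the first message fits; the other 29 overflow.
```

Averaged over the minute the system has the capacity, but the queue does not have the depth to absorb the burst.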
Does this mean the queue size should be set to handle the largest burst of traffic we should expect?
For batch systems this can make sense. Let’s imagine our system was receiving rows in a CSV file. We can think of each row as a message. The maximum queue size we would need would be the number of rows in the CSV file. Though, given that reading rows from the CSV file takes some time and the system is processing messages throughout, we could optimistically use a smaller queue than the number of rows in the file.
For real time systems, the answer is a bit more complicated.
Latency limits
All real time systems have some kind of acceptable latency limit. However, for many systems this latency limit is so high that we don’t really think about it. Using a larger queue results in items further and further back in the queue taking longer and longer to process.
If we define what our latency limit is, we can size the queue appropriately. This not only reduces the space wasted storing messages, but also provides us with a means of reacting to messages that would overflow the latency limit. For example, if a producer is unable to write to a queue, it could drop those messages, or it could provide a back pressure signal. A back pressure signal indicates to the caller that the queue is full and passes responsibility for remediating to the caller. For example, if the thing adding messages to the queue is a CSV file reader, it may choose to hold the current message in memory, sleep for 10 milliseconds, and then try again to add to the queue.
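The CSV reader's reaction can be sketched as a small retry loop. This is one possible shape for the backpressure handling described above; the 10 ms sleep and the retry cap are illustrative values, and `put_with_backpressure` is a hypothetical helper name.

```python
import queue
import time

def put_with_backpressure(q: queue.Queue, row: str,
                          retry_sleep: float = 0.01,
                          max_retries: int = 100) -> bool:
    """Try to enqueue `row`, backing off while the queue is full."""
    for _ in range(max_retries):
        try:
            q.put_nowait(row)
            return True              # enqueued successfully
        except queue.Full:
            time.sleep(retry_sleep)  # hold the row in memory, back off
    return False                     # give up; caller decides what to do

q: "queue.Queue[str]" = queue.Queue(maxsize=2)
ok1 = put_with_backpressure(q, "row 1")
ok2 = put_with_backpressure(q, "row 2")
# With no consumer draining the queue, the third put exhausts its retries.
ok3 = put_with_backpressure(q, "row 3", retry_sleep=0.001, max_retries=3)
```

The key design point is that the producer, not the queue, decides what to do when the retries run out.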
As for how to derive the queue size from a latency limit, we can use Little’s Law. This states that:
“average number of items in a stable system” = “average arrival rate” x “average time an item spends in the system”
Or roughly speaking in our case: “size of queue” = “average arrival rate” * “latency limit”
15 = 0.5 messages per second * 30 seconds
If an item arrives and takes the 15th place in our queue, and the processing rate is 1 message every 2 seconds, that message will take 30 seconds to process.
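This sizing rule is simple enough to express as a one-line function. A sketch, assuming we round up since a fractional queue slot is not useful:

```python
import math

def queue_size_for(arrival_rate_per_sec: float,
                   latency_limit_sec: float) -> int:
    # Little's Law: items in system = arrival rate * time in system.
    # Round up, since a fractional queue slot is not useful.
    return math.ceil(arrival_rate_per_sec * latency_limit_sec)

# The worked example above: 1 message every 2 seconds (0.5/s) with a
# 30 second latency limit gives a queue size of 15.
size = queue_size_for(0.5, 30)
```

Note that the arrival rate here is an average; the next section covers what happens when reality diverges from it.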
Degenerate latency cases
Note that Little’s Law deals with average numbers. In the real world, these numbers are not so clear cut. Garbage collection cycles, CPU saturation, IO limits and network congestion are just some of the myriad factors that can throw off your well-configured queue size in production. Therefore it’s useful to have robust monitoring on important queues in your system.
One way to improve monitoring is to track queue depth and alert if it exceeds some limit. This can be useful in identifying bottlenecks.
Another helpful strategy is timestamping messages when they arrive in the queue and when they have finished being processed by the system. This will allow you to figure out the service response time. This is invaluable in systems with more rigid real time guarantees. Where the service response time breaches the desired limit, the message can be flagged and investigated.
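The timestamping approach can be sketched by attaching an arrival time to each message. The `Message` wrapper and the `LATENCY_LIMIT` threshold are hypothetical names for illustration; `time.monotonic` is used because it is unaffected by system clock adjustments.

```python
import queue
import time
from dataclasses import dataclass, field

LATENCY_LIMIT = 0.5  # seconds; an illustrative threshold for flagging

@dataclass
class Message:
    payload: str
    # Stamp each message with its arrival time when it is created.
    enqueued_at: float = field(default_factory=time.monotonic)

q: "queue.Queue[Message]" = queue.Queue()
q.put(Message("hello"))

msg = q.get()
# ... process the message here ...
# Service response time = queue wait + processing time.
response_time = time.monotonic() - msg.enqueued_at
flagged = response_time > LATENCY_LIMIT
```

Flagged messages can then be logged with their payload and timings, giving you something concrete to investigate when the latency limit is breached.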