Redis array: short story of a long development process

Original link: https://antirez.com/news/164

## The Redis Array data type: a four-month journey with AI

After four months of development, Redis's new Array data type is close to completion. The project began with a detailed specification document, initially written by hand and then significantly improved through collaboration with AI (specifically GPT 5.x), which fostered design exploration and compromise. AI enabled an implementation more ambitious than originally planned: the final design uses a dynamic multi-level structure that optimizes memory usage and performance, especially for operations such as scanning and popping elements. Extensive AI-assisted code review and rewriting resolved inefficiencies and potential bugs. Unexpectedly, the development also produced `ARGREP`, a powerful search command that leverages regular expressions (the TRE library, further optimized with AI). This highlights a key advantage of working with AI: the ability to take on tasks previously considered too complex. The author emphasizes that AI does not replace programmers; rather, it provides a "safety net" for complex systems programming, handling tedious tasks and catching bugs. An Array data type with index-based semantics is expected to unlock new use cases for Redis. The pull request is available for review and feedback: [https://github.com/redis/redis/pull/15162](https://github.com/redis/redis/pull/15162).

## Redis and AI-assisted development - summary

Antirez, the author of Redis, shares his recent experience with an AI-assisted development process, detailed on his website (antirez.com). He describes AI as a powerful *collaborator* rather than a replacement for human creativity, especially when it comes to grasping concepts, algorithms, and the overall product vision. He plans to publish the specification files used to guide the AI's work, but they need some updates first. Antirez envisions a shift in software development: programmers may focus more on high-level ideas, much as Linus Torvalds does for the Linux kernel, while AI handles the implementation. He has already experimented with this approach on projects such as a DeepSeek v4 inference engine, acting more as a conceptual guide than as a direct coder. One commenter compared AI to an ideal "rubber duck" debugging partner.

## Original text
I started working on the new Array data type for Redis in the first days of January. The PR landed in the repository only now, so this code was cooked for four months. I worked on the implementation kinda part time (kinda because many weeks were actually full time; sometimes detaching yourself from the keyboard is complicated), and even before LLMs the implementation was likely something I could do in four months. What changed is that in the same time span I was able to do a lot more. This is the short story of what happened.

In the first month I just wrote the specification document: the rationale for the new data type, the C structures, the sparse representation used, the exact semantics of the array cursor for ring buffers and ARINSERT. For days I wrote a long specification by hand, then I paired with Opus initially; then GPT 5.3 was released and I switched all the design and development to Codex. Since then I have used only GPT 5.x for system programming tasks. Thanks to AI, the specification evolved a lot, via back-and-forth feedback, intellectual challenges about what was the best design, what was the right compromise, what was over-engineered and what was not.

Starting from the second month, I began the implementation using automatic programming (auto coding if you prefer), constantly reviewing the developed code. Then I realized that the level of indirection I picked was wrong. I really wanted people to be able to do ARSET myarray 293842948324 foo and have everything still work without huge allocations. The two levels of directory + slices (sparse and dense) I had were not enough. Because I had AI, I took no compromises and decided to go the extra mile. Once certain conditions are reached, the data structure internally changes shape and becomes a super directory of sliced dense directories, which in turn point to the actual array slices (4096 elements per slice, by default). This design still provided the internal "is actually an array" representation I wanted, and the memory characteristics I sought, while allowing ARSCAN and ARPOP to scan the existing arrays in a time proportional to the existing elements and not to the range span.
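The real layout lives in sparsearray.c in the PR; the fragment below is only a minimal sketch of the general idea under my own assumptions. The 4096-element slice size comes from the paragraph above, while the directory size and every name are invented for illustration, and the adaptive "shape change" of the real structure is not modeled. The point is simply that a sparse top level reaching dense directories of slices lets a single far-away index cost one directory plus one slice, and lets a scan walk only what actually exists.

```c
/* Minimal sketch of the idea, NOT the real sparsearray.c layout: elements
 * live in fixed-size slices, slices hang off dense directories, and only
 * the directories that actually exist are tracked at the top level.
 * SLICE_LEN comes from the post; DIR_LEN and all names are invented.
 * Assumes a zero-initialized sparsearray. */
#include <stdint.h>
#include <stdlib.h>

#define SLICE_LEN 4096   /* Elements per slice, as mentioned in the post. */
#define DIR_LEN   1024   /* Slices per directory: hypothetical value.     */

typedef struct { void *ele[SLICE_LEN]; } slice;        /* Dense leaf.      */
typedef struct { slice *slices[DIR_LEN]; } directory;  /* Dense middle.    */

typedef struct {          /* Sparse top level: only directories that exist. */
    uint64_t dirno;       /* Which DIR_LEN*SLICE_LEN index range is covered. */
    directory *dir;
} topentry;

typedef struct {
    topentry *top;        /* Unsorted array of existing directories. */
    size_t numdirs;
    uint64_t numele;      /* Live element count. */
} sparsearray;

/* Find or create the directory covering global directory number 'dirno'. */
static directory *sa_dir(sparsearray *sa, uint64_t dirno) {
    for (size_t j = 0; j < sa->numdirs; j++)
        if (sa->top[j].dirno == dirno) return sa->top[j].dir;
    sa->top = realloc(sa->top, sizeof(topentry) * (sa->numdirs + 1));
    sa->top[sa->numdirs].dirno = dirno;
    sa->top[sa->numdirs].dir = calloc(1, sizeof(directory));
    return sa->top[sa->numdirs++].dir;
}

/* ARSET-like operation: setting index 293842948324 allocates one directory
 * and one slice, never anything proportional to the index value itself. */
static void sa_set(sparsearray *sa, uint64_t idx, void *value) {
    uint64_t s = idx / SLICE_LEN;             /* Global slice number. */
    directory *dir = sa_dir(sa, s / DIR_LEN);
    slice **sl = &dir->slices[s % DIR_LEN];
    if (*sl == NULL) *sl = calloc(1, sizeof(slice));
    if ((*sl)->ele[idx % SLICE_LEN] == NULL) sa->numele++;
    (*sl)->ele[idx % SLICE_LEN] = value;
}

/* ARSCAN-like walk: the cost tracks the directories and slices that were
 * actually allocated, not the span between the smallest and largest index. */
static void sa_scan(sparsearray *sa, void (*cb)(uint64_t idx, void *v)) {
    for (size_t j = 0; j < sa->numdirs; j++) {
        for (uint64_t s = 0; s < DIR_LEN; s++) {
            slice *sl = sa->top[j].dir->slices[s];
            if (sl == NULL) continue;
            for (uint64_t i = 0; i < SLICE_LEN; i++) {
                if (sl->ele[i])
                    cb((sa->top[j].dirno * DIR_LEN + s) * SLICE_LEN + i,
                       sl->ele[i]);
            }
        }
    }
}
```

The payoff is exactly what the paragraph above describes: memory stays proportional to what is actually stored, and ARSCAN/ARPOP-style walks never pay for the empty span between indexes.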

Then it was time to read all the code, line by line. Everything was working, and this type has massive testing, thanks, again, to AI, but things that superficially work are not necessarily optimal. I found many small inefficiencies or design errors that I didn't want, so I started a process of manual and AI-assisted rewriting of many modules. When this stage was done, I started, during the third month, to stress test the implementation in many different ways. I started to be confident that it was really solid, useful, and well designed.

Then… it happened. While modeling different use cases to see if the data structure was comfortable to use, I started to put markdown files into Redis arrays, because files are a very good match for it. At this point, as I was working toward other goals with agents, I realized that I could have the centralized knowledge base of skills markdown files that I needed, so out of a need of mine I decided to implement ARGREP. But I wanted regular expressions, too. What library to pick?

I ended up picking TRE (thanks Ville Laurikari!), because when you have regexes in Redis, you want to be sure that there are no patterns with pathological behavior in time or space. But TRE was very inefficient in a specific and extremely useful case, that is, matching foo|bar|zap. So with the help of GPT I optimized it, fixed a few potential security issues, and extended the tests. I had everything in place.
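I don't know the exact change that went into TRE for this case, so the sketch below is only a generic illustration of why a pure alternation of literals deserves a fast path: when the pattern contains no metacharacters besides the bars, it is just a set of fixed strings, and a plain substring scan of each line is enough, keeping the full regex engine as the fallback for everything else. The function names and metacharacter list are my own.

```c
/* Illustration only, not the actual TRE optimization: a pattern like
 * "foo|bar|zap" is a set of literal strings, so it can be matched with
 * plain substring searches instead of driving the regex automaton. */
#include <stdbool.h>
#include <string.h>

/* True if 'pattern' is only literals separated by '|', i.e. it contains
 * no regex metacharacters other than the alternation bars themselves. */
static bool is_literal_alternation(const char *pattern) {
    return strpbrk(pattern, ".*+?[](){}^$\\") == NULL;
}

/* Fast path: return true if any of the '|'-separated literals occurs in
 * 'line'. Callers would fall back to the regex engine when
 * is_literal_alternation() returns false. */
static bool match_literal_alternation(const char *line, const char *pattern) {
    size_t linelen = strlen(line);
    const char *p = pattern;
    while (1) {
        const char *bar = strchr(p, '|');
        size_t len = bar ? (size_t)(bar - p) : strlen(p);
        if (len == 0) return true;   /* An empty alternative matches anywhere. */
        /* Naive substring scan of 'line' for this literal. */
        for (size_t i = 0; i + len <= linelen; i++) {
            if (memcmp(line + i, p, len) == 0) return true;
        }
        if (bar == NULL) return false;
        p = bar + 1;
    }
}
```

For example, match_literal_alternation("the bar is open", "foo|bar|zap") returns true after a couple of linear scans, with no automaton construction at all, which is the kind of cost profile you want for the common "match any of these words" query.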

You know what was the biggest realization of all this? For high-quality system programming tasks you still have to be fully involved, but I ventured into a level of complexity that I would otherwise have skipped. AI provided the safety net for two things: certain massive tasks that are very tiring (like the 32-bit support that was added and tested later), and at the same time the virtual workforce required to make sure there are no obvious bugs in complicated algorithms. Writing the initial huge specification was the key to the subsequent work, just as it was the key to reviewing every single line of sparsearray.c and t_array.c and modifying everything that was not a good fit.

I didn't spend any words on the use cases, as I tried to document them in detail in the PR message itself: https://github.com/redis/redis/pull/15162, so it was not really useful to repeat myself here. Suffice it to say that I really believe it is about time for Redis to have a data type where the numerical index is part of the semantics.

I hope the Array PR will be accepted soon, and that we can benefit from the new use cases it opens. Of course, feedback is welcome. Thank you.