半稳定C++向量容器的概念验证
A proof of concept of a semistable C++ vector container

原始链接: https://github.com/joaquintides/semistable_vector

## Semistable::vector: 动态容器的稳定迭代器 `semistable::vector` 是一个C++头文件库,提供了一个 `std::vector` 的即插即用替代方案,但具有关键的区别:**迭代器稳定性**。标准的 `std::vector` 迭代器在插入或删除时可能会失效,甚至在失效元素之前。`semistable::vector` 保证迭代器在这些操作之后仍然有效,使用 `std::shared_ptr` 通过“纪元”系统跟踪元素。 这种稳定性是通过在修改向量时创建新的纪元描述符来实现的,迭代器内部指向它们被创建时的当前纪元。这确保了即使在向量修改的情况下也能进行一致的解引用。 基准测试表明,在使用通过 `raw()` 成员函数获得的原始指针进行遍历和排序等操作时,性能与 `std::vector` 相当。虽然 C++20 引入了连续迭代器以实现潜在的性能提升,但当前标准库实现很少利用它们。 该库对于常量操作是线程安全的,但需要谨慎处理并发迭代器使用。未来的开发可以解决长期迭代器的异常安全性和内存管理,并且使用 `boost::local_shared_ptr` 的单线程版本展示了性能改进。

黑客新闻 新 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交 登录 一个半稳定的 C++ 向量容器的概念验证 (github.com/joaquintides) 6 分,由 joaquintides 2 小时前发布 | 隐藏 | 过去 | 收藏 | 讨论 指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请 YC | 联系 搜索:
相关文章

原文
#include <semistable/vector.hpp>
#include <iostream>

int main()
{
  semistable::vector<int> x = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

  auto it = x.begin() + 5;
  std::cout << *it << "\n"; // prints 5

  x.erase(x.begin());       // erases first element

  std::cout << *it << "\n"; // prints 5 again!
}

std::vector stores its contents in a contiguous block of memory, so mid insertions and erasures invalidate references and iterators to previous elements. semistable::vector is called semistable in the sense that, while references are still unstable, its iterators correctly track elements in situations like the above.

semistable::vector stores elements contiguously and provides the same API as std::vector with the extra guarantee of iterator stability (including end()). The library is header-only and depends solely on Boost.Config. C++11 or later required.

From the point of view of stability, there are three types of operation that cause iterators to become invalid in a classical std::vector:

  • insertion of elements before a given position,
  • erasure of elements before a given position,
  • reallocation to a new buffer (e.g. with a call to reserve).

When any of these operations happens, semistable::vector creates a new epoch descriptor indicating the change. Outstanding iterators internally point to the epoch that was current when they were last used. All arrows in the diagram are std::shared_ptrs:

epoch diagram

When the iterators are used, they follow the chain of epoch descriptors till the last one, making the necessary adjustments to their stored position along the way. This ensures that dereference (as well as other iterator operations) are consistent with the current state of the vector:

epoch diagram after iterator update

When an epoch descriptor is outdated (all outstanding iterators are past it), it gets automatically deleted (no shared_ptr points to it any longer).

The graph shows normalized execution times of the following operations:

  • traversal with for_each,
  • repeated insertion at the end,
  • erase_if of odd elements,
  • sorting of elements,

for std::vector<int>, std::list<int> and semistable::vector<int> with 0.5M elements. Results are normalized to the execution time of std::vector. Benchmarks compiled with clang-cl for Visual Studio 2022 in release mode.

benchmark

Some observations:

  • semistable::vector iterators provide a raw() member function returning a plain pointer to the element (this is equivalent to calling std::to_address on the iterator). When using raw() for traversal and sorting (that is, std::for_each(x.begin().raw(), x.end.raw(), ...), std::sort(x.begin().raw(), x.end.raw()), execution times are of course the same as with std::vector. C++20 introduces the notion of contiguous iterators, which standard algorithms could in principle take advantage of by internally converting contiguous iterators to pointers for increased performance. In reality, though, no standard library implementation does that except for a handful of algorithms with a high chance of being eligible for autovectorization.
  • std::list::sort is not entirely equivalent to sorting a vector (semistable or otherwise), as the former sorts nodes whereas the latter sorts values. So, a std::list iterator will point to the exact same value after sorting, which is not the case for vectors.

Limitations and potential extensions

Like standard C++ containers, semistable::vector const and const-like member functions are thread safe. Iterator usage, however, requires extra precautions:

  • The same iterator object can't be used concurrently in different threads, even for nominally const operations such as dereferencing (internally, thread-unsafe epoch traversal is triggered).
  • An iterator can't be used concurrently with any thread-unsafe operation on the semistable::vector it belongs in, even if the operation does not touch the piece of memory the iterator points to.

These limitations could in principle be avoided by modifying the library's implementation to use atomic shared pointers.

Currently, iterator stability is not enforced if an exception is thrown other than by the allocator (typically, by a value_type construction or assignment operation). There's no internal impediment to evolving the library so as to properly cover these cases.

If an iterator it is kept in the program and never touched while its semistable::vector is being modified, the associated epoch chain will grow undefinitely because its head can't be garbage-collected as long as it points at it.

Much as with std::vector, using a semistable::vector iterator pointing to an erased element is still undefined behavior. The internal epoch machinery, however, could be easily leveraged so that those illegal uses are detected and signaled via an exception or some other mechanism.

Per a question by Mark Hoemmen, we've created a monothread version of semistable::vector that internally uses boost::local_shared_ptr instead of std::shared_ptr. Perhaps surprisingly, performance is noticeably better than with the thread-safe version.

benchmark

Thanks to Dmitry Arkhipov for his help setting the CI for this project.

联系我们 contact @ memedata.com