Show HN: Hexi – Modern header-only network binary serialisation for C++

Original link: https://github.com/EmberEmu/Hexi

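Hexi is a lightweight, header-only C++23 library for safely handling binary data, especially binary data from the network. Compared with manipulating bytes by hand or using a full serialisation library, it simplifies working with binary data. Its goals are ease of use, safety against untrusted data, flexibility, and minimal overhead. Hexi provides `buffer_adaptor` to wrap containers (e.g. `std::vector`, `std::array`) and `binary_stream` for reading and writing. `binary_stream` works together with `buffer_adaptor` to offer a read/write API over the underlying data. Hexi prioritises safety, offering bounds checking and an optional read limit to prevent out-of-bounds access. It uses exception-based error handling by default but can be configured not to throw. Endianness conversion is also supported. Beyond basic usage, Hexi includes write seeking, fine-grained read/write control through `put` and `get`, view creation for zero-copy string/array access, and custom container support. It also provides `file_buffer`, `static_buffer`, `dynamic_buffer` and `tls_block_allocator` for various use cases.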

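Chaosvex posted "Show HN: Hexi" on Hacker News, introducing a new header-only C++ library for network binary serialisation. Hexi aims to provide a lightweight, high-performance and safe solution for serialising and deserialising network protocols. The author found existing libraries too heavy, too slow or too abstract, and so created Hexi. Its main features include efficient byte handling with zero-allocation and zero-copy capabilities; safety mechanisms against malicious data and coding errors; and easy integration into projects. The library is self-contained, requires only the standard library, and is designed for both rapid prototyping and production use. The author hopes it will help other developers who need a quick and convenient way to handle binary data.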

Original text

Hexi, Easy Peasy Binary Streaming

Hexi is a lightweight, header-only C++23 library for safely handling binary data from arbitrary sources (but primarily network data). It sits somewhere between manually memcpying bytes from network buffers and full-blown serialisation libraries.

The design goals are ease of use, safety when dealing with untrusted data, a reasonable level of flexibility, and keeping overhead to a minimum.

What Hexi doesn't offer: versioning, conversion between different formats, handling of text-based formats, unloading the dishwasher.

Getting started

Incorporating Hexi into your project is simple! The easiest way is to copy hexi.h from single_include into your own project. If you'd rather only include what you use, you can add include to your include paths or incorporate it into your own CMake project with target_link_libraries. To build the unit tests, run CMake with ENABLE_TESTING.

Here's what some libraries might call a very simple motivating example:

#include <hexi.h>
#include <array>
#include <cstddef>
#include <cstdint>  // uint64_t, uint8_t
#include <span>     // std::span
#include <vector>

struct UserPacket {
    uint64_t user_id;
    uint64_t timestamp;
    std::array<uint8_t, 16> ipv6;
};

auto deserialise(std::span<const char> network_buffer) {
    hexi::buffer_adaptor adaptor(network_buffer); // wrap the buffer
    hexi::binary_stream stream(adaptor);          // create a binary stream
    
    // deserialise!
    UserPacket packet;
    stream >> packet;
    return packet;
}

auto serialise(const UserPacket& packet) {
    std::vector<uint8_t> buffer;
    hexi::buffer_adaptor adaptor(buffer); // wrap the buffer
    hexi::binary_stream stream(adaptor);  // create a binary stream
    
    // serialise!
    stream << packet;
    return buffer;
}

By default, Hexi will try to serialise basic structures such as our UserPacket if they meet requirements for being safe to directly copy the bytes. Now, for reasons of portability, it's not recommended that you do things this way unless you're positive that the data layout is identical on the system that wrote the data. Not to worry, this is easily solved. Plus, we didn't do any error handling. All in good time.

Remember these two classes, if nothing else!

The two classes you'll primarily deal with are buffer_adaptor and binary_stream.

binary_stream takes a container as its argument and is used to do the reading and writing. It doesn't know much about the details of the underlying container.

To support containers that weren't written to be used with Hexi, buffer_adaptor is used as a wrapper that binary_stream can interface with. As with binary_stream, it also provides read and write operations but at a lower level.

buffer_adaptor can wrap any contiguous container or view that provides data() and size() member functions and, optionally, resize() for write support. From the standard library, that means types such as std::array, std::span and std::vector can be used out of the box.

Plenty of non-standard library containers will work out of the box, too, as long as they provide a vaguely similar API.

The container's value type must be a byte type (e.g. char, std::byte, uint8_t). std::as_bytes can be used as a workaround if this poses a problem.

Hexi supports custom containers, including non-contiguous containers. In fact, there's a non-contiguous container included in the library. You simply need to provide a few functions such as read and size to allow the binary_stream class to be able to use it.

static_buffer.h provides a simple example of a custom container that can be used directly with binary_stream.

As mentioned, Hexi is intended to be safe to use even when dealing with untrusted data. An example might be network messages that have been manipulated to try to trick your code into reading out of bounds.

binary_stream performs bounds checking to ensure that it never reads more data than the buffer has available, and optionally lets you specify an upper bound on the amount of data to read. This is useful when a buffer holds multiple messages and you want to stop the deserialisation of one message from eating into the next.

buffer_t buffer;
// ... read data
hexi::binary_stream stream(buffer, 32); // will never read more than 32 bytes

Errors happen, it's up to you to handle 'em

The default error handling mechanism is exceptions. Upon encountering a problem with reading data, an exception derived from hexi::exception will be thrown. These are:

  • hexi::buffer_underrun - attempt to read out of bounds
  • hexi::stream_read_limit - attempt to read more than the imposed limit

Exceptions from binary_stream can be disabled by specifying no_throw as a template argument, as shown:

hexi::binary_stream<buf_type, hexi::no_throw> stream(...);

While this prevents binary_stream itself from throwing, it does not prevent propagation of exceptions from lower levels. For example, a wrapped std::vector could still throw std::bad_alloc if allocation fails when writing to it.

Regardless of the error handling mechanism you use, the state of a binary_stream can be checked as follows:

hexi::binary_stream<buf_type, hexi::no_throw> stream(...);
// ... assume an error happens

// simplest way to check whether any errors have occurred
if (!stream) {
    // handle error
}

// or we can get the state
if (auto state = stream.state(); state != hexi::stream_state::ok) {
    // handle error
}

Writing portable code is easy peasy

In the first example, reading our UserPacket would only work as expected if the program that wrote the data laid everything out in the same way as our own program. This might not be the case for reasons of architecture differences, compiler flags, etc.

Here's the same example but doing it portably.

#include <hexi.h>
#include <span>
#include <string>
#include <vector>
#include <cstddef>
#include <cstdint>

struct UserPacket {
    uint64_t user_id;
    std::string username;
    uint64_t timestamp;
    uint8_t has_optional_field;
    uint32_t optional_field;  // pretend this is big endian in the protocol

    // deserialise
    auto& operator>>(auto& stream) {
        stream >> user_id >> username >> timestamp >> has_optional_field;

        if (has_optional_field) {
            stream >> optional_field;
            hexi::endian::big_to_native_inplace(optional_field);
        }

        // we can manually trigger an error if something went wrong
        // stream.set_error_state();
        return stream;
    }

    // serialise
    auto& operator<<(auto& stream) const {
        stream << user_id << username << timestamp << has_optional_field;

        if (has_optional_field) {
            stream << hexi::endian::native_to_big(optional_field);
        }

        return stream;
    }
};

// pretend we're reading network data
void read() {
    std::vector<char> buffer;
    const auto bytes_read = socket.read(buffer);

    // ... logic for determining packet type, etc

    bool result {};

    switch (packet_type) {
        case packet_type::user_packet:
            result = handle_user_packet(buffer);
            break;
    }

    // ... handle result
}

auto handle_user_packet(std::span<const char> buffer) {
    hexi::buffer_adaptor adaptor(buffer);
    hexi::binary_stream stream(adaptor);

    UserPacket packet;
    stream >> packet;

    if (stream) {
        // ... do something with the packet
        return true;
    } else {
        return false;
    }
}

Because binary_stream is a template, it's easiest to allow the compiler to perform type deduction magic.

If you want the function bodies to be in a source file, it's recommended that you provide your own using alias for your binary_stream type. The alternative is to use the polymorphic equivalents, pmc::buffer_adaptor and pmc::binary_stream, which allow you to change the underlying buffer type at runtime but at the cost of virtual call overhead and lacking some functionality that doesn't mesh well with polymorphism.

How you structure your code is up to you; this is just one way of doing it.
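The portable example treats optional_field as big endian on the wire. For anyone unfamiliar with what that conversion involves, here is a hand-rolled sketch that works the same on any host byte order. The functions load_be32 and store_be32 are illustrative names only; in real code you would use the hexi::endian helpers shown in the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assemble a 32-bit value from four bytes in big-endian (network) order.
// Shifts operate on values, not memory, so host endianness does not matter.
constexpr std::uint32_t load_be32(const std::array<std::uint8_t, 4>& b) {
    return (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16) |
           (std::uint32_t{b[2]} << 8) | std::uint32_t{b[3]};
}

// Split a 32-bit value into four bytes, most significant byte first.
constexpr std::array<std::uint8_t, 4> store_be32(std::uint32_t v) {
    return {static_cast<std::uint8_t>(v >> 24), static_cast<std::uint8_t>(v >> 16),
            static_cast<std::uint8_t>(v >> 8), static_cast<std::uint8_t>(v)};
}
```

For instance, the bytes 0x12 0x34 0x56 0x78 load as 0x12345678 regardless of the machine running the code.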

Uh, one more thing...

When using binary_stream, strings are always treated as null-terminated. Writing a char*, std::string_view or std::string will always write a terminating byte to the stream. If you require otherwise, use one of the put functions.

Likewise, reading to std::string assumes the buffer contains a null-terminator. If it does not, an empty string will be returned. If you know the length of the string or need to support a custom terminating/sentinel value, use get() and find_first_of().

What else is in the box?

Here's a very quick rundown on some of the included extras.

  • hexi::file_buffer
    • For dealing with binary files. Simples.
  • hexi::static_buffer
    • Fixed-size networking buffer for when you know the upper bound on the amount of data you'll need to send or receive in one go. Essentially a wrapper around std::array but with added state tracking. Handy if you need to deserialise in multiple steps (read packet header, dispatch, read packet body).
  • hexi::dynamic_buffer
    • Resizeable buffer for when you want to deal with occasional large read/writes without having to allocate the space up front. Internally, it adds additional allocations to accommodate extra data rather than requesting a larger allocation and copying data as std::vector would. It reuses allocated blocks where possible and has support for Asio (Boost or standalone). Effectively, it's a linked list buffer.
  • hexi::tls_block_allocator
    • Allows many instances of dynamic_buffer to share a larger pool of pre-allocated memory, with each thread having its own pool. This is useful when you have many network sockets to handle and want to avoid the general purpose allocator. The caveat is that a deallocation must be made by the same thread that made the allocation, thus limiting access to the buffer to a single thread (with some exceptions).
  • hexi::endian
    • Provides functionality for handling endianness of integral types.
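To give a feel for the "linked list buffer" idea behind dynamic_buffer, here is a deliberately simplified sketch. It is not hexi::dynamic_buffer: the class chained_buffer is invented for this example, and it leaves out block reuse, allocator support and Asio integration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <list>

// Hypothetical sketch: data lives in a chain of fixed-size blocks, so growing
// the buffer never moves bytes that were already written.
template <std::size_t BlockSize>
class chained_buffer {
    struct block {
        std::array<std::byte, BlockSize> storage;
        std::size_t read_pos = 0;
        std::size_t write_pos = 0;
    };

public:
    void write(const void* src, std::size_t length) {
        auto in = static_cast<const std::byte*>(src);
        while (length > 0) {
            if (blocks_.empty() || blocks_.back().write_pos == BlockSize) {
                blocks_.emplace_back(); // add a block instead of reallocating
            }
            block& tail = blocks_.back();
            const std::size_t n = std::min(length, BlockSize - tail.write_pos);
            std::memcpy(tail.storage.data() + tail.write_pos, in, n);
            tail.write_pos += n;
            in += n;
            length -= n;
            size_ += n;
        }
    }

    // Returns false, consuming nothing, if fewer than `length` bytes are stored.
    bool read(void* dst, std::size_t length) {
        if (length > size_) {
            return false;
        }
        auto out = static_cast<std::byte*>(dst);
        while (length > 0) {
            block& head = blocks_.front();
            const std::size_t n = std::min(length, head.write_pos - head.read_pos);
            std::memcpy(out, head.storage.data() + head.read_pos, n);
            head.read_pos += n;
            out += n;
            length -= n;
            size_ -= n;
            if (head.read_pos == BlockSize) {
                blocks_.pop_front(); // block fully drained, release it
            }
        }
        return true;
    }

    std::size_t size() const { return size_; }
    std::size_t block_count() const { return blocks_.size(); }

private:
    std::list<block> blocks_;
    std::size_t size_ = 0;
};
```

The trade-off is that the stored data is no longer contiguous, so features that need a single contiguous region (such as creating views) do not apply to a buffer like this.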

Before we wrap up, look at these tidbits...

We're at the end of the overview, but there's more to discover if you decide to give Hexi a shot. Here's a selection of tasty morsels:

  • binary_stream allows you to perform write seeking within the stream, when the underlying buffer supports it. This is nice if, for example, you need to update a message header with information that you might not know until the rest of the message has been written; checksums, sizes, etc.
  • binary_stream provides overloaded put and get member functions, which allow for fine-grained control, such as reading/writing a specific number of bytes.
  • binary_stream allows for reading into std::string_view and std::span with view() and span(), as long as the underlying container is contiguous. This allows you to create views into the buffer's data, providing a fast, zero-copy way to read strings and arrays from the stream. If you do this, you should avoid writing to the same buffer while holding views to the data.
  • buffer_adaptor provides a template option, space_optimise. This is enabled by default and allows it to avoid resizing containers in cases where all data has been read by the stream. Disabling it allows for preserving data even after having been read. This option is only relevant in scenarios where a single buffer is being both written to and read from.
  • buffer_adaptor provides find_first_of, making it easy to find a specific sentinel value within your buffer.
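The write-seeking use case (patching a header once the body is known) can be sketched without any library at all. The framing below, a two-byte big-endian length followed by the body, is a made-up format for illustration, and frame_message is a hypothetical function rather than a Hexi API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical framing (not a Hexi API): [length: 2 bytes, big endian][body].
std::vector<std::uint8_t> frame_message(std::string_view body) {
    assert(body.size() <= 0xFFFF); // the length must fit in the 2-byte header

    std::vector<std::uint8_t> out;
    const std::size_t header_pos = out.size();
    out.push_back(0); // placeholder for the length, patched below
    out.push_back(0);

    out.insert(out.end(), body.begin(), body.end());

    // "Seek" back to the header and overwrite the placeholder.
    const auto length = static_cast<std::uint16_t>(out.size() - header_pos - 2);
    out[header_pos] = static_cast<std::uint8_t>(length >> 8);
    out[header_pos + 1] = static_cast<std::uint8_t>(length & 0xFF);
    return out;
}
```

Here the body size happens to be known up front, but the same placeholder-then-patch approach applies when the final size or checksum only emerges after several fields have been written.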

To learn more, check out the examples in docs/examples!

Thanks for listening! Now go unload the dishwasher.
