# TigerBeetle as a File Storage

Original link: https://aivarsk.com/2025/12/07/tigerbeetle-blob-storage/

## TigerBeetle for durable file storage

This project demonstrates how to store arbitrary binary files in the TigerBeetle database, leveraging its double-entry bookkeeping model for durability and auditability. The author uses TigerBeetle `Account` and `Transfer` records to represent files and file contents, respectively. The filename is stored in the Account id, with the file size and filename length in the user data fields. File data is split into blocks and transferred from a system account to the file's own account, so every byte is recorded on both sides. Retrieval queries the credit Transfers associated with the file's Account id, relying on the timestamp ordering that TigerBeetle guarantees. A test with a 105 MB video file showed write speeds of about 642 kB/s, while reads were, surprisingly, faster at 2,228 kB/s. Crucially, data integrity was verified with SHA256 checksums, confirming that no data was lost during storage and retrieval, even in the face of potential network, machine, and storage faults. Full source code is available.


## Original article

Could not keep it under the rug until April Fool’s Day

TigerBeetle is a reliable, fast, and highly available database for financial accounting. It tracks financial transactions or anything else that can be expressed as double-entry bookkeeping, providing three orders of magnitude more performance and guaranteeing durability even in the face of network, machine, and storage faults.

Challenge accepted

Continuing my “if all you have is a hammer, everything looks like a nail” journey, I wanted to store arbitrary binary blobs in TigerBeetle to protect them from storage faults. If I can do that, I can store anything.

The id field of my Accounts will contain the filename (16-byte limit). I will store the total file size in the user_data_64 field and the filename length in the user_data_32 field (to simplify decoding). And my Accounts will have this nice property that credits_posted will contain the actual number of bytes written. I can detect failed uploads and resume upload from the right offset (a future TODO).


import tigerbeetle as tb

# Assumptions for a runnable sketch: a single-replica cluster on port 3000
# and an arbitrary ledger/code constant; adjust for your deployment.
FILE = 718
client = tb.ClientSync(cluster_id=0, replica_addresses="3000")

def create_a_file(filename, size):
    name = filename.encode()
    if len(name) > 16:  # the 128-bit id field fits at most 16 bytes
        raise ValueError("Invalid filename, more than 16 bytes")
    account = tb.Account(
        id=int.from_bytes(name, "big"),  # filename packed into the id
        user_data_64=size,               # total file size in bytes
        user_data_32=len(name),          # filename length, to simplify decoding
        ledger=FILE,
        code=FILE,
    )
    errors = client.create_accounts([account])
    if errors:
        raise ValueError(errors[0])
    return account
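
Since credits_posted counts the bytes already stored, the failed-upload detection mentioned above can start from a single lookup. A quick sketch, with bytes_missing being my own naming rather than part of the original tool:

def bytes_missing(filename):
    # Compare the declared size (user_data_64) with the bytes actually
    # credited (credits_posted); a nonzero result means a partial upload.
    file_id = int.from_bytes(filename.encode(), "big")
    accounts = client.lookup_accounts([file_id])
    if not accounts:
        raise FileNotFoundError(filename)
    account = accounts[0]
    return account.user_data_64 - account.credits_posted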

What makes me unhappy is that I have not found a good use for the user_data_128 field on the Account record. Such a waste of resources!

I will store the actual bytes in the Transfer user_data_128, user_data_64, and user_data_32 fields. That gives a total of 28 bytes per Transfer, and the Transfer amount will contain the number of bytes used in the Transfer: 28 for every Transfer except the last one, which holds the remaining bytes.


def write_file(file_id, system_id, data):
    transfers = []
    # Pack the payload into 28-byte blocks: 16 + 8 + 4 bytes of user data.
    for offset in range(0, len(data), 28):
        block = data[offset:offset + 28]
        transfers.append(
            tb.Transfer(
                id=tb.id(),
                debit_account_id=system_id,
                credit_account_id=file_id,
                amount=len(block),  # bytes actually stored in this Transfer
                user_data_128=int.from_bytes(block[:16], "big"),
                user_data_64=int.from_bytes(block[16:24], "big"),
                user_data_32=int.from_bytes(block[24:], "big"),
                ledger=FILE,
                code=FILE,
            )
        )
    # Submit in batches; a single client request has a size limit.
    for i in range(0, len(transfers), 8000):
        errors = client.create_transfers(transfers[i:i + 8000])
        if errors:
            raise ValueError(errors[0])

Because TigerBeetle uses double-entry bookkeeping, I will transfer all bytes from a system file “.” (the debit side) to the desired file (the credit side). This is extremely useful for audit purposes: it lets you verify that debits_posted on the system file Account equals the sum of credits_posted across all file Account records.
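
As a minimal illustration of that check (SYSTEM_ID and audit are my names, and it assumes you can enumerate the file account ids), the audit reduces to a couple of lookups:

# Sketch: every byte debited from the system file "." must appear as a
# credit on some file Account. Names here are illustrative, not canonical.
SYSTEM_ID = int.from_bytes(b".", "big")

def audit(file_ids):
    system = client.lookup_accounts([SYSTEM_ID])[0]
    files = client.lookup_accounts(file_ids)
    assert system.debits_posted == sum(f.credits_posted for f in files)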

As for getting data out of TigerBeetle, I can retrieve all credit Transfers for a specific Account. They are always correctly ordered by the timestamp field, as guaranteed by TigerBeetle.


BULK = 8190  # page size for get_account_transfers; the value here is assumed

def read_file(file_id):
    data = bytearray()
    timestamp_min = 0
    while True:
        transfers = client.get_account_transfers(
            tb.AccountFilter(
                account_id=file_id,
                flags=tb.AccountFilterFlags.CREDITS,  # only Transfers crediting this file
                limit=BULK,
                timestamp_min=timestamp_min,
            )
        )
        for transfer in transfers:
            timestamp_min = transfer.timestamp
            data += transfer_to_bytes(transfer)  # decoding sketch below
        if len(transfers) < BULK:
            break  # a short page means we have read everything
        timestamp_min += 1  # timestamp_min is inclusive, so skip the last one seen
    return bytes(data)
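
The loop body needs to turn each Transfer back into bytes, and there is one subtlety: int.from_bytes loses the original byte length, so re-encoding a short final block with a fixed 16-byte to_bytes would left-pad it with zeros. The amount field recovers the true length of each slice. A sketch (transfer_to_bytes is my naming, not from the original source):

def transfer_to_bytes(transfer):
    # Invert int.from_bytes using amount to size each user data field,
    # so the last, short block decodes without spurious zero padding.
    n128 = min(transfer.amount, 16)             # bytes held in user_data_128
    n64 = min(max(transfer.amount - 16, 0), 8)  # bytes held in user_data_64
    n32 = min(max(transfer.amount - 24, 0), 4)  # bytes held in user_data_32
    return (
        transfer.user_data_128.to_bytes(n128, "big")
        + transfer.user_data_64.to_bytes(n64, "big")
        + transfer.user_data_32.to_bytes(n32, "big")
    )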

With all that put together, I performed tests on some of the most valuable files I never wanted to lose:


(venv)du -b ~/Downloads/homework.mp4
104718755       /home/aivarsk/Downloads/homework.mp4
(venv)time ./tbcp  ~/Downloads/homework.mp4 tb:backup.mp4

real    2m3.697s
user    1m4.408s
sys     0m1.568s

So, you can store your files durably at speeds close to 642 kB/s. Now, let’s retrieve the file and store it on the disk:


(venv)time ./tbcp  tb:backup.mp4 copy.mp4

real    0m47.588s
user    0m27.027s
sys     0m0.553s

Downloading is around four times faster at 2,228 kB/s! And of course, I verified that not a single bit was lost during the round-trip:


(venv)sha256sum ~/Downloads/homework.mp4
4ee75486c7c65a5c158f7f6b2ca6458195aa25b155b0688173b4b52583ce4cac  /home/aivarsk/Downloads/homework.mp4
(venv)sha256sum copy.mp4
4ee75486c7c65a5c158f7f6b2ca6458195aa25b155b0688173b4b52583ce4cac  copy.mp4
(venv)

If you want to store your valuable files, guaranteeing durability even in the face of network, machine, and storage faults, here is the full source code.
