Native ZFS VDEV for Object Storage (OpenZFS Summit)

Original link: https://www.zettalane.com/blog/openzfs-summit-2025-mayanas-objbacker.html

## MayaNAS & MayaScale: Cloud-Native Storage Built on ZFS

At the 2025 OpenZFS Developer Summit, Zettalane presented MayaNAS (file storage) and MayaScale (block storage), a platform that uses ZFS's flexibility to deliver cost-effective, high-performance cloud storage. Its core innovation is **objbacker.io**, a native ZFS VDEV implementation for object storage (S3, GCS, Azure Blob Storage) that **bypasses FUSE** and achieves **3.7 GB/s read throughput**, significantly outperforming traditional FUSE-based approaches.

The platform tackles the high cost of cloud NAS by tiering data intelligently: metadata and small blocks live on fast local NVMe, while large sequential streams are served from cheaper object storage. This is enabled by ZFS's special device architecture.

**objbacker.io** uses native cloud SDKs for direct object access, minimizing overhead. Benchmarks on AWS show strong performance with aligned 1MB blocks and parallel bucket I/O. MayaScale provides sub-millisecond block storage over NVMe-oF with Active-Active HA, in performance tiers ranging from 60K to 585K write IOPS.

Both MayaNAS and MayaScale are designed for multi-cloud deployment with consistent Terraform modules, simplifying management across AWS, Azure, and GCP. The platform aims to cover 90% of enterprise storage needs while saving up to **70%** compared with traditional cloud block storage.

## ZFS and Object Storage: Summary

A recent development allows ZFS virtual devices (vdevs) to be created directly on top of object storage such as Amazon S3, Wasabi, or Google Cloud Storage. This makes ZFS's data-integrity features (checksums, compression, and snapshots) available while keeping the cost efficiency of object storage.

A key use case is simpler, cheaper backup and disaster recovery. Data can be sent efficiently from an on-premises ZFS pool to a remote pool backed by object storage, without keeping a cloud VM running continuously. Recovery consists of launching a VM and importing the object-backed pool when needed.

The architecture is a two-tier system: fast local storage (SSD) for metadata and small blocks, and object storage for bulk data. This approach uses a ZFS-native object-storage layer that treats object storage as a block device.

It builds on earlier work (such as Delphix's) to integrate ZFS with object storage, with potential cost and durability advantages over solutions like EBS. It also opens the possibility of using ZFS datasets to back S3-compatible storage or to serve RDBMS pages directly.

Original article

We presented MayaNAS and MayaScale at OpenZFS Developer Summit 2025 in Portland, Oregon. The centerpiece of our presentation: objbacker.io—a native ZFS VDEV implementation for object storage that bypasses FUSE entirely, achieving 3.7 GB/s read throughput directly from S3, GCS, and Azure Blob Storage.

Presenting at OpenZFS Summit

The OpenZFS Developer Summit brings together the core developers and engineers who build and maintain ZFS across platforms. It was the ideal venue to present our approach to cloud-native storage: using ZFS's architectural flexibility to create a hybrid storage system that combines the performance of local NVMe with the economics of object storage.

Our 50-minute presentation covered the complete Zettalane storage platform—MayaNAS for file storage and MayaScale for block storage—with a deep technical dive into the objbacker.io implementation that makes ZFS on object storage practical for production workloads.

The Cloud NAS Challenge

Cloud storage economics present a fundamental problem for NAS deployments:

  • $96K/year: 100TB on EBS (gp3)
  • $360K/year: 100TB on AWS EFS

The insight that drives MayaNAS: not all data needs the same performance tier. Metadata operations require low latency and high IOPS. Large sequential data needs throughput, not IOPS. ZFS's special device architecture lets us place each workload on the appropriate storage tier.

ZFS Special Device Architecture: Metadata and small blocks (<128KB) on local NVMe SSD. Large blocks (1MB+) streamed from object storage. One filesystem, two performance tiers, optimal cost.
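
The placement rule behind this split can be sketched in a few lines of Go. This is a conceptual illustration only: the real decision is made inside ZFS by the special vdev code and the special_small_blocks property, not by user code, and the function and threshold names here are assumptions for readability.

```go
package tiering

// placement shows, conceptually, which tier a block lands on in the hybrid
// layout described above: metadata and records up to the small-block
// threshold stay on the local NVMe special vdev, larger records go to the
// object-storage-backed vdev. (Illustrative only; ZFS makes this decision
// internally via the special_small_blocks setting.)
func placement(isMetadata bool, recordSize, smallBlockLimit int) string {
	if isMetadata || recordSize <= smallBlockLimit {
		return "special vdev (local NVMe SSD)"
	}
	return "objbacker vdev (object storage)"
}
```

For example, with a 128KB small-block threshold and 1MB records, placement(false, 1<<20, 128<<10) selects the object-storage tier, while a 4KB metadata block stays on NVMe.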

objbacker.io: Native ZFS VDEV for Object Storage

The traditional approach to ZFS on object storage uses FUSE-based filesystems like s3fs or goofys to mount buckets, then creates ZFS pools on top. This works, but FUSE adds overhead: every I/O crosses the kernel-userspace boundary twice.

objbacker.io takes a different approach. We implemented a native ZFS VDEV type (VDEV_OBJBACKER) that communicates directly with a userspace daemon via a character device (/dev/zfs_objbacker). The daemon uses native cloud SDKs (AWS SDK, Google Cloud SDK, Azure SDK) for direct object storage access.

Architecture Comparison

| Approach | I/O Path | Overhead |
| --- | --- | --- |
| FUSE-based (s3fs) | ZFS → VFS → FUSE → userspace → FUSE → VFS → s3fs → S3 | High (multiple context switches) |
| objbacker.io | ZFS → /dev/zfs_objbacker → objbacker.io → S3 SDK | Minimal (direct path) |

How objbacker.io Works

objbacker.io is a Golang program with two interfaces, sketched after this list:

  • Frontend: ZFS VDEV interface via /dev/zfs_objbacker character device
  • Backend: Native cloud SDK integration for GCS, AWS S3, and Azure Blob Storage
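
A minimal sketch of the frontend loop, assuming a hypothetical fixed-size request header read from the character device. The objbacker.io wire format is not public, so the struct layout, field names, and opcode values below are illustrative assumptions, not the actual implementation:

```go
package objbacker

import (
	"encoding/binary"
	"io"
	"os"
)

// Assumed opcodes matching the request types described in this post.
const (
	opWrite = 1 // ZIO_TYPE_WRITE
	opRead  = 2 // ZIO_TYPE_READ
	opTrim  = 3 // ZIO_TYPE_TRIM
	opSync  = 4 // ZIO_TYPE_IOCTL (sync)
)

// objRequest is a hypothetical fixed-size header; the real layout is private
// to objbacker.io and the kernel-side VDEV_OBJBACKER code.
type objRequest struct {
	Op     uint32 // one of the opcodes above
	Pad    uint32 // alignment padding (assumed)
	Offset uint64 // byte offset within the backing vdev
	Length uint64 // payload size in bytes
}

// serve reads requests from the character device and hands them to a backend
// handler (for example, the S3 dispatch sketched further below).
func serve(handle func(objRequest, []byte) ([]byte, error)) error {
	dev, err := os.OpenFile("/dev/zfs_objbacker", os.O_RDWR, 0)
	if err != nil {
		return err
	}
	defer dev.Close()

	for {
		var req objRequest
		if err := binary.Read(dev, binary.LittleEndian, &req); err != nil {
			return err
		}
		var payload []byte
		if req.Op == opWrite {
			payload = make([]byte, req.Length)
			if _, err := io.ReadFull(dev, payload); err != nil {
				return err
			}
		}
		reply, err := handle(req, payload)
		if err != nil {
			return err
		}
		if req.Op == opRead {
			if _, err := dev.Write(reply); err != nil {
				return err
			}
		}
	}
}
```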

ZIO to Object Storage Mapping

| ZFS VDEV I/O | /dev/objbacker Request | Object Storage Operation |
| --- | --- | --- |
| ZIO_TYPE_WRITE | WRITE | PUT object |
| ZIO_TYPE_READ | READ | GET object |
| ZIO_TYPE_TRIM | UTRIM | DELETE object |
| ZIO_TYPE_IOCTL (sync) | USYNC | Flush pending writes |
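
The backend side of that mapping is straightforward to express with a cloud SDK. Below is a minimal sketch using the AWS SDK for Go v2; the opcode constants and the dispatch function are illustrative assumptions, not the actual objbacker.io source, and the USYNC/flush path is omitted since aligned writes go straight to PUT. GCS and Azure backends would follow the same pattern with their respective SDKs.

```go
package s3backend

import (
	"bytes"
	"context"
	"fmt"
	"io"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// Assumed opcodes corresponding to the request types in the table above.
const (
	opWrite = 1 // ZIO_TYPE_WRITE -> PUT object
	opRead  = 2 // ZIO_TYPE_READ  -> GET object
	opTrim  = 3 // ZIO_TYPE_TRIM  -> DELETE object
)

// dispatch performs the object-storage call for a single vdev I/O whose
// aligned 1MB record maps to the object named by key (see "Object Naming"
// below for how offsets become keys).
func dispatch(ctx context.Context, c *s3.Client, bucket, key string, op int, data []byte) ([]byte, error) {
	b, k := aws.String(bucket), aws.String(key)
	switch op {
	case opWrite:
		_, err := c.PutObject(ctx, &s3.PutObjectInput{Bucket: b, Key: k, Body: bytes.NewReader(data)})
		return nil, err
	case opRead:
		out, err := c.GetObject(ctx, &s3.GetObjectInput{Bucket: b, Key: k})
		if err != nil {
			return nil, err
		}
		defer out.Body.Close()
		return io.ReadAll(out.Body)
	case opTrim:
		_, err := c.DeleteObject(ctx, &s3.DeleteObjectInput{Bucket: b, Key: k})
		return nil, err
	default:
		return nil, fmt.Errorf("unsupported objbacker op %d", op)
	}
}
```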

Data Alignment

With ZFS recordsize set to 1MB, each ZFS block maps directly to a single object. Aligned writes go directly as PUT requests without caching. This alignment is critical for performance—object storage performs best with large, aligned operations.

Object Naming: S3backer-compatible layout. A 5MB file creates 5 objects at offsets 0, 1MB, 2MB, 3MB, 4MB. Object names: bucket/00001, bucket/00002, etc.
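
As a concrete sketch of that layout, the offset-to-key mapping looks roughly like the following; this assumes the 1-based, zero-padded numbering shown above, and the exact key format used by objbacker.io may differ in detail:

```go
package main

import "fmt"

const objSize = 1 << 20 // 1MB objects, matching recordsize=1M

// objectKey maps a vdev byte offset to the name of its backing object under
// the s3backer-compatible layout described above.
func objectKey(bucket string, offset uint64) string {
	return fmt.Sprintf("%s/%05d", bucket, offset/objSize+1)
}

func main() {
	// A 5MB file spans five aligned 1MB records, hence five objects.
	for off := uint64(0); off < 5*objSize; off += objSize {
		fmt.Println(objectKey("bucket", off)) // bucket/00001 ... bucket/00005
	}
}
```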

Validated Performance Results

We presented benchmark results from AWS c5n.9xlarge instances (36 vCPUs, 96 GB RAM, 50 Gbps network):

  • 3.7 GB/s sequential read from S3
  • 2.5 GB/s sequential write to S3

The key to this throughput: parallel bucket I/O. With 6 S3 buckets configured as a striped pool, ZFS parallelizes reads and writes across multiple object storage endpoints, saturating the available network bandwidth.
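
The fan-out can be illustrated with a few goroutines issuing one GET per bucket. This is a simplified sketch of the concept: in the real system the parallelism comes from ZFS striping the pool across the buckets and keeping many I/Os in flight, not from application code, and readStripe and the shared key are assumptions made for the example.

```go
package stripe

import (
	"context"
	"io"
	"sync"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// readStripe fetches the same-named object from every bucket concurrently,
// the way a 6-bucket striped pool keeps multiple GETs in flight at once to
// saturate the instance's 50 Gbps network.
func readStripe(ctx context.Context, c *s3.Client, buckets []string, key string) ([][]byte, error) {
	results := make([][]byte, len(buckets))
	errs := make([]error, len(buckets))
	var wg sync.WaitGroup

	for i, bucket := range buckets {
		wg.Add(1)
		go func(i int, bucket string) {
			defer wg.Done()
			out, err := c.GetObject(ctx, &s3.GetObjectInput{
				Bucket: aws.String(bucket),
				Key:    aws.String(key),
			})
			if err != nil {
				errs[i] = err
				return
			}
			defer out.Body.Close()
			results[i], errs[i] = io.ReadAll(out.Body)
		}(i, bucket)
	}
	wg.Wait()

	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}
```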

FIO Test Configuration

| Parameter | Value |
| --- | --- |
| ZFS Recordsize | 1MB (aligned with object size) |
| Block Size | 1MB |
| Parallel Jobs | 10 concurrent FIO jobs |
| File Size | 10 GB per job (100 GB total) |
| I/O Engine | sync (POSIX synchronous I/O) |

MayaScale: High-Performance Block Storage

We also presented MayaScale, our NVMe-oF block storage solution for workloads requiring sub-millisecond latency. MayaScale uses local NVMe SSDs with Active-Active HA clustering.

MayaScale Performance Tiers (GCP)

| Tier | Write IOPS (<1ms) | Read IOPS (<1ms) | Best Latency |
| --- | --- | --- | --- |
| Ultra | 585K | 1.1M | 280 µs |
| High | 290K | 1.02M | 268 µs |
| Medium | 175K | 650K | 211 µs |
| Standard | 110K | 340K | 244 µs |
| Basic | 60K | 120K | 218 µs |

Multi-Cloud Architecture

Both MayaNAS and MayaScale deploy consistently across AWS, Azure, and GCP. Same Terraform modules, same ZFS configuration, same management interface. Only the cloud-specific networking and storage APIs differ.

| Component | AWS | Azure | GCP |
| --- | --- | --- | --- |
| Instance | c5.xlarge | D4s_v4 | n2-standard-4 |
| Block Storage | EBS gp3 | Premium SSD | pd-ssd |
| Object Storage | S3 | Blob Storage | GCS |
| VIP Migration | ENI attach | LB health probe | IP alias |
| Deployment | CloudFormation | ARM Template | Terraform |

Watch the Full Presentation

The complete 50-minute presentation will be available on the OpenZFS YouTube channel:

Note: Video will be available once published by OpenZFS. Check the OpenZFS YouTube channel for the recording.

Presentation Highlights

  • 0:00 - Introduction and Zettalane overview
  • 5:00 - Zettalane ZFS port architecture (illumos-gate based)
  • 12:00 - The cloud NAS cost challenge
  • 18:00 - MayaNAS hybrid architecture with ZFS special devices
  • 25:00 - objbacker.io deep dive: native VDEV implementation
  • 35:00 - Performance benchmarks on AWS
  • 42:00 - MayaScale NVMe-oF block storage
  • 48:00 - Q&A and future directions

Getting Started

Deploy MayaNAS or MayaScale on your preferred cloud platform.

Conclusion

Presenting at OpenZFS Developer Summit 2025 gave us the opportunity to share our approach with the community that makes ZFS possible. The key technical contribution: objbacker.io demonstrates that native ZFS VDEV integration with object storage is practical and performant, achieving 3.7 GB/s throughput without FUSE overhead.

MayaNAS with objbacker.io delivers enterprise-grade NAS on object storage with 70%+ cost savings versus traditional cloud block storage. MayaScale provides sub-millisecond block storage with Active-Active HA for latency-sensitive workloads. Together, they cover 90% of enterprise storage needs on any major cloud platform.

Special thanks to the OpenZFS community for the foundation that makes this possible.

Ready to deploy cloud-native storage?
