JuiceFS is a distributed POSIX file system built on top of Redis and S3

Original link: https://github.com/juicedata/juicefs

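## JuiceFS: A Cloud-Native POSIX File System

JuiceFS is a high-performance, open-source (Apache 2.0), POSIX-compatible file system designed for cloud environments. It lets applications access data held in object storage (such as Amazon S3, Google Cloud Storage, and Azure Blob Storage) as if it were local storage, without code changes. Data lives in object storage, while metadata (file names, permissions, and so on) is managed by an engine such as Redis, MySQL, or TiKV. JuiceFS offers strong consistency, low latency (on the order of milliseconds), and throughput that scales almost without limit. It is fully compatible with Hadoop and Kubernetes (via a CSI Driver) and provides an S3-compatible gateway. Key features include data encryption, compression (LZ4/Zstandard), file locking, and extended attribute support. Internally, files are split into chunks, slices, and blocks for efficient storage and retrieval. JuiceFS is production ready, is used by many organizations, and is actively developed by a strong community. It also provides detailed performance monitoring and benchmarking tools. For more information, visit the [JuiceFS Document Center](https://juicefs.com/).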

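## JuiceFS: A Distributed File System Based on Redis & S3 - Summary

JuiceFS is a distributed, POSIX-compatible file system that uses Redis (or MySQL) for metadata and S3 for data. It has drawn attention for its potential as fast, scalable, and cost-effective storage, particularly for large datasets. The discussion highlighted concerns about relying on Redis for metadata durability, and some commenters advocated alternatives such as Valkey, RocksDB, or TiKV. Performance varies with the choice of metadata store and with the workload; small-file operations can be a weak spot. Many commenters pointed to alternatives such as Storj's object mount (which avoids a separate metadata store) and ZeroFS (which has a performance advantage in some benchmarks). Licensing is also a consideration: JuiceFS uses the Apache 2.0 license, while ZeroFS offers dual AGPL/commercial licensing. Ultimately, JuiceFS is a compelling option, but a successful deployment depends on carefully weighing the metadata store choice, workload characteristics, and licensing implications.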

Original text

JuiceFS is a high-performance POSIX file system released under Apache License 2.0, particularly designed for the cloud-native environment. The data, stored via JuiceFS, will be persisted in Object Storage (e.g. Amazon S3), and the corresponding metadata can be persisted in various compatible database engines such as Redis, MySQL, and TiKV based on the scenarios and requirements.

With JuiceFS, massive cloud storage can be directly connected to big data, machine learning, artificial intelligence, and various application platforms in production environments. Without modifying code, the massive cloud storage can be used as efficiently as local storage.

📖 Document: Quick Start Guide

Fully POSIX-compatible: Use it like a local file system, integrating seamlessly with existing applications without disrupting business workflows.
Fully Hadoop-compatible: JuiceFS' Hadoop Java SDK is compatible with Hadoop 2.x and Hadoop 3.x as well as a variety of components in the Hadoop ecosystem.
S3-compatible: JuiceFS' S3 Gateway provides an S3-compatible interface.
Cloud Native: A Kubernetes CSI Driver is provided for easily using JuiceFS in Kubernetes.
Shareable: JuiceFS is a shared file storage that can be read and written by thousands of clients.
Strong Consistency: Confirmed modifications are immediately visible on all servers that mount the same file system.
Outstanding Performance: Latency can be as low as a few milliseconds, and throughput can scale almost without limit (depending on the size of the Object Storage). See the test results.
Data Encryption: Supports data encryption in transit and at rest (please refer to the guide for more information).
Global File Locks: JuiceFS supports both BSD locks (flock) and POSIX record locks (fcntl).
Data Compression: JuiceFS can compress all your data with LZ4 or Zstandard.
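The two lock families named in the feature list behave differently, and a short sketch may help. It uses Python's standard `fcntl` module on an ordinary file; the helper names `flock_conflicts` and `lock_byte_range` are made up for illustration, and pointing the path at a JuiceFS mount is an assumption left to the reader. The sketch only shows POSIX semantics and says nothing about how JuiceFS implements them.

```python
import fcntl
import os
import tempfile


def flock_conflicts(path: str) -> bool:
    """Return True if a second open of `path` cannot take an exclusive
    BSD lock (flock) while the first open still holds one."""
    with open(path, "a") as first, open(path, "a") as second:
        fcntl.flock(first, fcntl.LOCK_EX)
        try:
            # LOCK_NB makes the call fail immediately instead of blocking.
            fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        finally:
            fcntl.flock(first, fcntl.LOCK_UN)
        fcntl.flock(second, fcntl.LOCK_UN)
        return False


def lock_byte_range(path: str, length: int) -> bool:
    """Take and release a POSIX record lock (fcntl) on the first
    `length` bytes of `path`. Returns True once the lock was held."""
    with open(path, "r+b") as f:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB, length, 0, os.SEEK_SET)
        fcntl.lockf(f, fcntl.LOCK_UN, length, 0, os.SEEK_SET)
    return True


if __name__ == "__main__":
    fd, demo_path = tempfile.mkstemp()
    os.close(fd)
    print("flock conflict detected:", flock_conflicts(demo_path))
    print("record lock acquired:", lock_byte_range(demo_path, 16))
    os.remove(demo_path)
```

BSD locks belong to the open file description, which is why two separate opens conflict even inside one process. POSIX record locks belong to the process and cover a byte range, so testing a conflict for them needs a second process.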

JuiceFS consists of three parts:

JuiceFS Client: Coordinates the Object Storage and the metadata engine, and implements file system interfaces such as POSIX, Hadoop, Kubernetes, and S3 gateway.
Data Storage: Stores the data itself and supports a variety of storage media, e.g., local disk, public or private cloud Object Storage, and HDFS.
Metadata Engine: Stores the corresponding metadata, including file name, file size, permission group, creation and modification time, and directory structure, and supports different metadata engines, e.g., Redis, MySQL, SQLite and TiKV.

JuiceFS can store file system metadata in different metadata engines, such as Redis, a fast, open-source, in-memory key-value data store that is particularly suitable for metadata; meanwhile, all the data is stored in Object Storage through the JuiceFS client. Learn more

Each file stored in JuiceFS is split into fixed-size "Chunks", with a default upper limit of 64 MiB. Each Chunk is composed of one or more "Slices", and the length of a Slice varies depending on how the file is written. Each Slice is composed of fixed-size "Blocks", which are 4 MiB by default. These Blocks are eventually stored in Object Storage; at the same time, the metadata of the file and its Chunks, Slices, and Blocks is stored in the metadata engine via JuiceFS. Learn more
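The size arithmetic above can be sketched in a few lines of Python. The function names `locate` and `count_units` are hypothetical. The block count assumes the simplest case of a file written once, sequentially, so that each Chunk holds a single contiguous Slice; real Slice layouts depend on the write pattern and are not modelled here.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # default Chunk upper limit: 64 MiB
BLOCK_SIZE = 4 * 1024 * 1024   # default Block size: 4 MiB


def locate(offset: int) -> tuple[int, int, int]:
    """Map a byte offset in a file to (chunk index, block index within
    that chunk, byte offset within that block)."""
    chunk_index, offset_in_chunk = divmod(offset, CHUNK_SIZE)
    block_index, offset_in_block = divmod(offset_in_chunk, BLOCK_SIZE)
    return chunk_index, block_index, offset_in_block


def count_units(file_size: int) -> tuple[int, int]:
    """Return (number of chunks, number of blocks) for a file of
    `file_size` bytes, assuming one contiguous slice per chunk."""
    full_chunks, tail = divmod(file_size, CHUNK_SIZE)
    chunks = full_chunks + (1 if tail else 0)
    # Full chunks hold CHUNK_SIZE // BLOCK_SIZE blocks each; the tail
    # chunk needs ceil(tail / BLOCK_SIZE) blocks.
    blocks = full_chunks * (CHUNK_SIZE // BLOCK_SIZE) + -(-tail // BLOCK_SIZE)
    return chunks, blocks


if __name__ == "__main__":
    MiB = 1024 * 1024
    print(locate(70 * MiB))        # byte 70 MiB falls in the second chunk
    print(count_units(100 * MiB))  # a 100 MiB file spans two chunks
```

Under these assumptions a 100 MiB file needs one full Chunk of 16 Blocks plus a 36 MiB tail of 9 Blocks, so 25 objects in total.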

When using JuiceFS, files will eventually be split into Chunks, Slices and Blocks and stored in Object Storage. Therefore, the source files stored in JuiceFS cannot be found in the file browser of the Object Storage platform; instead, the bucket contains only a chunks directory and a bunch of numbered directories and files. Don't panic! This is just the secret of the high-performance operation of JuiceFS!
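A toy model may make this less surprising. In the sketch below, two dictionaries stand in for the metadata engine and the bucket. The class `ToyFS`, the tiny 4-byte block size, and the object key layout are all invented for illustration and do not reflect JuiceFS internals; Chunks are left out to keep it short.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes; deliberately tiny so the demo stays readable


@dataclass
class ToyFS:
    metadata: dict = field(default_factory=dict)  # stand-in for the metadata engine
    bucket: dict = field(default_factory=dict)    # stand-in for the object storage bucket
    next_slice_id: int = 1

    def write_file(self, name: str, data: bytes) -> None:
        """Store `data` as numbered blocks; only metadata knows the name."""
        slice_id = self.next_slice_id
        self.next_slice_id += 1
        keys = []
        for index, start in enumerate(range(0, len(data), BLOCK_SIZE)):
            block = data[start:start + BLOCK_SIZE]
            # Hypothetical key layout: slice id, block index, block length.
            key = f"chunks/{slice_id}_{index}_{len(block)}"
            self.bucket[key] = block
            keys.append(key)
        self.metadata[name] = keys

    def read_file(self, name: str) -> bytes:
        """Reassemble a file by following the block keys in metadata."""
        return b"".join(self.bucket[key] for key in self.metadata[name])


if __name__ == "__main__":
    fs = ToyFS()
    fs.write_file("report.txt", b"hello world")
    print(sorted(fs.bucket))          # numbered objects only, no "report.txt"
    print(fs.read_file("report.txt"))
```

The bucket alone cannot rebuild the file name or the block order, which is the same reason the source files do not show up in the Object Storage browser.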

Before you begin, make sure you have:

One supported metadata engine, see How to Set Up Metadata Engine
One supported Object Storage for storing data blocks, see Supported Object Storage
JuiceFS Client downloaded and installed

Please refer to Quick Start Guide to start using JuiceFS right away!

Check out all the command line options in command reference.

JuiceFS can be used as a persistent volume for Docker and Podman; please check here for details.

It is also very easy to use JuiceFS on Kubernetes. Please find more information here.

If you want to use JuiceFS in Hadoop, check the Hadoop Java SDK.

Please refer to JuiceFS Document Center for more information.

JuiceFS has passed all of the compatibility tests (8813 in total) in the latest pjdfstest.

All tests successful.

Test Summary Report
-------------------
/root/soft/pjdfstest/tests/chown/00.t          (Wstat: 0 Tests: 1323 Failed: 0)
  TODO passed:   693, 697, 708-709, 714-715, 729, 733
Files=235, Tests=8813, 233 wallclock secs ( 2.77 usr  0.38 sys +  2.57 cusr  3.93 csys =  9.65 CPU)
Result: PASS

Aside from the POSIX features covered by pjdfstest, JuiceFS also provides:

Close-to-open consistency. Once a file is written and closed, subsequent opens and reads from any client are guaranteed to see the written data. Within the same mount point, all written data can be read immediately.
Rename and all other metadata operations are atomic, which is guaranteed by the transactions of the supported metadata engines.
Opened files remain accessible after being unlinked from the same mount point.
Mmap (tested with FSx).
Fallocate with punch hole support.
Extended attributes (xattr).
BSD locks (flock).
POSIX record locks (fcntl).
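One item in the list above, that opened files remain accessible after unlink, is easy to check by hand. The sketch below is a generic POSIX check, not a JuiceFS test; the helper name `read_after_unlink` is made up, and passing a directory on a JuiceFS mount is an assumption left to the reader.

```python
import os
import tempfile


def read_after_unlink(directory: str) -> tuple[bool, bytes]:
    """Create a file in `directory`, unlink it while it is still open,
    and read it back through the open handle.

    Returns (path still exists after unlink, bytes read back)."""
    path = os.path.join(directory, "scratch.txt")
    with open(path, "w+b") as f:
        f.write(b"still here")
        f.flush()
        os.unlink(path)              # removes the directory entry only
        still_listed = os.path.exists(path)
        f.seek(0)
        return still_listed, f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(read_after_unlink(tmp))
```

On a file system with this behaviour the name disappears immediately, while the data stays readable until the last open handle is closed.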

JuiceFS provides a subcommand that can run a few basic benchmarks to help you understand how it works in your environment.

A sequential read/write benchmark has also been performed on JuiceFS, EFS and S3FS with fio.

The result figure shows that JuiceFS can provide 10X more throughput than the other two (see more details).

A simple metadata benchmark has been performed on JuiceFS, EFS and S3FS with mdtest.

The result shows that JuiceFS can provide significantly more metadata IOPS than the other two (see more details).

See Real-Time Performance Monitoring if you encounter performance issues.

Amazon S3 (and other S3 compatible Object Storage services)
Google Cloud Storage
Azure Blob Storage
Alibaba Cloud Object Storage Service (OSS)
Tencent Cloud Object Storage (COS)
Qiniu Cloud Object Storage (Kodo)
QingStor Object Storage
Ceph RGW
MinIO
Local disk
Redis
...

JuiceFS supports numerous Object Storage services. Learn more.

JuiceFS is production ready and used by thousands of machines in production. A list of users has been assembled and documented here. In addition, several collaborative projects integrate JuiceFS with other open source projects; they are documented here. If you are also using JuiceFS, please feel free to let us know, and you are welcome to share your specific experience with everyone.

The storage format is stable, and will be supported by all future releases.

Gateway Optimization
Resumable Sync
Read-ahead Optimization
Optimization for Large-scale Scenarios
Snapshots

We use GitHub Issues to track community reported issues. You can also contact the community for any questions.

Thank you for your contribution! Please refer to the JuiceFS Contributing Guide for more information.

You are welcome to join the Discussions and the Slack channel to connect with JuiceFS team members and other users.

JuiceFS collects anonymous usage data by default to help us better understand how the community is using JuiceFS. Only core metrics (e.g. version number) will be reported, and user data and any other sensitive data will not be included. The related code can be viewed here.

You can also disable reporting with the command line option --no-usage-report:

juicefs mount --no-usage-report

JuiceFS is open-sourced under Apache License 2.0, see LICENSE.

The design of JuiceFS was inspired by Google File System, HDFS and MooseFS. Thanks for their great work!

Why doesn't JuiceFS support XXX Object Storage?

JuiceFS supports many Object Storage services. Please check out this list first. If the Object Storage you want to use is compatible with S3, you can treat it as S3. Otherwise, try reporting an issue.

Can I use Redis Cluster as metadata engine?

Yes. Since v1.0.0 Beta3, JuiceFS supports Redis Cluster as the metadata engine. Note that Redis Cluster requires all keys touched by a transaction to be in the same hash slot, so a JuiceFS file system can only use one hash slot.

See "Redis Best Practices" for more information.
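To see what "the same hash slot" means in practice, the sketch below computes Redis Cluster key slots as the Redis Cluster specification describes them: CRC16 (XMODEM variant) of the key modulo 16384, hashing only the hash tag when the key contains a non-empty `{...}` section. The function name `key_hash_slot` and the example key names are hypothetical and are not taken from JuiceFS.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in a Redis Cluster


def key_hash_slot(key: bytes) -> int:
    """Compute the Redis Cluster hash slot for `key`.

    If the key contains '{' followed later by '}' with at least one
    byte in between, only that enclosed hash tag is hashed."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # a closing brace exists and the tag is non-empty
            key = key[start + 1:end]
    # binascii.crc_hqx with initial value 0 is the CRC16 variant
    # (polynomial 0x1021) that the cluster specification uses.
    return binascii.crc_hqx(key, 0) % SLOT_COUNT


if __name__ == "__main__":
    # Hypothetical keys sharing the tag "{10}" all land in one slot,
    # so a multi-key transaction over them is allowed by the cluster.
    for name in (b"{10}inode:1", b"{10}dir:2", b"{10}counter"):
        print(name, key_hash_slot(name))
    # Without a shared tag, the keys may be spread over different slots.
    for name in (b"inode:1", b"dir:2"):
        print(name, key_hash_slot(name))
```

Keeping every key of a file system under one tag is what makes cross-key transactions possible. The cost is that all of that file system's metadata sits in a single slot, and therefore on a single shard.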

What's the difference between JuiceFS and XXX?

See "Comparison with Others" for more information.

For more FAQs, please see the full list.