Show HN: ToplingDB - A Persistent Key-Value Store for External Storage

Original link: https://github.com/topling/toplingdb

ToplingDB, developed by Topling Inc., is a persistent key-value store built upon RocksDB, offering enhanced performance and features. Key enhancements include SidePlugin for flexible DB configuration via JSON/YAML, an embedded HTTP server for web-based monitoring and online configuration changes, and optimized transaction lock management. ToplingDB also delivers faster MultiGet operations with concurrent IO and zero-copy features for point searches and iterators. It includes a custom memtable for log indexing, supports Prometheus metrics, and provides bug fixes not yet merged into upstream RocksDB. Built-in SidePlugins allow separation of components from the core. MyTopling (MySQL) and Todis (Redis) are cloud native DB services that utilize ToplingDB. ToplingDB offers public and private repositories containing components like improved MemTable (cspp-memtable), distributed compaction (topling-dcompact), and ToplingZipTable optimized for RAM/SSD.

ToplingDB is a high-performance, persistent key-value store forked from RocksDB, created by the developer of TerarkDB. The project claims up to 8x performance improvements over RocksDB through several architectural changes. Key improvements include a Crash Safe Parallel Patricia trie (CSPP) that replaces SkipList as the MemTable and is reported to be 8x faster. BlockBasedTable is swapped for ToplingZipTable, a searchable compressed table that removes the need for a BlockCache by compressing keys/indexes with NestLoudsTrie and values with a high-ratio zip that unzips at 1GB/sec. Other notable features include omitting MemTable flush to L0 by using the MemTable as an index to the WAL, prefix caching for candidate SSTs, distributed compaction, and integrated MergeOperator/CompactionFilter. ToplingDB also supports JSON/YAML configuration, an optional embedded WebView for DB structure visualization, and online config updates via HTTP. MyTopling, a fork of MyRocks (MySQL), further integrates these improvements, outperforming InnoDB in many aspects. The creators have contributed ~100 PRs to RocksDB. Welcome to the community!

Original text

ToplingDB: A Persistent Key-Value Store for External Storage

ToplingDB is developed and maintained by Topling Inc. It is built with RocksDB. See ToplingDB Branch Name Convention.

ToplingDB's submodule rockside is the entry point of ToplingDB; see the SidePlugin wiki.

ToplingDB has many more key features than RocksDB:

  1. SidePlugin lets users define DB configs in a JSON (or YAML) file
  2. The Embedded HTTP Server lets users view almost all DB info on the web; it is a component of SidePlugin
  3. The Embedded HTTP Server lets users change db/cf options and all db meta objects (such as MemTabFactory, TableFactory, WriteBufferManager ...) online, without restarting the running process
  4. Many improvements and refactorings on RocksDB, aimed at performance and extensibility
  5. Topling transaction lock management, 5x faster than RocksDB
  6. MultiGet with concurrent IO by fiber/coroutine + io_uring, much faster than RocksDB's async MultiGet
  7. Topling de-virtualization: de-virtualizes hotspot (virtual) functions, plus key prefix caches (see benchmarks)
  8. Topling zero copy for point search (Get/MultiGet) and Iterator
  9. Topling memtable as log index, which omits memtable flush to L0
  10. Built-in SidePlugins for existing RocksDB components (Cache, Comparator, TableFactory, MemTableFactory ...)
  11. Built-in Prometheus metrics support, based on the Embedded HTTP Server
  12. Many bugfixes for RocksDB; a small part of these fixes has been submitted as pull requests to upstream RocksDB

ToplingDB cloud native DB services

  1. MyTopling (MySQL on ToplingDB), MyTopling on aliyun
  2. Todis (Redis on ToplingDB)

With the SidePlugin mechanism, plugins/components can be physically separated from core toplingdb:

  1. They can be compiled into a separate dynamic library and loaded at runtime
  2. User code needs no changes; only the json/yaml files change
  3. Topling's non-open-source enterprise plugins/components are delivered in this way

toplingdb
 \__ sideplugin
      \__ rockside                 (submodule , sideplugin core and framework)
      \__ topling-zip              (auto clone, zip and core lib)
      \__ cspp-memtab              (auto clone, sideplugin component)
      \__ cspp-wbwi                (auto clone, sideplugin component)
      \__ topling-sst              (auto clone, sideplugin component)
      \__ topling-rocks            (auto clone, sideplugin component)
      \__ topling-zip_table_reader (auto clone, sideplugin component)
      \__ topling-dcompact         (auto clone, sideplugin component)
           \_ tools/dcompact       (dcompact-worker binary app)

| Repository | Permission | Description (and components) |
|---|---|---|
| ToplingDB | public | Top repository, forked from RocksDB with our fixes, refactorings and enhancements |
| rockside | public | This is a submodule; it contains the SidePlugin framework and built-in SidePlugins, and the Embedded HTTP Server and Prometheus metrics |
| cspp-wbwi (WriteBatchWithIndex) | public | With CSPP and careful coding, CSPP_WBWI is 20x faster than RocksDB's SkipList-based WBWI |
| cspp-memtable | public | (CSPP is Crash Safe Parallel Patricia trie) MemTab, which outperforms SkipList in all aspects: 3x lower memory usage, 7x single-thread performance, perfect multi-thread scaling |
| topling-sst | public | 1. SingleFastTable (designed for L0 and L1); 2. VecAutoSortTable (designed for MyTopling bulk_load); 3. Deprecated: ToplingFastTable, CSPPAutoSortTable |
| topling-dcompact | public | Distributed Compaction with a general dcompact_worker application; offloads compactions to elastic computing clusters, much more powerful than RocksDB's Remote Compaction |
| topling-rocks | private | For building ToplingZipTable, an SST implementation optimized for RAM and SSD space and aimed at L2+ level compaction, which uses Topling's dedicated searchable in-memory data compression algorithms |
| topling-zip_table_reader | public | For reading ToplingZipTable by community users; the builder of ToplingZipTable is in topling-rocks |

To simplify compilation, the repositories are cloned automatically by ToplingDB's Makefile. Community users can clone the public repositories but will fail to clone the private one, so ToplingDB is then built without the private components; this is the so-called community version.

ToplingDB requires C++17; gcc 8.3 or newer is recommended, and clang also works.

Even without ToplingZipTable, ToplingDB is much faster than upstream RocksDB:

sudo yum -y install git libaio-devel gcc-c++ gflags-devel zlib-devel bzip2-devel libcurl-devel liburing-devel snappy-devel jemalloc-devel
#sudo apt-get update -y && sudo apt-get install -y libjemalloc-dev libaio-dev libgflags-dev zlib1g-dev libbz2-dev libcurl4-gnutls-dev liburing-dev libsnappy-dev liblz4-dev libzstd-dev
git clone https://github.com/topling/toplingdb
cd toplingdb
make -j`nproc` db_bench DEBUG_LEVEL=0
# replace /path/to/dbdir with your DB directory
cp sideplugin/rockside/src/topling/web/{style.css,index.html} /path/to/dbdir
cp sideplugin/rockside/sample-conf/db_bench_*.yaml .
export LD_LIBRARY_PATH=`find sideplugin -name lib_shared`
# edit db_bench_community.yaml to suit your needs
# 1. use the default path (/dev/shm) if you have no fast disk (such as on a cloud server)
# 2. change max_background_compactions to your cpu core num
# 3. if you have permission to the github repo topling-rocks, you can use db_bench_enterprise.yaml
# 4. using db_bench_community.yaml is faster than upstream RocksDB
# 5. using db_bench_enterprise.yaml is much faster than db_bench_community.yaml
# the command option -json accepts both json and yaml files; yaml is used here because it is more human readable
./db_bench -json=db_bench_community.yaml -num=10000000 -disable_wal=true -value_size=20 -benchmarks=fillrandom,readrandom -batch_size=10
# open http://127.0.0.1:2011 to see the webview
# you can see this db_bench is much faster than RocksDB
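
Step 2 in the comments above (setting max_background_compactions to the CPU core count) can be scripted. The following is a minimal sketch, not part of ToplingDB: the helper name set_compactions is hypothetical, and it assumes the key appears on its own line in the form "max_background_compactions: <number>", which may not match every sample file. The demo runs on a throwaway file so no real config is touched.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the max_background_compactions value in a YAML file.
# Assumption: the key sits on its own line as "max_background_compactions: <number>".
set_compactions() {
  file=$1
  cores=$2
  # Keep the original indentation and key (group 1), replace only the number.
  sed -E "s/^([[:space:]]*max_background_compactions:)[[:space:]]*[0-9]+/\1 ${cores}/" \
    "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Demo on a temporary file with made-up content.
demo=$(mktemp)
printf 'DBOptions:\n  dbo:\n    max_background_compactions: 13\n' > "$demo"
# Fall back to 4 if nproc is unavailable on this system.
set_compactions "$demo" "$(nproc 2>/dev/null || echo 4)"
cat "$demo"
rm -f "$demo"
```

To apply it to the real file, one would call set_compactions db_bench_community.yaml "$(nproc)" after checking that the key is written in the assumed form.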

For performance and simplicity, ToplingDB disables some RocksDB features by default:

| Feature | Control MACRO |
|---|---|
| Dynamic creation of ColumnFamily | ROCKSDB_DYNAMIC_CREATE_CF |
| User level timestamp on key | TOPLINGDB_WITH_TIMESTAMP |
| Wide Columns | TOPLINGDB_WITH_WIDE_COLUMNS |
| Fabricated features for read | TOPLINGDB_WITH_FABRICATED_COMPLEXITY |

Note: Dynamic creation of ColumnFamily is not supported by SidePlugin

To enable these features, add -D${MACRO_NAME} to the EXTRA_CXXFLAGS variable. For example, to build ToplingDB for Java with dynamic ColumnFamily creation:

make -j`nproc` EXTRA_CXXFLAGS='-DROCKSDB_DYNAMIC_CREATE_CF' rocksdbjava
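
When more than one of the macros listed above is needed, the flag string can be assembled from the macro names. This is a small sketch under one assumption: that several -D flags can be combined in a single EXTRA_CXXFLAGS value, which is ordinary compiler-flag behavior but is not stated in the original. The helper name build_cxxflags is hypothetical, and the script only prints the make command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: turn macro names into a space-separated list of -D flags.
build_cxxflags() {
  flags=""
  for macro in "$@"; do
    # Append "-D<macro>", adding a separating space only after the first entry.
    flags="${flags:+$flags }-D$macro"
  done
  printf '%s\n' "$flags"
}

EXTRA_CXXFLAGS=$(build_cxxflags ROCKSDB_DYNAMIC_CREATE_CF TOPLINGDB_WITH_TIMESTAMP)

# Print the resulting command rather than running a full build.
jobs=$(nproc 2>/dev/null || echo 1)
echo "make -j$jobs EXTRA_CXXFLAGS='$EXTRA_CXXFLAGS' rocksdbjava"
```

Whether a particular combination of macros builds cleanly is something to verify against the ToplingDB sources.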

To conform to open source licensing, the following term disallowing bytedance has been deleted since 2023-04-24; that is to say, bytedance using ToplingDB is no longer illegal and is not a shame.

We disallow bytedance using this software, other terms are identical with upstream rocksdb license, see LICENSE.Apache, COPYING and LICENSE.leveldb.

The terms of disallowing bytedance are also deleted in LICENSE.Apache, COPYING and LICENSE.leveldb.




RocksDB: A Persistent Key-Value Store for Flash and RAM Storage


RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat and Jeff Dean.

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.
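
As a rough illustration of the amplification factors named above, the sketch below computes WAF and SAF as simple ratios, using the common definitions (bytes written to storage per byte written by the application, and on-disk size per byte of live data). All byte counts are hypothetical values chosen for the example, and the helper name ratio is not part of RocksDB.

```shell
#!/bin/sh
# Hypothetical helper: print NUM/DEN with two decimal places.
ratio() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", n / d }'
}

# Made-up numbers, for illustration only.
user_bytes=100          # bytes the application wrote
disk_write_bytes=1200   # bytes written to storage, including flushes and compactions
live_bytes=100          # logical size of the live data
disk_size_bytes=130     # space the database occupies on storage

echo "WAF=$(ratio "$disk_write_bytes" "$user_bytes")"   # prints WAF=12.00
echo "SAF=$(ratio "$disk_size_bytes" "$live_bytes")"    # prints SAF=1.30
```

Tuning an LSM tree typically trades one of these ratios against the others, which is the flexibility the paragraph refers to.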

Start with example usage here: https://github.com/facebook/rocksdb/tree/main/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Questions and discussions are welcome on the RocksDB Developers Public Facebook group and email list on Google Groups.

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.
