Bypass PostgreSQL catalog overhead with direct partition hash calculations

Original link: https://www.shayon.dev/post/2025/221/bypass-postgresql-catalog-overhead-with-direct-partition-hash-calculations/

## PostgreSQL Hash Partitioning and Optimization

PostgreSQL's hash partitioning uses a hash function to distribute table rows across partitions, offering even data distribution, faster index lookups, and reduced maintenance overhead. However, querying through the parent table requires costly catalog lookups to route each query to the correct partition, especially with multi-level partitioning.

While hash partitioning improves overall performance, the overhead of identifying the target partition can become a bottleneck in high-throughput applications. The process involves parsing keys, computing hash values, and navigating the partition hierarchy, all before the query ever reaches the data.

One significant optimization is to bypass these catalog lookups entirely. If your application *already knows* the partition key values, it can compute the target partition directly, either with a tool like the `pg_hash_func` Ruby gem or by calling PostgreSQL's hash functions in SQL. Eliminating this database-side overhead can yield speedups of up to 20x.

Ultimately, PostgreSQL offers flexibility: use the parent table for operational simplicity, call the hash functions directly for a moderate gain, or compute partitions in the application for maximum performance in latency-sensitive workloads.


Original article

PostgreSQL’s hash partitioning distributes rows across partitions using deterministic hash functions. When you query through the parent table, PostgreSQL must perform catalog lookups to route each query to the correct partition. This results in measurable overhead for high-throughput applications, especially with multi-level partitioning schemes, where PostgreSQL must traverse deeper catalog structures to identify the target partition. Let’s look at how to speed up the case where your application already knows the partition key values.

What is hash-based partitioning?

Hash partitioning distributes table rows across multiple partitions using a hash function applied to a partition key. Unlike range or list partitioning, it ensures relatively even distribution without requiring knowledge of data patterns.

You can use hash partitioning to break down large tables and distribute load. This makes index lookups faster since each partition maintains smaller indexes. It also reduces autovacuum pressure by allowing PostgreSQL to vacuum smaller partitions independently rather than processing one massive table. Additionally, query performance improves when PostgreSQL can eliminate entire partitions during query planning (partition pruning), and maintenance operations like REINDEX or ANALYZE run faster on smaller partition chunks.

Let’s take an example to see how this works:

-- Creating a hash-partitioned table
CREATE TABLE events (
    id bigint,
    user_id bigint,
    event_type integer,
    payload text,
    created_at timestamp
) PARTITION BY HASH (user_id);

-- Creating the actual partitions
CREATE TABLE events_0 PARTITION OF events FOR VALUES WITH (modulus 4, remainder 0);
CREATE TABLE events_1 PARTITION OF events FOR VALUES WITH (modulus 4, remainder 1);
CREATE TABLE events_2 PARTITION OF events FOR VALUES WITH (modulus 4, remainder 2);
CREATE TABLE events_3 PARTITION OF events FOR VALUES WITH (modulus 4, remainder 3);

Each user_id gets deterministically mapped to one of four partitions. User 101 always lands in the same partition, but different user IDs likely end up in different partitions.
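
You don’t have to guess at that mapping: PostgreSQL ships a built-in helper, satisfies_hash_partition, that reports whether a value hashes to a given modulus/remainder pair. A minimal sketch against the table above (the cast must match the partition key’s type):

-- Find the remainder (i.e., the partition) that user_id = 101 hashes to
SELECT r AS remainder
FROM generate_series(0, 3) AS r
WHERE satisfies_hash_partition('events'::regclass, 4, r, 101::bigint);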

Hash partitioning works well for high-volume tables with unpredictable access patterns, scenarios requiring parallel processing across partitions, and cases where you need load balancing without hotspots. For complex systems, you might implement two-level partitioning—first by user for data isolation, then by event type to further distribute the load.

Two-level partitioning

Now suppose you need more granular data distribution for optimized queries; you can create two-level hash partitioning. Consider a high-volume events system where you want to partition first by user_id to distribute user data, then by event_type within each user to further optimize query performance:

-- Parent table
CREATE TABLE events (
    id bigint,
    user_id bigint,
    event_type integer,
    payload text,
    created_at timestamp
) PARTITION BY HASH (user_id);

-- First level: 16 partitions by user_id
CREATE TABLE events_0 PARTITION OF events
  FOR VALUES WITH (modulus 16, remainder 0)
  PARTITION BY HASH (event_type);

CREATE TABLE events_1 PARTITION OF events
  FOR VALUES WITH (modulus 16, remainder 1)
  PARTITION BY HASH (event_type);
-- ... continue for all 16 first-level partitions

-- Second level: 8 partitions per user by event_type
CREATE TABLE events_0_0 PARTITION OF events_0
  FOR VALUES WITH (modulus 8, remainder 0);
CREATE TABLE events_0_1 PARTITION OF events_0
  FOR VALUES WITH (modulus 8, remainder 1);
-- ... continue for all combinations, resulting in 128 total leaf partitions (16 × 8)

This creates a partition hierarchy where PostgreSQL must traverse multiple levels to find the target partition.
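
Once the hierarchy exists, you can inspect it with the built-in pg_partition_tree function (available since PostgreSQL 12). A quick sketch:

-- List every relation in the events hierarchy with its parent and depth
SELECT relid, parentrelid, isleaf, level
FROM pg_partition_tree('events')
ORDER BY level, relid;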

The catalog lookup overhead

When you execute a query like:

SELECT * FROM events
WHERE user_id = 101 AND event_type = 101;

PostgreSQL’s query planner must perform several expensive operations:

Step 1: Parse partition keys

The planner analyzes your WHERE clause to identify values for each partition key (user_id and event_type).

Step 2: Calculate hash values

PostgreSQL applies its internal hash functions. Something like:

  • hashint8extended(101, seed) for the bigint user_id
  • hashint4extended(101, seed) for the integer event_type
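
Both are ordinary catalog functions you can call from SQL yourself. A sketch, reusing the seed value that appears in the SQL calculation later in this post:

SELECT hashint8extended(101::bigint, 8889443574745275645::bigint) AS user_id_hash,
       hashint4extended(101::integer, 8889443574745275645::bigint) AS event_type_hash;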

Step 3: Navigate the partition hierarchy

Using the calculated hash values, PostgreSQL must:

  • Query pg_class to find partition OIDs for the parent table
  • Traverse pg_inherits to map parent-child relationships (see the sketch after this list)
  • Calculate which first-level partition to target: (hash_user_id + magic_constant) % 16 for bigint
  • Find child partitions of that first-level partition
  • Calculate which second-level partition to target: hash_event_type % 8 for integer (no magic constant)
  • Resolve the final leaf partition name
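
For instance, the parent-to-child mapping the planner walks is visible directly in pg_inherits:

-- Children of the parent table (the first level of the hierarchy)
SELECT inhrelid::regclass AS child
FROM pg_inherits
WHERE inhparent = 'events'::regclass;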

Step 4: Build the execution plan

PostgreSQL performs additional metadata lookups to:

  • Verify partition constraints match the calculated hash values
  • Apply partition pruning optimizations
  • Generate the final query plan targeting the specific partition

The CPU overhead from these catalog traversals becomes significant under high query load. Each query waits for the planner to complete this multi-step process, even though your application already knows exactly which partition contains the data. The deeper your partition hierarchy, the more expensive these lookups become.
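
You can watch the outcome of this routing with EXPLAIN: with successful plan-time pruning, only a single leaf partition appears in the plan, but the planning work described above is still paid on every query. A sketch (exact output varies by version):

EXPLAIN (COSTS OFF)
SELECT * FROM events
WHERE user_id = 101 AND event_type = 101;
-- A fully pruned plan references exactly one leaf partition, e.g. events_0_5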

What if you already know the partition details?

Here’s where it gets interesting. In many applications, you already have context about the data you’re querying. You know the specific user_id and event_type values. You might even know from your application logic that you’re looking for user 101’s events of type 101.

What if instead of paying the catalog lookup tax every time, your application could calculate the target partition directly? Instead of:

SELECT * FROM events
WHERE user_id = 101 AND event_type = 101;

You could query the exact partition:

SELECT * FROM events_0_5
WHERE user_id = 101 AND event_type = 101;

This completely bypasses PostgreSQL’s catalog traversal. No hash function calls, no pg_class lookups, no hierarchy navigation—just a direct query to the partition that contains your data, while still benefiting from having partitioned data for improved performance, smaller indexes, reduced autovacuum pressure, and better maintenance operations.

This is where pg_hash_func comes in

pg_hash_func is a Ruby gem that reverse engineers PostgreSQL’s internal hash partitioning logic. It replicates PostgreSQL’s lookup3 hash function from src/common/hashfn.c and the partition-specific logic from src/backend/access/hash/hashfunc.c.

The gem currently supports PostgreSQL’s integer-based hash partitioning:

  • bigint (int8) - uses calculate_partition_index_bigint with magic constant
  • integer (int4) - uses calculate_partition_index_int4 without magic constant
  • smallint (int2) - also uses calculate_partition_index_int4 (PostgreSQL treats int2 and int4 the same)

The gem doesn’t yet support text/string keys, UUIDs, or other data types, but focuses on integer types.

The gem handles all the complex bit manipulation, signed/unsigned arithmetic conversions, and magic constants that PostgreSQL uses internally. Here’s how you can use it:

require 'pg_hash_func'

# Calculate first-level partition for user_id (bigint)
user_partition = PgHashFunc.calculate_partition_index_bigint(
  value: 101,
  num_partitions: 16
)
# => 0

# Calculate second-level partition for event_type (integer)
event_partition = PgHashFunc.calculate_partition_index_int4(
  value: 101,
  num_partitions: 8
)
# => 5

# Construct the target partition name
target_table = "events_#{user_partition}_#{event_partition}"
# => "events_0_5"

# Query directly - no catalog lookup overhead!
result = db.exec("SELECT * FROM #{target_table} WHERE user_id = 101 AND event_type = 101")

Building this gem taught me a lot about bit manipulation, hash algorithms, and PostgreSQL’s uint64 arithmetic quirks. h/t to Claude 4 (🙃) for helping me unlock some bit manipulation tricks in Ruby. The gem has comprehensive spec coverage that tests against actual PostgreSQL behavior in CI. Overall, a journey in itself, but that’s a blog post for another time.

Calling PostgreSQL functions directly

If you’re happy with a SQL-based solution, or don’t use Ruby but still want to avoid the full catalog traversal, you can call PostgreSQL’s hash functions directly:

-- Calculate partitions using PostgreSQL functions directly
-- For bigint: ((hash + magic_constant) % 2^64) % num_partitions, then ensure non-negative
-- For int4: (hash % 2^64) % num_partitions, then ensure non-negative
SELECT
  -- First level: user_id partition (bigint with magic constant)
  (((((hashint8extended(101::bigint, 8889443574745275645::bigint)::numeric + 5270998738748236643::numeric) % 18446744073709551616::numeric) % 16::numeric) + 16::numeric) % 16::numeric) as user_partition,
  -- Second level: event_type partition (int4, no magic constant)
  ((((hashint4extended(101::integer, 8889443574745275645::bigint)::numeric % 18446744073709551616::numeric) % 8::numeric) + 8::numeric) % 8::numeric) as event_partition;
-- Results: user_partition = 0, event_partition = 5

This gives you the partition indices without the catalog lookup, but you still pay the network round-trip cost.
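
If you lean on this often, the expression can be packaged as an immutable SQL function. The helper below is hypothetical (not part of PostgreSQL or the gem); it simply wraps the bigint calculation above:

-- Hypothetical convenience wrapper around the bigint partition calculation
CREATE FUNCTION partition_index_bigint(val bigint, parts int)
RETURNS int LANGUAGE sql IMMUTABLE AS $$
  SELECT (((((hashint8extended(val, 8889443574745275645::bigint)::numeric
            + 5270998738748236643::numeric)
            % 18446744073709551616::numeric)
            % parts::numeric) + parts::numeric) % parts::numeric)::int;
$$;

SELECT partition_index_bigint(101, 16);  -- => 0, matching the result above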

Speed benefits (up to 20x faster)

The benchmarks show the performance difference:

Ruby Calculation (int4):   121,964.6 i/s
SQL Query (int4):           5,948.9 i/s - 20.50x slower

Ruby calculations are 20-40x faster than their SQL equivalents. The benefit of course comes from eliminating network round-trips entirely while getting the exact same partition indices that PostgreSQL would calculate. Plus, the database doesn’t need to spend extra CPU on querying catalog tables.

Concluding

The beauty of PostgreSQL is that you have options. Depending on your tolerance and performance trade-offs you’re willing to make, you can:

  • Use the parent table for simplicity and let PostgreSQL handle everything automatically
  • Call PostgreSQL’s hash functions directly to skip catalog traversal while staying in SQL
  • Calculate partition indices in your application code to eliminate all database overhead

Each approach has its place IMO. For most applications, the standard parent table approach works perfectly fine. But when you’re dealing with high-throughput, latency-sensitive workloads, having the option to optimize at the application layer while keeping all the benefits of partitioning makes PostgreSQL incredibly flexible.
