Is Postgres read heavy or write heavy?

Original link: https://www.crunchydata.com/blog/is-postgres-read-heavy-or-write-heavy-and-why-should-you-care

## Understanding Postgres read/write workloads for optimal tuning

Effective Postgres tuning depends on knowing whether your database is read-heavy or write-heavy. While business logic often provides clues (social media = read-heavy, IoT logging = write-heavy), mixed-use applications require deeper analysis. Knowing this ratio guides decisions around indexing, caching, storage, and architecture.

Reads are relatively cheap because Postgres works in 8KB blocks served from memory whenever possible. Writes are more involved: WAL logging, potential full-page copies, and index updates all add I/O cost.

The query provided uses Postgres' internal statistics to *estimate* read/write load by analyzing disk-page activity associated with tuple changes. This helps classify a database, using a default ratio of 5:1 reads to writes. Tools like `pg_stat_statements` offer another view, tracking rows read and written per query.

**Optimization strategies differ:** write-heavy systems benefit from faster storage (NVMe SSDs), more RAM, fewer indexes, WAL tuning, and optimized checkpoints. Read-heavy systems prioritize caching, query/index optimization, and scaling out with read replicas. Most databases lean read-heavy, with read-to-write ratios often around 10:1. Proactive monitoring with these methods is key to keeping a Postgres database healthy and fast.

## Postgres: read heavy or write heavy? - Hacker News discussion summary

A recent Hacker News discussion centered on this Crunchy Data article about determining whether a PostgreSQL workload is read-heavy or write-heavy. The article itself received mixed reviews: some found it helpful for understanding PostgreSQL internals and workload measurement, while others criticized its lack of comparative context, arguing that "heavy" is hard to evaluate without benchmarks. Commenters debated typical workload profiles, with views ranging from mostly read-heavy (especially for content-serving applications) to write-heavy (common in OLTP systems and data capture). Many noted that factors such as schema, application usage, and caching matter when analyzing workload characteristics. The discussion also touched on the complexity of analyzing workloads via PostgreSQL system tables, with `pg_stat_statements` suggested as a more practical approach. Some commenters raised concerns about the article's quality and its potential impact on the company's reputation, along with broader complaints about HN's ranking algorithm and content quality.

Original article

When someone asks about Postgres tuning, I always say “it depends”. What “it” is can vary widely but one major factor is the read and write traffic of a Postgres database. Today let’s dig into knowing if your Postgres database is read heavy or write heavy.

Of course, write heavy or read heavy can largely be inferred from your business logic. Social media app - read heavy. IoT logger - write heavy. But many of us have mixed-use applications. Knowing your read and write load can help you make other decisions about tuning and architecture priorities for your Postgres fleet.

Understanding whether a Postgres database is read-heavy or write-heavy is paramount for effective database administration and performance tuning. For example, a read-heavy database might benefit more from extensive indexing, query caching, and read replicas, while a write-heavy database might require optimizations like faster storage, efficient WAL (Write-Ahead Log) management, table design considerations (such as fill factor and autovacuum tuning) and careful consideration of transaction isolation levels.

By reviewing a detailed read/write estimation, you can gain valuable insights into the underlying workload characteristics, enabling informed decisions for optimizing resource allocation and improving overall database performance.

Reads and writes are not really equal

The challenge here in looking at Postgres like this is that reads and writes are not really equal.

  • Postgres reads data in whole 8KB units, called blocks on disk or pages once they’re part of shared memory. The cost of reading is much lower than the cost of writing. Since the most frequently used data generally resides in the shared buffers or the OS cache, many queries never need additional physical I/O and can return results straight from memory (see the sketch after this list).
  • Postgres writes by comparison are a little more complicated. When changing an individual tuple, Postgres needs to write data to WAL defining what happens. If this is the first write after a checkpoint, this could include a copy of the full data page. This also can involve writing additional data for any index changes, toast table changes, or toast table indexes. This is the direct write cost of a single database change, which is done before the commit is accepted. There is also the IO cost for writing out all dirty page buffers, but this is generally done in the background by the background writer. In addition to these write IO costs, the data pages need to be in memory in order to make changes, so every write operation also has potential read overhead as well.
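To get a quick sense of how much of your read traffic is already being served from memory, one minimal check (a sketch using the standard cumulative statistics views, not part of the article's query) is the block counters in pg_stat_database:

-- Share of block reads satisfied from shared buffers for the current database.
-- blks_hit = blocks found in the buffer cache;
-- blks_read = blocks fetched from outside shared buffers (OS cache or disk).
SELECT
  datname,
  blks_hit,
  blks_read,
  ROUND(blks_hit::numeric / NULLIF(blks_hit + blks_read, 0), 4) AS buffer_hit_ratio
FROM pg_stat_database
WHERE datname = current_database();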

That being said - I’ve worked on a query using internal table statistics that loosely estimates read load and write load.

This query leverages Postgres’ internal metadata to provide an estimation of the number of disk pages (or blocks) that have been directly affected by changes to a given number of tuples (rows). This estimation is crucial for understanding the read/write profile of a database, which in turn can inform optimization strategies (see below).

The query's logic is broken down into several Common Table Expressions (CTEs) to enhance readability and modularity:

ratio_target CTE:

This initial CTE is designed to establish a predefined threshold. It allows the user to specify a target ratio of read pages per write page. This ratio serves as the primary criterion for classifying a database or table as either read-heavy or write-heavy.

I’ve set the ratio in the query to 5 reads : 1 write, which means roughly 20% of the database activity would be writes in this case. This is a bit of a fudge-factor number, and the exact definition of what makes a database write-heavy may differ. If you set it to 100, the query would consider 100 reads to be equivalent to 1 write, or 1%; this lets you tweak the classification thresholds to your own definitions.

By defining this threshold explicitly, the query provides a flexible mechanism for evaluating different performance characteristics based on specific application requirements. For instance, a higher ratio_target might indicate a preference for read-intensive operations, while a lower one might suggest a workload dominated by writes.
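Adjusting that threshold only requires changing the constant in the first CTE of the query shown further down; 100 here is just the illustrative value from the paragraph above:

WITH
ratio_target AS (SELECT 100 AS ratio),
-- ... the rest of the query stays the same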

table_list CTE:

This CTE is responsible for the core calculations necessary to determine the read and write page counts. It performs the following key functions:

Total read pages:

It calculates the total number of pages that are typically read for the tables under consideration. This metric is fundamental to assessing the read demand placed on the database.

Estimated changed pages for writes:

To estimate the number of pages affected by write operations, the table_list CTE utilizes the existing relpages (total pages) and reltuples (total tuples) statistics from the pg_class system catalog. By calculating the ratio of relpages to reltuples, the query derives an estimated density of tuples per page. This density is then applied to the observed number of tuple writes to project how many physical pages were likely impacted by these write operations. This approach provides a practical way to infer disk I/O related to writes without needing to track every individual page modification.
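As a standalone illustration of that density estimate (this is just the building block the CTE relies on, not part of the main query), the raw inputs can be pulled straight from pg_class:

-- Planner statistics used for the tuples-per-page density estimate.
-- reltuples can be 0 or -1 for tables that have never been analyzed,
-- so guard the division on relpages.
SELECT
  relname,
  relpages,
  reltuples,
  ROUND(reltuples::numeric / NULLIF(relpages, 0), 1) AS tuples_per_page
FROM pg_class
WHERE relkind = 'r'
ORDER BY relpages DESC
LIMIT 10;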

Final comparison and classification

After the table_list CTE has computed the estimated read pages and write-affected pages, the final stage of the query involves a comparative analysis. The calculated number of read pages is directly compared against the estimated number of write pages. Based on this comparison, and in conjunction with the ratio_target defined earlier, the query then classifies each table (or the database as a whole) into one of several categories. These categories typically include:

  • Read-heavy: This classification is applied when the proportion of read pages significantly outweighs the write pages, based on the defined ratio_target.
  • Write-heavy: Conversely, this classification indicates that write operations are more prevalent, with a higher number of write-affected pages relative to read pages.
  • Other scenarios: The query can also identify other scenarios, such as balanced workloads where read and write operations are roughly equivalent, or cases where the data volume is too low to make a definitive classification.

The read/write Postgres query:

WITH
ratio_target AS (SELECT 5 AS ratio),
table_list AS (
  SELECT
    s.schemaname,
    s.relname AS table_name,
    -- Sum of heap and index blocks read from disk (from pg_statio_user_tables)
    si.heap_blks_read + si.idx_blks_read AS blocks_read,
    -- Sum of all write operations (tuples) (from pg_stat_user_tables)
    s.n_tup_ins + s.n_tup_upd + s.n_tup_del AS write_tuples,
    relpages * (s.n_tup_ins + s.n_tup_upd + s.n_tup_del) / (CASE WHEN reltuples = 0 THEN 1 ELSE reltuples END) AS blocks_write
  FROM
    -- Join the user tables statistics view with the I/O statistics view
    pg_stat_user_tables AS s
    JOIN pg_statio_user_tables AS si ON s.relid = si.relid
    JOIN pg_class c ON c.oid = s.relid
  WHERE
    -- Filter to only show tables that have had some form of read or write activity
    (s.n_tup_ins + s.n_tup_upd + s.n_tup_del) > 0
    AND (si.heap_blks_read + si.idx_blks_read) > 0
)
SELECT *,
  CASE
    -- Handle case with no activity
    WHEN blocks_read = 0 AND blocks_write = 0 THEN
      'No Activity'
    -- Handle write-heavy tables
    WHEN blocks_write * ratio > blocks_read THEN
      CASE
        WHEN blocks_read = 0 THEN 'Write-Only'
        ELSE ROUND(blocks_write::numeric / blocks_read::numeric, 1)::text || ':1 (Write-Heavy)'
      END
    -- Handle read-heavy tables
    WHEN blocks_read > blocks_write * ratio THEN
      CASE
        WHEN blocks_write = 0 THEN 'Read-Only'
        ELSE '1:' || ROUND(blocks_read::numeric / blocks_write::numeric, 1)::text || ' (Read-Heavy)'
      END
    -- Handle balanced tables
    ELSE
      '1:1 (Balanced)'
  END AS activity_ratio
FROM table_list, ratio_target
ORDER BY
  -- Order by the most active tables first (sum of all operations)
  (blocks_read + blocks_write) DESC;

Results will look something like this:

schemaname |  table_name   | blocks_read | write_tuples | blocks_write | ratio |     activity_ratio
-----------+---------------+-------------+--------------+--------------+-------+------------------------
public     | audit_logs    |           2 |      1500000 |        18519 |     5 | 9259.5:1 (Write-Heavy)
public     | orders        |           8 |            4 |           -0 |     5 | Read-Only
public     | articles      |           2 |           10 |            1 |     5 | 0.5:1 (Write-Heavy)
public     | user_profiles |           1 |            3 |           -0 |     5 | Read-Only
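One caveat when interpreting these numbers: the counters behind pg_stat_user_tables and pg_statio_user_tables are cumulative since the last statistics reset, so the classification reflects everything since that point rather than just recent traffic. A quick way to check (and, if you want a fresh measurement window, reset) them:

-- When were the cumulative statistics last reset for this database?
SELECT datname, stats_reset
FROM pg_stat_database
WHERE datname = current_database();

-- Optional: clear the cumulative counters to start a fresh measurement window.
-- SELECT pg_stat_reset();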

pg_stat_statements

Another way to look at read and write traffic is through the pg_stat_statements extension. It aggregates statistics for every unique query run on your database, including the number of rows each query reads or writes.

While the query above captures more of how the workload is distributed across tables, pg_stat_statements is also a good checkpoint for overall traffic volume.
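If pg_stat_statements is not already enabled, a minimal setup (assuming you can change server configuration and restart the server) looks like this:

-- pg_stat_statements must be preloaded; changing shared_preload_libraries requires a restart.
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

-- After the restart, create the extension in the database you want to inspect.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;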

SELECT
  SUM(shared_blks_hit) AS cache_hits,
  SUM(shared_blks_read) AS disk_reads,
  SUM(CASE WHEN query ILIKE 'SELECT%' THEN rows ELSE 0 END) AS rows_read,
  SUM(CASE WHEN query ILIKE 'INSERT%' OR query ILIKE 'UPDATE%' OR query ILIKE 'DELETE%' THEN rows ELSE 0 END) AS rows_written
FROM pg_stat_statements;

 cache_hits | disk_reads | rows_read | rows_written
------------+------------+-----------+--------------
      27586 |        998 |    443628 |           30
(1 row)

For write-heavy systems, the bottleneck is often I/O and transaction throughput. You're constantly writing to the disk, which is slower than reading from memory.

  1. Faster Storage: The most direct way to improve write performance is to use faster storage, such as NVMe SSDs, and provision more I/O operations per second (IOPS).
  2. Asynchronous I/O: Postgres 18 now has asynchronous I/O, which should be more performant than the traditional synchronous approach.
  3. More RAM: While reads benefit from RAM for caching too, writes also benefit from a larger shared_buffers pool, which can hold more dirty pages before they need to be flushed to disk.
  4. I/O burst systems: Many cloud-based systems come with extra burst I/O out of the box, so looking at these numbers may also be helpful.
  5. Minimize Indexes: While essential for reads, every index needs to be updated during a write operation. Over-indexing can significantly slow down writes, so remove unused indexes.
  6. Utilize HOT updates: Postgres has a performance optimization (heap-only tuple updates) for frequently updated rows whose indexed columns do not change, so adjusting fillfactor to take advantage of this can be worth looking into.
  7. Tune the WAL (Write-Ahead Log): The WAL is where every change is written before it's committed to the main database files. Tuning parameters like wal_buffers can reduce the number of disk flushes and improve write performance.
  8. Optimize Checkpoints: Checkpoints sync the data from shared memory to disk. Frequent or large checkpoints can cause I/O spikes. Adjusting checkpoint_timeout and checkpoint_completion_target can smooth out these events (see the sketch after this list).
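A sketch tying a few of these together: the first query flags indexes that have never been scanned (removal candidates per item 5), and the ALTER SYSTEM statements show where the WAL and checkpoint knobs from items 7 and 8 live. The specific values are illustrative assumptions, not recommendations.

-- Indexes with zero scans since the last statistics reset (drop candidates).
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- Illustrative checkpoint/WAL settings; the checkpoint_* parameters take effect
-- on a config reload, while wal_buffers requires a server restart.
ALTER SYSTEM SET checkpoint_timeout = '15min';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET wal_buffers = '64MB';
SELECT pg_reload_conf();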

For read-heavy systems, the primary goal is to get data to the user as quickly as possible, ideally keeping as much data as possible in the buffer cache so queries are not reading from disk.

  1. Effective Caching: Ensure your shared_buffers and effective_cache_size are configured to take advantage of available RAM. This lets Postgres keep frequently accessed data in memory, avoiding costly disk reads.
  2. Optimize Queries and Indexes: Use EXPLAIN ANALYZE to pinpoint slow SELECT queries and add indexes on columns used in WHERE clauses, JOIN conditions, and ORDER BY statements (see the sketch after this list). Remember, indexes speed up lookups at the cost of slower writes.
  3. Scaling out with read replicas: A read replica is a copy of your primary database that's kept in sync asynchronously. All write operations go to the primary, but you can distribute read queries across one or more replicas. This distributes the read load, offloads traffic from your primary server, and can dramatically improve read throughput without impacting your write performance.
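As a minimal sketch of point 2 (the table and column here, orders and customer_id, are hypothetical placeholders):

-- Inspect the plan for a slow lookup on a hypothetical table.
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

-- If the plan shows a sequential scan on a large table, index the filter column.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_customer_id
  ON orders (customer_id);

-- Re-run the EXPLAIN ANALYZE above to confirm an index scan is now used.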

Most Postgres databases are going to be far more read heavy than write heavy. Based on experience, I'd estimate that somewhere around 10:1 reads to writes is the point where a database starts to get write heavy. Of course, there are outliers to this.

The right scaling strategy depends entirely on your workload. By proactively monitoring your Postgres stats using internal statistics in the Postgres catalog, you can make informed decisions that will keep your database healthy and your application fast.

Co-authored with Elizabeth Christensen
