Unconventional PostgreSQL Optimizations

Original link: https://hakibenita.com/postgresql-unconventional-optimizations

Summary

Standard database optimization usually focuses on query rewriting, indexing, and normalization, but creative approaches can bring significant gains. This article explores two such techniques in PostgreSQL. First, it shows how setting constraint_exclusion = 'on' can noticeably improve query performance when analysts make mistakes in their queries (for example, case-sensitivity errors): if a constraint guarantees that no rows can match, PostgreSQL skips the full table scan, saving resources. This is especially useful in BI/reporting environments with ad-hoc queries. Second, it highlights the benefit of using a hash index to enforce uniqueness on large text fields such as URLs. While a standard B-Tree index can become very large, a hash index stores only hash values, resulting in a significantly smaller index (5x smaller in the example). Hash indexes do have limitations: they cannot be referenced by foreign keys, and there are some rough edges with the ON CONFLICT clause that may require using a MERGE statement instead. These techniques, along with virtual generated columns (PostgreSQL 18+), show how understanding PostgreSQL's internals and considering alternative approaches can lead to more efficient and more economical database solutions.


When it comes to database optimization, developers often reach for the same old tools: rewrite the query slightly differently, slap an index on a column, denormalize, analyze, vacuum, cluster, repeat. Conventional techniques are effective, but sometimes being creative can really pay off!

In this article, I present unconventional optimization techniques in PostgreSQL.


Imagine you have this table of users:

db=# CREATE TABLE users (
    id INT PRIMARY KEY,
    username TEXT NOT NULL,
    plan TEXT NOT NULL,
    CONSTRAINT plan_check CHECK (plan IN ('free', 'pro'))
);
CREATE TABLE

For each user you keep their name and which payment plan they're on. There are only two plans, "free" and "pro", so you add a check constraint.
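Before moving on, it's worth seeing the constraint do its job. An INSERT with a plan other than "free" or "pro" is rejected (the id and username here are just placeholders):

db=# INSERT INTO users VALUES (0, 'test', 'Pro');
ERROR:  new row for relation "users" violates check constraint "plan_check"
DETAIL:  Failing row contains (0, test, Pro).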

Generate some data and analyze the table:

db=# INSERT INTO users
SELECT n, uuidv4(), (ARRAY['free', 'pro'])[ceil(random()*2)]
FROM generate_series(1, 100_000) AS t(n);
INSERT 0 100000

db=# ANALYZE users;
ANALYZE

You now have 100K users in the system.

Honest Mistakes

Now you want to let your analysts access this table in their reporting tool of choice. You give one of the analysts permission, and this is the first query they write:

db=# SELECT * FROM users WHERE plan = 'Pro';
 id │ username │ plan
────┼──────────┼──────
(0 rows)

The query returned no results, and the analyst is baffled. How come there are no users on the "Pro" plan?

The name of the plan is "pro" and not "Pro" (with a capital "P") as the analyst wrote it. This is an honest mistake, really - anyone can make such a mistake! But what is the cost of this mistake?

Examine the execution plan of a query for a non-existing value:

db=# EXPLAIN ANALYZE SELECT * FROM users WHERE plan = 'Pro';
                QUERY PLAN
──────────────────────────────────────────────────────────────────
 Seq Scan on users  (cost=0.00..2185.00 rows=1 width=45)
                    (actual time=7.406..7.407 rows=0.00 loops=1)
   Filter: (plan = 'Pro'::text)
   Rows Removed by Filter: 100000
   Buffers: shared hit=935
 Planning:
   Buffers: shared hit=29 read=2
 Planning Time: 4.564 ms
 Execution Time: 7.436 ms

PostgreSQL scanned the entire table! However, there's a check constraint on the field - no row can ever have the value "Pro", the database makes sure of that! So if this condition always evaluates to false, why is PostgreSQL scanning the table?

Using constraint_exclusion

PostgreSQL is smart enough to skip a table scan when the query contains a condition that always evaluates to false, but not by default! To instruct PostgreSQL to look at constraints when generating a plan, you need to set the parameter constraint_exclusion:

db=# SET constraint_exclusion to 'on';
SET

db=# EXPLAIN ANALYZE SELECT * FROM users WHERE plan = 'Pro';
                                      QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────
 Result  (cost=0.00..0.00 rows=0 width=0) (actual time=0.000..0.001 rows=0.00 loops=1)
   One-Time Filter: false
 Planning:
   Buffers: shared hit=5 read=4
 Planning Time: 5.760 ms
 Execution Time: 0.008 ms
(6 rows)

Nice! After turning constraint_exclusion on, PostgreSQL figured out based on the check constraint that the condition won't return any rows, and skipped the scan entirely.

When constraint_exclusion Makes Sense

So who are you, constraint_exclusion, and why are you not on by default? The PostgreSQL documentation explains:

Currently, constraint exclusion is enabled by default only for cases that are often used to implement table partitioning via inheritance trees. Turning it on for all tables imposes extra planning overhead that is quite noticeable on simple queries, and most often will yield no benefit for simple queries.

The parameter constraint_exclusion is set to "partition" by default. In that mode, constraints are examined only for inheritance child tables and UNION ALL subqueries, where they can eliminate entire child tables (old-style partitions) from the plan; declaratively partitioned tables rely on the separate, newer partition pruning mechanism instead.
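In a session where you haven't already run a SET like the one above, you can confirm the default:

db=# SHOW constraint_exclusion;
 constraint_exclusion
───────────────────────
 partition
(1 row)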

The documentation states that for simple queries the cost of evaluating all relevant conditions against all the relevant constraints might outweigh the benefit - you might end up spending more time planning than actually executing the query. It makes sense that queries executed by a system are less likely to query for invalid values or apply conditions that go against constraints. However, this is not the case for ad-hoc queries in reporting tools...

In BI and reporting environments users can issue complicated queries that are often crafted by hand. In this type of environment, it's not unlikely that they'll make mistakes, just like our analyst did before. Setting constraint_exclusion to "on" in reporting and data warehouse environments where users can issue ad-hoc queries can potentially save time and resources by eliminating unnecessary full table scans.
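Rather than relying on every analyst to remember to SET it in their session, you can scope the setting to the reporting role or database; the role and database names below are hypothetical:

db=# ALTER ROLE reporting_analyst SET constraint_exclusion = 'on';
ALTER ROLE

db=# ALTER DATABASE reporting SET constraint_exclusion = 'on';
ALTER DATABASE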


Imagine you have a sales table that looks like this:

db=# CREATE TABLE sale (
    id INT PRIMARY KEY,
    sold_at TIMESTAMPTZ NOT NULL,
    charged INT NOT NULL
);
CREATE TABLE

You keep track of when the sale was made and how much was charged. Create 10 million sales and analyze the table:

db=# INSERT INTO sale (id, sold_at, charged)
SELECT
    n AS id,
    '2025-01-01 UTC'::timestamptz + (interval '5 seconds') * n AS sold_at,
    ceil(random() * 100) AS charged
FROM generate_series(1, 10_000_000) AS t(n);
INSERT 0 10000000

db=# ANALYZE sale;
ANALYZE

Slapping a B-Tree on it

Your analysts often produce daily sales reports and their queries can look roughly like this:

db=# EXPLAIN (ANALYZE ON, BUFFERS OFF, COSTS OFF)
SELECT date_trunc('day', sold_at AT TIME ZONE 'UTC'), SUM(charged)
FROM sale
WHERE '2025-01-01 UTC' <= sold_at AND sold_at < '2025-02-01 UTC'
GROUP BY 1;
                                QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────
 HashAggregate (actual time=626.074..626.310 rows=32.00 loops=1)
   Group Key: date_trunc('day'::text, sold_at)
   Batches: 1  Memory Usage: 2081kB
   ->  Seq Scan on sale (actual time=6.428..578.135 rows=535679.00 loops=1)
         Filter: (('2025-01-01 02:00:00+02'::timestamp with time zone <= sold_at)
                  AND (sold_at < '2025-02-01 02:00:00+02'::timestamp with time zone))
         Rows Removed by Filter: 9464321
 Planning Time: 0.115 ms
 Execution Time: 627.119 ms

PostgreSQL scanned the entire table and the query completed in ~627ms. Your analysts are a bit spoiled and ~600ms is too slow for them, so you do what you always do in these cases and "slap a B-Tree index on it":

db=# CREATE INDEX sale_sold_at_ix ON sale(sold_at);
CREATE INDEX

Execute the query with the index:

db=# EXPLAIN (ANALYZE ON, BUFFERS OFF, COSTS OFF)
SELECT date_trunc('day', sold_at AT TIME ZONE 'UTC'), SUM(charged)
FROM sale
WHERE '2025-01-01 UTC' <= sold_at AND sold_at < '2025-02-01 UTC'
GROUP BY 1;
                                    QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────
HashAggregate (actual time=186.970..187.212 rows=32.00 loops=1)
  Group Key: date_trunc('day'::text, sold_at)
  Batches: 1  Memory Usage: 2081kB
  ->  Index Scan using sale_sold_at_ix on sale (actual time=0.038..137.067 rows=535679.00 loops=1)
        Index Cond: ((sold_at >= '2025-01-01 02:00:00+02'::timestamp with time zone)
                     AND (sold_at < '2025-02-01 02:00:00+02'::timestamp with time zone))
        Index Searches: 1
Planning Time: 0.261 ms
Execution Time: 187.363 ms

Execution time reduced from ~627ms to 187ms and the analysts are happy, but at what cost?

db=# \di+ sale_sold_at_ix
List of indexes
[ RECORD 1 ]─┬────────────────
Schema        │ public
Name          │ sale_sold_at_ix
Type          │ index
Owner         │ haki
Table         │ sale
Persistence   │ permanent
Access method │ btree
Size          │ 214 MB
Description   │ ¤

The index is 214 MB! That's almost half the size of the entire table. So the analysts are happy, but you? Not so much...

Slapping a B-Tree index is very common, but DBAs and developers often ignore the storage cost and the maintenance burden that comes with it. Using simple measures, we can potentially save some space and money.

Rethinking the Problem

Let's step back and re-think what we were trying to optimize. Analysts wanted to produce daily reports, but we provided them with an index that can produce results at a millisecond precision. By indexing the date and the time, we gave the analysts a lot more than what they asked for!

What if instead of indexing the entire datetime, we index just the date, without the time?

db=# CREATE INDEX sale_sold_at_date_ix ON sale((date_trunc('day', sold_at AT TIME ZONE 'UTC'))::date);
CREATE INDEX

This creates a function-based index on the date part of the sale date. We make sure to set the time zone before we truncate the date to match the one used by the analysts in their reports.
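As a quick sanity check of the expression (the sample timestamp is arbitrary), notice how converting to UTC before truncating can shift an early-morning local sale to the previous UTC day:

db=# SELECT date_trunc('day', '2025-01-15 01:30+02'::timestamptz AT TIME ZONE 'UTC')::date;
 date_trunc
────────────
 2025-01-14
(1 row)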

First, check the size of the indexes:

db=# \di+ sale_sold_at_*
                  List of indexes
         Name          │ Table │ Access method │  Size
───────────────────────┼───────┼───────────────┼────────
 sale_sold_at_date_ix  │ sale  │ btree         │ 66 MB
 sale_sold_at_ix       │ sale  │ btree         │ 214 MB

The function-based index is just 66MB, that's more than 3 times smaller than the full index. While a date is smaller than a timestamptz -- 4 bytes vs. 8 bytes -- this is actually not where the majority of the savings come from. The function-based index has fewer distinct values, so PostgreSQL can optimize its size using deduplication.
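To see why deduplication helps so much here, compare how many distinct keys each index has to store. On this generated data set every sold_at value is unique, while the dates collapse to just a few hundred values:

db=# SELECT count(DISTINCT sold_at) AS distinct_timestamps,
            count(DISTINCT (date_trunc('day', sold_at AT TIME ZONE 'UTC'))::date) AS distinct_dates
FROM sale;
 distinct_timestamps │ distinct_dates
─────────────────────┼────────────────
            10000000 │            579
(1 row)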

To check if we can make better use of the smaller index, we start by dropping the full index:

db=# DROP INDEX sale_sold_at_ix;
DROP INDEX

To allow our query to use the function-based index we make some adjustments to the query (we'll tackle that later!):

db=# EXPLAIN (ANALYZE ON, BUFFERS OFF, COSTS OFF)
SELECT date_trunc('day', sold_at AT TIME ZONE 'UTC'), SUM(charged)
FROM sale
WHERE date_trunc('day', sold_at AT TIME ZONE 'UTC')::date BETWEEN '2025-01-01' AND '2025-01-31'
GROUP BY 1;

                                                                  QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 GroupAggregate (actual time=6.499..145.889 rows=31.00 loops=1)
   Group Key: date_trunc('day'::text, (sold_at AT TIME ZONE 'UTC'::text))
   ->  Index Scan using sale_sold_at_date_ix on sale (actual time=0.015..119.832 rows=535679.00 loops=1)
         Index Cond: ((date_trunc('day'::text, (sold_at AT TIME ZONE 'UTC'::text))::date >= '2025-01-01 00:00:00'::timestamp without time zone)
                  AND (date_trunc('day'::text, (sold_at AT TIME ZONE 'UTC'::text))::date < '2025-02-01 00:00:00'::timestamp without time zone))
         Index Searches: 1
 Planning Time: 0.151 ms
 Execution Time: 145.913 ms

The index was used and the query completed in ~146ms - that's roughly 40ms faster than using the full index, and more than 4x faster than the full table scan.

The Discipline Problem

Using a function-based index can be fragile. If we make even the slightest adjustment to the expression, the database won't be able to use the index:

db=# EXPLAIN (ANALYZE OFF, COSTS OFF)
SELECT (sold_at AT TIME ZONE 'UTC')::date, SUM(charged)
FROM sale
WHERE (sold_at AT TIME ZONE 'UTC')::date BETWEEN '2025-01-01' AND '2025-01-31'
GROUP BY 1;
                                QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────
 HashAggregate
   Group Key: ((sold_at AT TIME ZONE 'UTC'::text))::date
   ->  Seq Scan on sale
         Filter: ((((sold_at AT TIME ZONE 'UTC'::text))::date >= '2025-01-01'::date)
         AND (((sold_at AT TIME ZONE 'UTC'::text))::date <= '2025-01-31'::date))

The query is the same as the previous one, but we changed the expression from date_trunc to a plain ::date cast, so the database was unable to use the function-based index.

Using the exact same expression requires a level of discipline that doesn't realistically exist in any organization. It's borderline naive to expect this to work in practice - we need a way to force the use of this exact expression.

The old way of doing this involved a view:

db=# CREATE VIEW v_sale AS
SELECT *, date_trunc('day', sold_at AT TIME ZONE 'UTC')::date AS sold_at_date
FROM sale;
CREATE VIEW

The view adds a calculated column called "sold_at_date" that uses the exact same expression we used to define the function-based index. By querying the view, we guarantee that the database can use the index:

db=# EXPLAIN (ANALYZE OFF, COSTS OFF)
SELECT sold_at_date, SUM(charged)
FROM v_sale
WHERE sold_at_date BETWEEN '2025-01-01' AND '2025-01-31'
GROUP BY 1;
                                          QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 GroupAggregate
   Group Key: (date_trunc('day'::text, (sale.sold_at AT TIME ZONE 'UTC'::text)))::date
   ->  Index Scan using sale_sold_at_date_ix on sale
         Index Cond: (((date_trunc('day'::text, (sold_at AT TIME ZONE 'UTC'::text)))::date >= '2025-01-01'::date)
         AND ((date_trunc('day'::text, (sold_at AT TIME ZONE 'UTC'::text)))::date <= '2025-01-31'::date))

The index is used and the query is fast! Cool, but...

Views are definitely a viable solution here, but they suffer from the same discipline problem - the analysts can still use the table directly (and they will!). We can revoke access from the table or do some magic tricks with search_path to fool them into using the view, but there is an easier way.
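For reference, locking analysts out of the base table and pointing them at the view would look roughly like this (the role name is hypothetical); the easier way follows below:

db=# REVOKE SELECT ON sale FROM reporting_analyst;
REVOKE

db=# GRANT SELECT ON v_sale TO reporting_analyst;
GRANT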

Using Virtual Generated Columns

Starting with version 12, PostgreSQL supports generated columns - columns that are automatically populated from an expression when the row is inserted or updated. That sounds exactly like what we need, but there is a caveat - the result of the expression is materialized on disk, which means additional storage - exactly what we were trying to save in the first place!
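For contrast, a stored generated column with the same expression would look roughly like the sketch below (the column name is just an illustration). Adding it rewrites the 10-million-row table and stores the computed date in every row, which is exactly the overhead we want to avoid, so we drop it right away:

db=# ALTER TABLE sale ADD sold_at_date_stored DATE
GENERATED ALWAYS AS (date_trunc('day', sold_at AT TIME ZONE 'UTC')) STORED;
ALTER TABLE

db=# ALTER TABLE sale DROP COLUMN sold_at_date_stored;
ALTER TABLE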

Lucky for us, starting with version 18, PostgreSQL supports virtual generated columns. A virtual column looks like a regular column, but it's actually an expression that's being evaluated every time it is accessed. Basically, what we tried to achieve before with a view!

First, add a virtual generated column to the table with the same expression we indexed:

db=# ALTER TABLE sale ADD sold_at_date DATE
GENERATED ALWAYS AS (date_trunc('day', sold_at AT TIME ZONE 'UTC'));
ALTER TABLE

Next, execute the query using the virtual generated column:

db=# EXPLAIN (ANALYZE ON, COSTS OFF, BUFFERS OFF)
SELECT sold_at_date, SUM(charged)
FROM sale
WHERE sold_at_date BETWEEN '2025-01-01' AND '2025-01-31'
GROUP BY 1;
                                          QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 GroupAggregate (actual time=7.047..162.965 rows=31.00 loops=1)
   Group Key: (date_trunc('day'::text, (sold_at AT TIME ZONE 'UTC'::text)))::date
   ->  Index Scan using sale_sold_at_date_ix on sale (actual time=0.015..134.795 rows=535679.00 loops=1)
         Index Cond: (((date_trunc('day'::text, (sold_at AT TIME ZONE 'UTC'::text)))::date >= '2025-01-01'::date)
         AND ((date_trunc('day'::text, (sold_at AT TIME ZONE 'UTC'::text)))::date <= '2025-01-31'::date))
         Index Searches: 1
 Planning Time: 0.128 ms
 Execution Time: 162.989 ms

Using the virtual generated column we can make sure the expression used in the query is the exact same expression we indexed. PostgreSQL is then able to use the index, and the query is fast.

There are several advantages to this approach:

  1. Smaller index: fewer distinct values means the database can use deduplication to make the index smaller.

  2. Faster query: the small and specific index requires less resources so the query is faster.

  3. No discipline: using the generated column is straightforward and the index is guaranteed to be usable.

  4. No ambiguity: making sure anyone on the team uses the same exact expression is prone to errors and discrepancies, especially when time zones are involved. Using a virtual generated column eliminates this ambiguity.
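One more property worth keeping in mind: the virtual column is read-only. Its value is computed whenever it is read, and attempting to write it directly fails (a minimal sketch; the exact error wording may vary):

db=# INSERT INTO sale (id, sold_at, charged, sold_at_date)
VALUES (10_000_001, now(), 10, '2025-01-01');
ERROR:  cannot insert a non-DEFAULT value into column "sold_at_date"
DETAIL:  Column "sold_at_date" is a generated column.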

Indexing Virtual Generated Columns

The next logical step would be to create the index directly on the virtual column. Unfortunately, as of writing this article, PostgreSQL 18 does not support indexes on virtual generated columns:

db=# CREATE INDEX sale_sold_at_date_ix ON sale(sold_at_date);
ERROR:  indexes on virtual generated columns are not supported

Hopefully indexes on virtual generated columns will make it to PostgreSQL 19.


Imagine you have a system that extracts information from URLs. You create a table to keep track:

CREATE TABLE urls (
    id INT PRIMARY KEY,
    url TEXT NOT NULL,
    data JSON
);

Create some entries:

db=# INSERT INTO urls (id, url)
SELECT n, 'https://' || uuidv4() || '.com/ ' || uuidv4() || '?p=' || uuidv4()
FROM generate_series(1, 1_000_000) AS t(n);
INSERT 0 1000000

Processing web pages can be resource intensive, time consuming and expensive, so you want to make sure you don't process the same page more than once.

Slap a Unique B-Tree on it

To make sure URLs are not processed more than once you add a unique constraint on the url column:

db=# CREATE UNIQUE INDEX urls_url_unique_ix ON urls(url);
CREATE INDEX

You can now rest assured that you don't process the exact URL more than once:

db=# INSERT INTO urls(id, url) VALUES (1_000_001, 'https://hakibenita.com');
INSERT 0 1

db=# INSERT INTO urls(id, url) VALUES (1_000_002, 'https://hakibenita.com');
ERROR:  duplicate key value violates unique constraint "urls_url_unique_ix"
DETAIL:  Key (url)=(https://hakibenita.com) already exists.

The unique constraint is enforced using a unique B-Tree index, so you also get the nice perk of being able to search for a specific URL very quickly:

db=# EXPLAIN (ANALYZE ON, BUFFERS OFF, COSTS OFF)
SELECT * FROM urls WHERE url = 'https://hakibenita.com';
                                        QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────
 Index Scan using urls_url_unique_ix on urls (actual time=0.018..0.018 rows=1.00 loops=1)
   Index Cond: (url = 'https://hakibenita.com'::text)
   Index Searches: 1
 Planning Time: 0.173 ms
 Execution Time: 0.046 ms

Web pages these days can have pretty big URLs. Some web apps even go as far as storing the entire application state in the URL. This is great for users, but not so great if you need to store these URLs.

Check the size of the table and the B-Tree index used to enforce the unique constraint:

db=# \dt+ urls
List of tables
[ RECORD 1 ]─┬──────────
Schema        │ public
Name          │ urls
Type          │ table
Owner         │ haki
Persistence   │ permanent
Access method │ heap
Size          │ 160 MB
Description   │ ¤

db=# \di+ urls_url_unique_ix
List of indexes
[ RECORD 1 ]─┬───────────────────
Schema        │ public
Name          │ urls_url_unique_ix
Type          │ index
Owner         │ haki
Table         │ urls
Persistence   │ permanent
Access method │ btree
Size          │ 154 MB
Description   │ ¤

The size of the table is 160MB and the size of the index is a staggering 154MB!

Unique Hash Index

A B-Tree index stores the indexed values themselves in the leaf blocks, so when indexing large values, the B-Tree index can get very large.

PostgreSQL offers another type of index called a Hash index. This type of index does not store the actual values. Instead, it stores hash values, which can be much smaller. I wrote about Hash indexes in the past so I won't repeat myself here. I'll just say that indexing large values with very little repetition is where the Hash index truly shines!
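To get an intuition for the size difference: for text columns, the hash opclass boils any value, no matter how long, down to a fixed-size hash code (hashtext is the underlying hash function), so index entries stay tiny. A rough illustration:

db=# SELECT pg_column_size(repeat('x', 10_000)) AS value_bytes,
            pg_column_size(hashtext(repeat('x', 10_000))) AS hash_bytes;
 value_bytes │ hash_bytes
─────────────┼────────────
       10004 │          4
(1 row)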

A Hash index sounds like a reasonable way to enforce a unique constraint, so let's try to create a unique hash index:

db=# CREATE UNIQUE INDEX urls_url_unique_hash ON urls USING HASH(url);
ERROR:  access method "hash" does not support unique indexes

Oh no! PostgreSQL does not support unique hash indexes, but this doesn't mean we can't still enforce uniqueness using a Hash index...

Enforcing Uniqueness Using a Hash Index

PostgreSQL offers a special type of constraint called an exclusion constraint. This lesser-known and not-so-widely-used constraint is often mentioned in combination with a GIN or GiST index as a way to prevent overlapping ranges. However, using an exclusion constraint we can effectively enforce uniqueness using a Hash index:

db=# ALTER TABLE urls ADD CONSTRAINT urls_url_unique_hash EXCLUDE USING HASH (url WITH =);
ALTER TABLE

This adds an exclusion constraint on the table that prevents two rows with the same URL - this guarantees uniqueness. The exclusion constraint is enforced using a Hash index - this means we effectively enforce uniqueness with a Hash index!

First, verify that uniqueness is indeed enforced:

db=# INSERT INTO urls (id, url) VALUES (1_000_002, 'https://hakbenita.com/postgresql-hash-index');
INSERT 0 1

db=# INSERT INTO urls (id, url) VALUES (1_000_003, 'https://hakbenita.com/postgresql-hash-index');
ERROR:  conflicting key value violates exclusion constraint "urls_url_unique_hash"
DETAIL:  Key (url)=(https://hakbenita.com/postgresql-hash-index) conflicts with
existing key (url)=(https://hakbenita.com/postgresql-hash-index).

Attempting to add a row with a URL that already exists failed with an exclusion constraint violation. Good.

Next, can this Hash index be useful for queries that filter for specific urls?

db=# EXPLAIN (ANALYZE ON, BUFFERS OFF, COSTS OFF)
SELECT * FROM urls WHERE url = 'https://hakibenita.com';
                                         QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────
 Index Scan using urls_url_unique_hash on urls (actual time=0.010..0.011 rows=1.00 loops=1)
   Index Cond: (url = 'https://hakibenita.com'::text)
   Index Searches: 1
 Planning Time: 0.178 ms
 Execution Time: 0.022 ms

Yes it can, and in this case it's even faster than using the B-Tree index (0.022ms vs 0.046ms).

Finally, compare the size of the B-Tree and the Hash index:

db=# \di+ urls_url_*
                List of indexes
         Name         │ Access method │  Size
──────────────────────┼───────────────┼────────
 urls_url_unique_hash │ hash          │ 32 MB
 urls_url_unique_ix   │ btree         │ 154 MB

Amazing! The Hash index is x5 smaller than the corresponding B-Tree index. Instead of storing those large URLs in the B-Tree leaf blocks, the Hash index stores only the hash values which results in a significantly smaller index.

Limitation of "Unique" Exclusion Constraints

Using an exclusion constraint to enforce uniqueness with a Hash index can potentially save storage and make queries faster. However, there are a few caveats to consider with this approach:

⚠️ Column cannot be referenced by foreign keys

PostgreSQL requires that a foreign key reference a unique constraint. Since we can't define a unique hash constraint, we can't point a foreign key to it:

db=# CREATE TABLE foo (url TEXT REFERENCES urls(url));
ERROR:  there is no unique constraint matching given keys for referenced table "urls"
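If other tables do need to reference these rows, one option is to point the foreign key at the surrogate primary key instead of the URL itself - a sketch with a hypothetical referencing table:

db=# CREATE TABLE page_links (url_id INT REFERENCES urls(id));
CREATE TABLE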

⚠️ Limitations on INSERT ... ON CONFLICT

The ON CONFLICT clause in an INSERT command is common and very useful for syncing data. Unfortunately, using exclusion constraints with this clause can have some rough edges.

Attempting to use an exclusion constraint with a list of fields in an ON CONFLICT ... DO NOTHING clause can fail:

db=# INSERT INTO urls (id, url) VALUES (1_000_004, 'https://hakibenita.com')
ON CONFLICT (url) DO NOTHING;
ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification

The error message suggests that it should be possible to use an exclusion constraint, and it is - but only by naming the constraint explicitly with the ON CONFLICT ON CONSTRAINT clause:

db=# INSERT INTO urls (id, url) VALUES (1_000_004, 'https://hakibenita.com')
ON CONFLICT ON CONSTRAINT urls_url_unique_hash DO NOTHING;
INSERT 0 0

Trying the same with ON CONFLICT ... DO UPDATE is not possible at all, even when using ON CONFLICT ON CONSTRAINT:

db=# INSERT INTO urls (id, url) VALUES (1_000_004, 'https://hakibenita.com')
ON CONFLICT ON CONSTRAINT urls_url_unique_hash DO UPDATE SET id = EXCLUDED.id;
ERROR:  ON CONFLICT DO UPDATE not supported with exclusion constraints

I'm not a big fan of hard-coding constraint names in SQL, so to overcome both limitations I'd use MERGE instead:

db=# MERGE INTO urls t
USING (VALUES (1000004, 'https://hakibenita.com')) AS s(id, url)
ON t.url = s.url
WHEN MATCHED THEN UPDATE SET id = s.id
WHEN NOT MATCHED THEN INSERT (id, url) VALUES (s.id, s.url);
MERGE 1

Finally, check the execution plan to verify that the statement is capable of using the Hash index:

db=# EXPLAIN
MERGE INTO urls t
USING (VALUES (1000004, 'https://hakibenita.com')) AS s(id, url)
ON t.url = s.url
WHEN MATCHED THEN UPDATE SET id = s.id
WHEN NOT MATCHED THEN INSERT (id, url) VALUES (s.id, s.url);
                                  QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────
Merge on urls t  (cost=0.00..8.04 rows=0 width=0)
  ->  Nested Loop Left Join  (cost=0.00..8.04 rows=1 width=6)
      ->  Result  (cost=0.00..0.01 rows=1 width=0)
      ->  Index Scan using urls_url_unique_hash on urls t  (cost=0.00..8.02 rows=1 width=6)
          Index Cond: (url = 'https://hakibenita.com'::text)

It can, and it did!

Despite these minor limitations and inconveniences, a Hash index is a good candidate for enforcing uniqueness of large values that don't need to be referenced by foreign keys.
