RegreSQL: Regression Testing for PostgreSQL Queries

Original link: https://boringsql.com/posts/regresql-testing-queries/



TL;DR - RegreSQL brings PostgreSQL's regression testing methodology to your application queries, catching both correctness bugs and performance regressions before production.

As puzzling as it might seem, the common problem with production changes is the ever-present "AHA" moment when things start slowing down or crashing straight away. Testing isn't easy as it is, but there's a widespread practice gap when it comes to testing SQL queries. Some might pretend to "fix it" by using ORMs to abstract away the problem. Others treat SQL as "just glue code" that doesn't deserve systematic testing. Most settle for integration tests that verify the application layer works, never actually testing whether their queries will survive the next schema change or index modification.

The PostgreSQL project itself has a robust regression test suite that has been preventing disasters in core development for decades. The database knows how to test SQL systematically - we just don't use those same techniques for our own queries. Enter RegreSQL, a tool originally created by Dimitri Fontaine for The Art of PostgreSQL book (which is excellent for understanding and mastering PostgreSQL as a database system), designed to bring the same regression testing framework to our application queries.

I've been trying to use it for some time, but due to missing features and limitations I gave up several times. Until now. I decided to fork the project and spend the time needed to take it to the next level.

Introduction

The RegreSQL promise starts with the biggest strength and perceived weakness of SQL queries: they are just strings. And unless you use something like sqlc (for Go), PG'OCaml, or Rust's SQLx toolkit to give you compile-time checking, your queries are validated only when they are executed. In the better case that means a usually slow-ish test suite or integration tests; in the worst case, queries are validated only once deployed. ORMs are another possibility - completely abstracting away SQL (but more on that later).

But even with compile-time checking, you are only checking for one class of problems: schema mismatches. What about behavior changes after schema migration or performance regressions? What about understanding whether your optimization actually made things faster or just moved the problem elsewhere?

This is where RegreSQL comes in. Rather than trying to turn SQL into something else, RegreSQL embraces "SQL as strings" reality and applies the same testing methodology PostgreSQL itself uses: regression testing. You write (or generate - continue reading) your SQL queries, provide input data, and RegreSQL verifies that future changes don't break those expectations.

The features don't stop there though - it tracks performance baselines, detects common query plan regressions (like sequential scans), and gives you a framework for systematic experimentation with schema changes and query change management.

Basic regression testing

Enough with theory. Let's jump straight into the action and see what a sample run of RegreSQL looks like:

$ regresql test
Connecting to 'postgres://radim:***@localhost/cdstore_test'… ✓

Running regression tests...

✓ album-by-artist_list-albums-by-artist.1.json (0.00s)
✓ album-by-artist_list-albums-by-artist.2.json (0.00s)

✓ album-tracks_list-tracks-by-albumid.2.json (0.00s)
✓ album-tracks_list-tracks-by-albumid.1.json (0.00s)

✓ artist_top-artists-by-album.1.json (0.00s)

✓ genre-topn_genre-top-n.top-1.json (0.00s)
✓ genre-topn_genre-top-n.top-3.json (0.00s)

✓ genre-tracks_tracks-by-genre.json (0.00s)

Results: 8 passed, 0 failed, 8 skipped (0.00s)

In this example, based on the Chinook database (as used originally in The Art of PostgreSQL book), RegreSQL scans the current directory (or the one provided by -C /path/to/project) for *.sql files and attempts to run all queries against the configured PostgreSQL connection.
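For orientation, a project RegreSQL can test looks roughly like this; the plans, fixtures, and baselines directories are all covered later in this post, and the file names are taken from the sample run above:

  .
  ├── album-by-artist.sql
  ├── artist.sql
  ├── genre-topn.sql
  └── regresql/
      ├── plans/        # per-query parameter bindings and fixture references
      ├── fixtures/     # declarative YAML test data
      └── baselines/    # captured query cost baselines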

The individual files can contain either a single SQL query or several, like the following example:

-- name: top-artists-by-album
-- Get the list of the N artists with the most albums
SELECT
    artist.name,
    count(*) AS albums
FROM
    artist
    LEFT JOIN album USING (artist_id)
GROUP BY
    artist.name
ORDER BY
    albums DESC
LIMIT :n;

The query syntax supports both positional arguments ($1, as known from the libpq library) and (preferred) psql-style variables (:varname). Each identified query (not file) is then executed 0..N times, based on the number of predefined plans, and verified against the expected results - validating that the returned data matches what's expected. Support for handling SQL files in this format is available separately at https://github.com/boringSQL/queries (Go version only for now).
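To make the plan mechanism concrete, here is a sketch of what a plan file for the top-artists query above could look like, following the plan file format shown later in this post (the value of n is an arbitrary choice). Each named plan becomes one test run and one result file, such as artist_top-artists-by-album.1.json:

  "1":
    n: 5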

This gives you what the original RegreSQL tool introduced - change your schema, refactor a query, run regresql test, and see immediately what broke. The test suite now has the ability to catch regressions before they are committed or shipped. The current version builds on top of that, giving you a better console formatter instead of TAP-style output, as well as JUnit, JSON, and GitHub Actions formatters for better integration into your CI/CD pipelines.
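As a sketch only, a CI integration could look like the following GitHub Actions step; note that the flag for choosing an output formatter is my assumption rather than a documented option, so check the project's README for the actual invocation:

  # Hypothetical CI step; the --format flag name is an assumption
  - name: Run SQL regression tests
    run: regresql test --format github -C ./db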

Performance regression testing

Basic regression testing catches correctness issues - wrong results, broken queries, schema mismatches. But there's another class of production issues it misses: performance regressions. No matter how unbelievable it might sound, queries get deployed without appropriate indexes, or they change over time. A simple fix - whether to handwritten SQL or ORM code - can take a query from milliseconds to seconds. You add an index that helps one query but tanks another. You modify conditionals and accidentally force a sequential scan of millions of rows. This is where it hurts.

RegreSQL addresses this by tracking performance baselines alongside correctness. Once baselines are generated

$ regresql baseline
Connecting to 'postgres://appuser:***@localhost/cdstore_test'… ✓
Creating baselines directory: regresql/baselines
Creating directory 'regresql/baselines'

Creating baselines for queries:

  ./
  Created baseline: album-by-artist_list-albums-by-artist.1.json
  Created baseline: album-by-artist_list-albums-by-artist.2.json
  Created baseline: album-tracks_list-tracks-by-albumid.1.json
  Created baseline: album-tracks_list-tracks-by-albumid.2.json
  Created baseline: artist_top-artists-by-album.1.json
  Created baseline: genre-topn_genre-top-n.top-1.json
  Created baseline: genre-topn_genre-top-n.top-3.json
  Created baseline: genre-tracks_tracks-by-genre.json

Baselines have been created successfully!
Baseline files are stored in: regresql/baselines

the test command not only checks for regressions against the captured costs, but also detects common bad patterns in query execution plans. For now it provides warnings for sequential scans - both on their own and within nested loops - and for multiple sort operations. I believe this alone can provide valuable insights and reduce mishaps in production. It's also where further development of RegreSQL will take place.

To demonstrate this, let's review the test output with the baselines.

Connecting to 'postgres://appuser:***@localhost/cdstore_test'… ✓

Running regression tests...

✓ album-by-artist_list-albums-by-artist.1.json (0.00s)
✓ album-by-artist_list-albums-by-artist.2.json (0.00s)
✓ album-by-artist_list-albums-by-artist.1.cost (22.09 <= 22.09 * 110%) (0.00s)
  ⚠️  Sequential scan detected on table 'artist'
    Suggestion: Consider adding an index if this table is large or this query is frequently executed
  ⚠️  Nested loop join with sequential scan detected
    Suggestion: Add index on join column to avoid repeated sequential scans
✓ album-by-artist_list-albums-by-artist.2.cost (22.09 <= 22.09 * 110%) (0.00s)
  ⚠️  Sequential scan detected on table 'artist'
    Suggestion: Consider adding an index if this table is large or this query is frequently executed
  ⚠️  Nested loop join with sequential scan detected
    Suggestion: Add index on join column to avoid repeated sequential scans

✓ album-tracks_list-tracks-by-albumid.1.json (0.00s)
✓ album-tracks_list-tracks-by-albumid.2.json (0.00s)
✓ album-tracks_list-tracks-by-albumid.1.cost (8.23 <= 8.23 * 110%) (0.00s)
✓ album-tracks_list-tracks-by-albumid.2.cost (8.23 <= 8.23 * 110%) (0.00s)

✓ artist_top-artists-by-album.1.json (0.00s)
✓ artist_top-artists-by-album.1.cost (35.70 <= 35.70 * 110%) (0.00s)
  ⚠️  Multiple sequential scans detected on tables: album, artist
    Suggestion: Review query and consider adding indexes on filtered/joined columns

✓ genre-topn_genre-top-n.top-1.json (0.00s)
✓ genre-topn_genre-top-n.top-3.json (0.00s)
✓ genre-topn_genre-top-n.top-1.cost (6610.59 <= 6610.59 * 110%) (0.00s)
  ⚠️  Multiple sequential scans detected on tables: genre, artist
    Suggestion: Review query and consider adding indexes on filtered/joined columns
  ⚠️  Multiple sort operations detected (2 sorts)
    Suggestion: Consider composite indexes for ORDER BY clauses to avoid sorting
  ⚠️  Nested loop join with sequential scan detected
    Suggestion: Add index on join column to avoid repeated sequential scans
✓ genre-topn_genre-top-n.top-3.cost (6610.59 <= 6610.59 * 110%) (0.00s)
  ⚠️  Multiple sequential scans detected on tables: artist, genre
    Suggestion: Review query and consider adding indexes on filtered/joined columns
  ⚠️  Multiple sort operations detected (2 sorts)
    Suggestion: Consider composite indexes for ORDER BY clauses to avoid sorting
  ⚠️  Nested loop join with sequential scan detected
    Suggestion: Add index on join column to avoid repeated sequential scans

✓ genre-tracks_tracks-by-genre.json (0.00s)
✓ genre-tracks_tracks-by-genre.cost (37.99 <= 37.99 * 110%) (0.00s)
  ⚠️  Multiple sequential scans detected on tables: genre, track
    Suggestion: Review query and consider adding indexes on filtered/joined columns

Results: 16 passed (0.00s)

As you can see, beyond comparing against the baselines, RegreSQL is able to detect basic bad patterns that should be addressed before queries can be considered "production ready".

In some cases, sequential scan detection or query cost baseline tracking may be undesirable and lead to false positives. RegreSQL lets you address this with query metadata, as demonstrated below.

-- name: query_name
-- metadata: key1=value1, key2=value2
SELECT ...;

At this point RegreSQL recognizes

  • notest to skip the query testing altogether (not just cost tracking)
  • nobaseline to skip cost tracking
  • noseqscanwarn to keep cost tracking but disable sequential scan warnings
  • and difffloattolerance to adjust the cost failure threshold (default 10% at the moment).

-- name: query_name
-- regresql: notest, nobaseline
-- regresql: noseqscanwarn
-- regresql: difffloattolerance:0.25
-- query that can vary in cost by 25% without being considered a failure
SELECT ...;

ORM enters the room

ORMs abstract away SQL, but they still generate it - and that generated SQL can have performance problems you won't catch until production. Consider this common scenario: you start with a simple SQLAlchemy query that works fine, then months later add eager loading for related data:

from sqlalchemy.orm import joinedload, selectinload

orders = (
    session.query(Order)
    .filter(Order.user_id == user_id)
    .options(
        joinedload(Order.user),
        joinedload(Order.shipping_address),
        selectinload(Order.items)  # NEW: Load order items
    )
    .all()
)

That innocent selectinload(Order.items) generates a separate query - and without an index on order_items.order_id, it performs a sequential scan.
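One possible fix, once the scan is flagged, is to index the column the generated query filters on (the index name below is illustrative):

-- selectinload emits roughly: SELECT ... FROM order_items
-- WHERE order_id IN (...), so index the lookup column
CREATE INDEX idx_order_items_order_id ON order_items (order_id);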

RegreSQL can catch this by intercepting ORM-generated SQL using SQLAlchemy's event system:

from sqlalchemy import event

captured_queries = []

@event.listens_for(engine, "before_cursor_execute")
def capture_sql(conn, cursor, statement, *args):
    captured_queries.append(statement)

Run your ORM code, capture the SQL, save it as a .sql file, and test it with RegreSQL. The performance baseline testing will flag the missing index before it hits production. This is currently experimental, but ORM integration is a key area for RegreSQL's future development.
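As a minimal sketch of that workflow (the helper function and output file name are illustrative, not RegreSQL conventions):

# Exercise the ORM code path so the listener above fires,
# then persist each captured statement in the -- name: format
load_orders_for_user(session, user_id=42)  # hypothetical helper running the query above
with open("orders-generated.sql", "w") as f:
    for i, stmt in enumerate(captured_queries):
        f.write(f"-- name: captured-orders-{i}\n{stmt};\n\n")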

Test Data Management

Up until now we have covered how RegreSQL verifies query correctness and tracks performance regressions. But there's a critical prerequisite we've only skimmed over: every regression test needs consistent, reproducible data. Change the data or its cardinality, and your expected results become meaningless. Your performance baselines drift. Your tests become flaky.

Traditional approaches to creating test data all come with problems:

  • Database dumps become unmanageable - 500MB files you can't review, can't understand, that break with every schema migration, and whose data becomes stale as production evolves. Which version of the dump are your tests even using?
  • SQL scripts might be better than dumps, but still imperative and hard to maintain. You end up with INSERT statements scattered across multiple files, managing foreign keys manually, and debugging constraint violations.
  • Factories in application code might work great for integration tests, but we're testing SQL directly. Do you really want to maintain parallel data generation in your application language just for SQL tests?
  • A shared test database is synonymous with the classic "works on my machine" problem. State leaks between tests. Parallel execution becomes impossible. Debugging is a nightmare.

What we need is something that's declarative (what data, not how to insert it), reproducible (similar data every time), composable (build complex scenarios from simple pieces), and scalable (from 10 rows to 100,000).

This is where the next improvement comes in: RegreSQL's fixture system. Think of it as infrastructure-as-code for your test data. You describe the data you need in YAML files, and RegreSQL handles the rest - dependencies, cleanup, foreign keys, and even realistic data generation at scale.

RegreSQL's fixture system lets you define test data in YAML files stored in regresql/fixtures/. Here's a simple example

  fixture: basic_users
  description: a handful of test users
  cleanup: rollback

  data:
    - table: users
      rows:
        - id: 1
          email: alice@example.com
          name: Alice Anderson
          created_at: 2024-01-15
        - id: 2
          email: bob@example.com
          name: Bob Builder
          created_at: 2024-02-20

To use this fixture in your tests, simply reference it in the query's plan file (regresql/plans/get-user.yaml):

  fixtures:
    - basic_users

  "1":
    email: alice@example.com

  "2":
    email: bob@example.com

And when you run regresql test, the fixture is automatically loaded before the query executes, and cleaned up afterward. No manual setup scripts, no state leakage between tests. But it doesn't stop at static fixtures. When you want to test queries against realistic volumes, you can use a range of data generators, including

  • sequences, random integer, decimal, string, uuid, email and name generators
  • date_between for generating random timestamps within a range
  • foreign key references to be able to reuse data from other table's fixtures
  • range to select value from predefined sources
  • Go template support

  fixture: realistic_orders
  generate:
    - table: customers
      count: 1000
      columns:
        id:
          generator: sequence
          start: 1
        email:
          generator: email
          domain: shop.example.com
        name:
          generator: name
          type: full
        created_at:
          generator: date_between
          start: "2023-01-01"
          end: "2024-12-31"

    - table: orders
      count: 5000
      columns:
        id:
          generator: sequence
          start: 1
        customer_id:
          generator: int
          min: 1
          max: 1000
        amount:
          generator: decimal
          min: 10.00
          max: 999.99
          precision: 2
        order_date:
          generator: date_between
          start: "2023-01-01"
          end: "2024-12-31"

This generates 1,000 customers and 5,000 orders with realistic-looking data - names, emails, dates, and amounts that feel production-like.

The fixtures are also stackable and can be built on top of each other. For example, if you need to make sure the users fixtures are created before the orders fixtures, just declare the dependency (a planned improvement is support for automatic foreign-key detection to avoid hard-coding IDs). RegreSQL loads fixtures in dependency order and handles cleanup in reverse.

  fixture: orders_with_shipping
  depends_on:
    - basic_users

  data:
    - table: orders
      rows:
        - id: 101
          user_id: 1  # References Alice from basic_users
          total: 99.99
          status: shipped

Should the available fixture options (manual data or data generators) not be enough, you can always fall back to good old SQL-based data generation.

  fixture: mixed_setup
  description: Combine SQL with YAML and generated data
  cleanup: rollback

  # SQL executes first (either as file or inline)
  sql:
    - file: sql/setup_schema.sql
    - inline: "INSERT INTO config (key, value) VALUES ('version', '1.0');"

  # followed by YAML data
  data:
    - table: users
      rows:
        - id: 1
          email: admin@example.com

  # and finally generated data
  generate:
    - table: orders
      count: 100
      columns:
        id:
          generator: sequence
          start: 1
        user_id:
          generator: int
          min: 1
          max: 1

RegreSQL provides commands to inspect and validate your fixtures

  # List all available fixtures
  regresql fixtures list

  # Show fixture details and dependencies
  regresql fixtures show realistic_orders

  # Validate fixture definitions
  regresql fixtures validate

  # Show dependency graph
  regresql fixtures deps

  # Apply fixture manually (for debugging)
  regresql fixtures apply basic_users

The fixture system has been designed to transform test data from a maintenance burden into a documented, version-controlled process. Your YAML files become the single source of truth for the data your tests need, making it easy to understand test scenarios and maintain test data as the application evolves.

RegreSQL future

Introducing a new open source project is an ambitious goal, and RegreSQL is just starting out, despite the fork having been in the works for almost two years. In the coming weeks and months I plan further improvements, as well as better documentation and more tutorials. The project is maintained as part of my boringSQL brand, where it's a vital component for building SQL Labs, which (as I sincerely hope) will provide a foundation for its further development.

At the same time, RegreSQL is an attempt to give back to the welcoming PostgreSQL community, to make the developer experience slightly better where possible, and (just maybe) to provide one more argument against the claim that SQL queries are not testable.

RegreSQL is available on GitHub - feel free to open an issue, drop me an email about the project, or connect on LinkedIn.
