Go ahead, self-host Postgres

Original link: https://pierce.dev/notes/go-ahead-self-host-postgres#user-content-fn-1

## Rethinking database self-hosting

For years, cloud providers have promoted the idea that self-hosting a database is too complex and unreliable. That narrative glosses over the fact that many managed database services are built on open-source tools like Postgres and sold at a substantial markup. The author argues that self-hosting is genuinely manageable, having run a Postgres database serving thousands of users daily with very few problems: only 30 minutes of stress in two years. Historically, self-hosting was the norm until the rise of database-as-a-service (DBaaS) around 2015, which marketed itself as eliminating "undifferentiated heavy lifting." Today, DBaaS pricing keeps climbing and often exceeds the cost of a dedicated server with better specs. While DBaaS offers operational conveniences such as backups and monitoring, the core database engine is the same, and performance can actually *improve* with self-hosting thanks to customizable configuration. Self-hosting does carry some operational overhead, roughly 30-60 minutes per month for monitoring, updates, and planning, which is comparable to the time spent working around problems in a managed service. For most teams it is a viable option, with exceptions for startups that prioritize speed above all, very large companies that need dedicated database engineering teams, and organizations with strict regulatory requirements. With careful configuration, especially around memory and storage, self-hosting can be a cost-effective and performant alternative.

## Hacker News discussion: self-hosting Postgres

A recent Hacker News thread debated the pros and cons of self-hosting PostgreSQL, prompted by an article encouraging people to manage their own Postgres instances. Commenters generally agreed that self-hosting is a solid technical decision that offers stability after the initial setup, particularly when using distribution-provided packages such as those in Debian's LTS releases. Some warned startups away from it, however, arguing that the time spent on configuration and maintenance outweighs the potential cost savings. A key point was the responsibility that self-hosting brings: downtime demands immediate attention, and some companies prefer to avoid that by using managed cloud services. Others noted that part of the appeal of cloud providers is being able to shift blame during outages. The discussion also touched on the trend of reaching for Postgres in all kinds of applications, even where better-suited alternatives exist.

Original article

Self-hosting a database sounds terrifying. That narrative has certainly been pushed over the last 10 years by the big cloud providers:

  • Hosting your own database is dangerous
  • How are you going to get all the 9s of reliability doing it yourself?
  • You'll have access to dedicated database engineers that you couldn't (or wouldn't want to) hire yourself

The rumors obscure the truth.

  • Most cloud hosts are just running a slightly modified version of the open source Postgres server anyway1
  • Database engineering is not a silver bullet if your queries are sub-optimal. Abstracting the engine too far away from your code makes it harder to benchmark what's going on and to work around the otherwise reasonable constraints of how the engine actually executes your queries.

I've had data corruption when using a 3rd party vendor just the same as I've had when self-hosting. And with a serious markup, what's the point?

I've been running my own self-hosted postgres for the better part of two years now, serving thousands of users and tens of millions of queries daily2. I expected it would give me much more trouble than it has. It's caused me exactly 30mins of stress during a manual migration and that's all. Aside from that it's been fast, stable, and much cheaper.

I sleep just fine at night thank you.

Blue skies and the white cloud

Let me rewind for a second. The "database as a service" narrative wasn't always the dominant one. From the 80s to the early 2000s, everyone ran their own databases because there wasn't really an alternative. You had your application server and your database server, often on the same physical machine. It was pretty dang fast3 because it was communicating over localhost before forwarding the final payload over the network.

Amazon launched RDS in 2009. The pitch was compelling: we'll handle backups, patching, high availability, and monitoring. You just connect and go. The early pricing was reasonable too - a small RDS instance cost about the same as a dedicated server, but with less operational overhead. If you had to scale your database specs independent from your web service, it made some sense.

The real shift happened around 2015 when cloud adoption accelerated. Companies started to view any infrastructure management as "undifferentiated heavy lifting"4. Running your own database became associated with legacy thinking. The new orthodoxy emerged: focus on your application logic, let AWS handle the infrastructure.

Fast forward to 2025 and I hope the pendulum might be swinging back. RDS pricing has grown considerably more aggressive. A db.r6g.xlarge instance (4 vCPUs, 32GB RAM) now costs $328/month before you add storage, backups, or multi-AZ deployment. For that price, you could rent a dedicated server with 32 cores and 256GB of RAM.

Unpacking the cloud

For the most part managed database services aren't running some magical proprietary technology. They're just running the same open-source Postgres you can download with some operational tooling wrapped around it.

Take AWS RDS. Under the hood, it's:

  • Standard Postgres compiled with some AWS-specific monitoring hooks
  • A custom backup system using EBS snapshots
  • Automated configuration management via Chef/Puppet/Ansible
  • Load balancers and connection pooling (PgBouncer)
  • Monitoring integration with CloudWatch
  • Automated failover scripting

None of this is technically complex. The value proposition is operational: they handle the monitoring, alerting, backup verification, and incident response. It's also a production ready configuration at minute zero of your first deployment. But the actual database engine? It's the same Postgres running the same SQL queries with the same performance characteristics.

I proved this to myself when I migrated off RDS. I took a pg_dump of my RDS instance, restored it to a self-hosted server with identical specs, and ran my application's test suite. Performance was identical. In some cases, it was actually better because I could tune parameters that RDS locks down.

My self-hosting journey

I spent a weekend migrating to a dedicated server from DigitalOcean5: 16 vCPU / 32GB Memory / 400GB disk. The migration took about 4 hours of actual work:

  1. Hour 1: Provision server, install Postgres, configure basic settings (kamal makes this pretty easy)
  2. Hour 2: Set up monitoring (Prometheus + Grafana), configure backups
  3. Hour 3: Migrate data using pg_dump and pg_restore
  4. Hour 4: Update application connection strings, verify everything works
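
For reference, step 3 was essentially a dump-and-restore pair along these lines (the hostnames, user, and database name are placeholders rather than my actual values, and your flags may differ):

# Dump from the old host in custom format so pg_restore can parallelize
pg_dump -Fc -h old-rds-endpoint -U app_user -d app_db -f app_db.dump
# Restore into the new self-hosted server
pg_restore -h new-server -U app_user -d app_db --no-owner --jobs=4 app_db.dump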

The performance improvement was immediate. Query latency dropped by about 20% across the board. Why? Because I could tune the configuration for my specific workload instead of using RDS's conservative defaults.

The real operational complexity

For my companies with high-availability requirements, this stack takes me about half an hour per month. For the stacks that get less traffic I'm fully hands-off: set it and forget it. This is roughly my cadence for my primary deploys:

Weekly tasks (10 minutes):

  • Check backup verification (automated, just reviewing alerts)
  • Review slow query logs (a query sketch follows these lists)
  • Check disk space trends

Monthly tasks (30 minutes):

  • Apply Postgres security updates
  • Review and rotate backup retention
  • Capacity planning based on growth trends

Quarterly tasks (optional) (2 hours):

  • Update monitoring dashboards
  • Review and optimize configuration parameters
  • Test disaster recovery procedures
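
The weekly slow-query review is mostly a single query against pg_stat_statements (preloaded in the connection settings further down); a minimal sketch, assuming Postgres 13+ column names and that the extension has been created in the database:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT calls,
       round(mean_exec_time::numeric, 1)  AS mean_ms,
       round(total_exec_time::numeric, 1) AS total_ms,
       left(query, 80)                    AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;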

As far as I'm concerned this is roughly comparable to the time you spend debugging RDS connection limits, working around parameter groups you can't modify, or dealing with surprise maintenance windows.

The main operational difference is that you're responsible for incident response. If your database goes down at 3 AM, you need to fix it. But here's the thing: RDS goes down too6. And when it does, you're still the one getting paged, you just have fewer tools to fix the problem.

For the most part I've found that unless I'm actively messing with the database, it's really stable. After all, you're just renting some remote machine in a data center somewhere. All updates are up to you, so you can schedule the riskiest windows yourself.

When self-hosting makes sense

I'd argue self-hosting is the right choice for basically everyone, with the few exceptions at both ends of the extreme:

  1. If you're just starting out in software & want to get something working quickly with vibe coding, it's easier to treat Postgres as just another remote API that you can call from your single deployed app
  2. If you're a really big company reaching the scale where you need trained database engineers dedicated to your stack, you might get economies of scale by outsourcing that work to a cloud company with guaranteed talent in that area. The second full-freight salaries come into play, outsourcing starts to look a bit cheaper.
  3. Regulated workloads (PCI-DSS, FedRAMP, HIPAA, etc.) sometimes require a managed platform with signed BAAs or explicit compliance attestations.

The configuration deep dive

The things you really have to bear in mind when self-hosting:

Memory Configuration: This is where most people mess up. Pulling the standard postgres docker image won't cut it. You have to configure memory bounds with static limits that correspond to your hardware. I've automated some of these configurations, but whether you do it manually or use some auto-config, tweaking these parameters is a must. Example values for a 32GB server follow the parameter list below.

The key parameters:

  • shared_buffers: Start around 25% of RAM; modern Postgres happily uses tens of GB.
  • effective_cache_size: Set to 75% of system RAM (this tells Postgres how much memory the OS will use for caching)
  • work_mem: Be conservative here. Set it to total RAM / max_connections / 2, or use a fixed value like 32MB
  • maintenance_work_mem: Can be generous (1-2GB), only used during VACUUM and index operations
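
Putting those guidelines together, here's a sketch of where I'd start on a 32GB machine (assuming the max_connections = 200 used in the connection settings below; tune from there for your workload):

# Example memory settings for a 32GB server
shared_buffers = 8GB                   # ~25% of RAM
effective_cache_size = 24GB            # ~75% of RAM; a planner hint, not an allocation
work_mem = 32MB                        # 32GB / 200 connections / 2 ≈ 80MB; 32MB leaves headroom per sort/hash node
maintenance_work_mem = 1GB             # used by VACUUM, CREATE INDEX, etc.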

Connection Management: RDS enforces its own max_connections, but when self-hosting you get to choose your own:

# Connection settings
max_connections = 200
shared_preload_libraries = 'pg_stat_statements'
log_connections = on
log_disconnections = on

Wahoo! More connections = more parallelism right?

No such free lunch, I'm afraid. Making fresh connections in Postgres has pretty expensive overhead, so you almost always want to put a connection pooler in front of it. I'm using PgBouncer on all my projects by default, even when load might not call for it. Python asyncio applications just work better with a centralized connection pooler.
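
A minimal PgBouncer setup is a single ini file; the database name, paths, and pool sizes below are a placeholder sketch rather than my exact config:

[databases]
app_db = host=127.0.0.1 port=5432 dbname=app_db

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction                ; best throughput, but no session-level state
max_client_conn = 1000                 ; clients PgBouncer will accept
default_pool_size = 20                 ; actual Postgres connections per database/user pair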

And yes, I've automated some of the config there too.

Storage Tuning: NVMe SSDs change the performance equation completely:

# Storage optimization for NVMe
random_page_cost = 1.1                 # Down from default 4.0
seq_page_cost = 1.0                    # Keep at default
effective_io_concurrency = 200         # Up from default 1

These settings tell Postgres that random reads are almost as fast as sequential reads on NVMe drives, which dramatically improves query planning.

WAL Configuration: Write-Ahead Logging is critical for durability and performance:

# WAL settings
wal_level = replica                     # Enable streaming replication
max_wal_size = 2GB                     # Allow larger checkpoints
min_wal_size = 1GB                     # Prevent excessive recycling
checkpoint_completion_target = 0.9      # Spread checkpoint I/O over 90% of interval
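
If you eventually want a streaming replica, wal_level = replica above is the prerequisite; the rest is roughly a couple of primary-side settings plus a pg_basebackup on the standby. A sketch, assuming Postgres 13+ and placeholder hostnames and roles:

# Primary: allow replication connections (also needs a REPLICATION role and a pg_hba.conf entry)
max_wal_senders = 10
wal_keep_size = 1GB                    # retain extra WAL for replicas that fall behind

# Standby: clone the primary; -R writes standby.signal and primary_conninfo for you
# pg_basebackup -h primary-host -U replicator -D /var/lib/postgresql/data -R -X stream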

Conclusion

I'm not advocating that everyone should self-host everything. But the pendulum has swung too far toward managed services. There's a large sweet spot where self-hosting makes perfect sense, and more teams should seriously consider it.

Start small. If you're paying more than $200/month for RDS, spin up a test server and migrate a non-critical database. You might be surprised by how straightforward it is.

The future of infrastructure is almost certainly more hybrid than it's been recently: managed services where they add genuine value, self-hosted where they're just expensive abstractions. Postgres often falls into the latter category.
