(评论)
(comments)
原始链接: https://news.ycombinator.com/item?id=43270712
Hacker News 的讨论围绕着 Postgres 通过 Mooncake 扩展获得的新的分析能力展开。一位用户 bhouston 分享了使用标准 Postgres 进行大规模分析的困境,提到在一个 20 亿条记录的表上查询速度很慢,因此选择了 BigQuery,因为它速度快、灵活且经济高效。
其他人反驳道,带有 Timescale 或 Citus 等列式存储扩展的 Postgres 仍然有效,尤其是在较小的数据集上,而像 ClickHouse 这样的专用数据库在 BigQuery 的成本随着数据和查询量增长时可能更胜一筹。
pg_mooncake 的创建者强调了其开源特性,利用 DuckDB 进行矢量化查询执行,将数据以 Parquet 格式存储在 S3 上,并利用 Postgres 进行元数据管理。
讨论还涉及到平衡 Postgres 兼容性和性能的挑战,以及 ClickHouse 和 Snowflake 等替代方案,以及开源扩展背后的商业模式。用户质疑 Mooncake 的性能提升是否足以弥补其潜在的复杂性以及对 OLTP 工作负载的潜在影响。
We tried using Postgres with large analytics at my previous company https://threekit.com but it is an absolute pain. Basically we started to collected detailed analytics and thus had a rapidly growing table of around 2B records of user events during their sessions. As it grew past a 500 million records it turned out to be impossible to query this table in any thing close to real-time - it was basically untouchable because it was so slow.
I know I could have used some type of daily aggregation combined with a weekly aggregation, etc to roll up the data incrementally. A dev tried this and yeah, it hide the slow queries but then it became inflexible in terms of reporting. And writing and maintaining these cronjobs is a lot of work.
But then I tried using BigQuery on my https://web3dsurvey.com side project and I just recorded raw data and then wrote queries to do real-time aggregation (with a caching layer) in a large variety of ways. And it was near instant and easy and also very cheap.
So then I deployed this strategy over at Threekit.com and it also was easy and fast and cheap. Even more flexible queries than Web3DSurvey at basically no developer cost or infrastructure costs. No more need for aggregating cron-jobs or trying to decide ahead of time how users want to slice and dice the data. Real time, flexible queries on the fly via BigQuery is the way.
Also BigQuery bill for https://web3dsurvey.com is like $0.25 month and it is dealing with millions of records in its 3 month window of stored data. Where as just running the cheapest Postgres SQL server on Google Cloud is like >$25/month and it is a slow one.
I would never go back to traditional SQL for analytics - it was hard, slow, expensive and inflexible. Worst of all worlds.
reply