(comments)

Original link: https://news.ycombinator.com/item?id=39313623

Here are some of the key points raised in the discussion:

1. For developers and startups, cloud computing is often expensive, confusing, and time-consuming. Setting up Rackspace load balancers, writing scripts to read files, creating Zabbix monitors, and building custom Ansible control modules can demand significant effort and cost. However, some argue that spending the time to maintain these systems yourself saves money by avoiding ongoing vendor fees.

2. Although AWS and GCP each have their drawbacks, companies tend to use both. AWS offers an excellent developer experience but some of its services can be expensive, while GCP is cheaper but has a poor developer experience. Some choose a hybrid model that draws on both providers.

3. Manual configuration steps that often take years to perfect, along with thousands of potential errors, can keep developers and startups from necessary tasks such as setting up distributed tracing to debug logs. Likewise, building a PostgreSQL cluster with automatic failover and backups involves considerable complexity and requires a team dedicated to database operations.

4. Using a vendor-managed database service removes the need for a dedicated database team to handle configuration and backups, potentially lowering an organization's operating expenses. Compared with the traditional approach, managed services often handle database administration and configuration at a lower total cost.

5. In the long run, spending time designing and maintaining your own architecture rather than focusing mainly on core features can save a great deal of money, although this may not appeal to startups focused purely on revenue.

6. Activities that are critical to the business, especially those involving databases, carry downtime risk that can lead to lost revenue and missed milestones. These factors must be weighed when calculating the cost of choosing a particular provider, since even brief downtime can cause significant financial losses.

Overall, choosing the best architecture can be challenging for a company, especially given the many factors involved - scalability, durability, consistency, latency, cost, and regulatory compliance - since every choice carries significant consequences. Ultimately, the decision between an elastic compute farm and a sharded relational database comes down to the specific requirements of the company in question, and sound technical judgment grounded in a deep understanding of its unique needs yields the best results.

Related articles

Original
Almost every infrastructure decision I endorse or regret (cep.dev)
1111 points by slyall 2 days ago | 596 comments










> The markup cost of using RDS (or any managed database) is worth it.

Every so often I price out RDS to replace our colocated SQL Server cluster and it's so unrealistically expensive that I just have to laugh. It's absurdly far beyond what I'd be willing to pay. The markup is enough to pay for the colocation rack, the AWS Direct Connects, the servers, the SAN, the SQL Server licenses, the maintenance contracts, and a full-time in-house DBA.

https://calculator.aws/#/estimate?id=48b0bab00fe90c5e6de68d0...

Total 12 months cost: 547,441.85 USD

Once you get past the point where the markup can pay for one or more full-time employees, I think you should consider doing that instead of blindly paying more and more to scale RDS up. You're REALLY paying for it with RDS. At least re-evaluate the choices you made as a fledgling startup once you reach the scale where you're paying AWS "full time engineer" amounts of money.
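
As a rough sketch of that break-even reasoning (every self-hosted number below is a hypothetical placeholder, not a vendor quote; only the 12-month RDS figure comes from the estimate above):

    # Back-of-the-envelope break-even sketch; all self-hosted numbers are
    # hypothetical placeholders, not quotes from any vendor.
    rds_annual = 547_441.85        # the 12-month RDS estimate linked above

    colo_rack = 24_000             # colocation rack, power, bandwidth (assumed)
    direct_connect = 12_000        # AWS Direct Connects (assumed)
    hardware = 100_000 / 5         # servers + SAN amortized over ~5 years (assumed)
    licenses_support = 60_000      # SQL Server licenses + maintenance (assumed)
    dba = 150_000                  # one full-time in-house DBA, fully loaded (assumed)

    self_hosted_annual = colo_rack + direct_connect + hardware + licenses_support + dba
    print(f"RDS:         ${rds_annual:,.0f}/yr")
    print(f"Self-hosted: ${self_hosted_annual:,.0f}/yr")
    print(f"Markup:      ${rds_annual - self_hosted_annual:,.0f}/yr")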



Some orgs are looking at moving back to on prem because they're figuring this out. For a while it was in vogue to go from capex to opex costs, and C-suite people were incentivized to do that via comp structures, hence "digital transformation", i.e. migration to public cloud infrastructure. Now, those same orgs are realizing that renting computers actually costs more than owning them, when you're utilizing them to a significant degree.

Just like any other asset.



Funny story time.

I was once part of an acquisition from a much larger corporate entity. The new parent company was in the middle of a huge cloud migration, and as part of our integration into their org, we were required to migrate our services to the cloud.

Our calculations said it would cost 3x as much to run our infra on the cloud.

We pushed back, and were greenlit on creating a hybrid architecture that allowed us to launch machines both on-prem and in the cloud (via a direct link to the cloud datacenter). This gave us the benefit of autoscaling our volatile services, while maintaining our predictable services on the cheap.

After I left, apparently my former team was strong-armed into migrating everything to the cloud.

A few years go by, and guess who reaches out on LinkedIn?

The parent org was curious how we built the hybrid infra, and wanted us to come back to do it again.

I didn't go back.



Yes, I do believe autoscaling is actually a good use case for public cloud. If you have bursty load that requires a lot of resources at peak but would sit idle most of the time, it probably doesn't make sense to own what you need for those peaks.


My funny story is built on the idea that AWS is Hotel California for your data.

A customer had an interest in merging the data from an older account into a new one, just to simplify matters. Enterprise data. Going back years. Not even leaving the region.

The AWS rep in the meeting kinda pauses, says: "We'll get back to you on the cost to do that."

The sticker shock was enough that the customer simply inherited the old account, rather than making things tidy.



Is R2 a sensible option for hosting data? I understand egress is cheap.


R2 is great. Our GCS bill (almost all egress) jumped from a few hundred dollars a month to a couple thousand dollars a month last year due to a usage spike. We rush-migrated to R2 and now that part of the bill is $0.

I've heard some people here on HN say that it's slow, but I haven't noticed a difference. We're mainly dealing with multi-megabyte image files, so YMMV if you have a different workload.
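
For a sense of scale, a minimal sketch of the egress arithmetic (the GCS per-GB rate and the monthly volume are assumptions; check current pricing for your region and tier):

    # Rough egress comparison; the rate and volume are assumptions for illustration.
    egress_gb_per_month = 20_000       # hypothetical spike volume
    gcs_egress_per_gb = 0.12           # assumed GCS internet egress list price
    r2_egress_per_gb = 0.0             # R2 does not charge for egress

    print(f"GCS egress: ${egress_gb_per_month * gcs_egress_per_gb:,.0f}/month")
    print(f"R2 egress:  ${egress_gb_per_month * r2_egress_per_gb:,.0f}/month")
    # R2 still bills storage and per-operation requests, so only the egress
    # line item disappears, not the whole bill.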



Awesome. I remember reading about this a while ago, but never tried it. Since it has the same API, I can imagine it's not as daunting as multi-cloud infrastructure usually is.

I guess permissions might be more complex, as in EC2 instance profiles wouldn't grant access, etc.



Eh? I've never had a problem moving data out of AWS.

Have people lost the ability to write export and backup scripts?



My (peripheral) experience is that it is much cheaper to get data in than to get data out. When you have the amount of data being discussed — "Enterprise data. Going back years." — that can get very costly.

This becomes an issue at the amount of data where it makes more sense to put hard drives on a truck and drive them across the country rather than send it over a network (actually, probably a bit before then).



AWS actually has a service for this - Snowmobile, a storage datacenter inside of a shipping container, which is driven to you on a semi truck. https://aws.amazon.com/snowmobile/


They do not!

> Q: Can I export data from AWS with Snowmobile?
>
> Snowmobile does not support data export. It is designed to let you quickly, easily, and more securely migrate exabytes of data to AWS. When you need to export data from AWS, you can use AWS Snowball Edge to quickly export up to 100TB per appliance and run multiple export jobs in parallel as necessary. Visit the Snowball Edge FAQs to learn more.

https://aws.amazon.com/snowmobile/faqs/?nc2=h_mo-lang

Why would they make it convenient to leave?



Oh, TIL! Thanks for correcting me.


That's only for data into AWS though, not data out


Just in network costs, there's a huge asymmetry. Uploading data to AWS is free. Downloading data from them, you have to pay.

When you have enough data, that cost is quite significant.



The ingress/egress cost is ridiculously high. Some companies don't care, but it is there and I've seen it catch people off guard multiple times.


Oh come on, from the description both accounts could be sitting on the same datacenter LAN.


There's a cost for data egress (but not ingress)


It’s the cost of data egress, which isn’t free.


But there is no paid egress when we are moving data between accounts within one region, right?


There is. You pay a price for any cross-VPC traffic.


This isn't true, at least not anymore.

You can peer two VPCs and as long as you are transferring within the same (real) AZ, it's free: https://aws.amazon.com/about-aws/whats-new/2021/05/amazon-vp...

Even peered VPCs only pay "normal" prices: https://aws.amazon.com/ec2/pricing/on-demand/#Data_Transfer

"Data transferred "in" to and "out" from Amazon EC2, Amazon RDS, Amazon Redshift, Amazon DynamoDB Accelerator (DAX), and Amazon ElastiCache instances, Elastic Network Interfaces or VPC Peering connections across Availability Zones in the same AWS Region is charged at $0.01/GB in each direction."



There are two possible scenarios here. Firstly, they can't find the talent to support what you implemented...or more likely, your docs suck!

I've made a career out of inheriting other people's whacky setups and supporting them (as well as fixing them), and almost always it's documentation that has prevented the client getting anywhere.

I personally don't care if the docs are crap, because usually the first thing I do is update / actually write the docs to make them usable.

For a lot of techs, though, crap documentation is a deal breaker.

Crap docs aren't always the fault of the guys implementing, though; sometimes there are time constraints that prevent proper docs being written. Quite frequently, though, it's outsourced development agencies that refuse to write them because it's "out of scope" and a "billable extra". Which I think is an egregious stance... docs should be part and parcel of the project. Mandatory.



I agree that bad documentation is a serious problem in many cases. So much so that your suggestion to write the documentation after the fact can become quite impossible.

If there is only one thing that juniors should learn about writing documentation (be it comments or design documents), it is this: document why something is there. If resources are limited, you can safely skip comments that describe how something works, because that information is also available in code.

(It might help to describe what is available, especially if code is spread out over multiple repositories, libraries, teams, etc.)

(Also, I suppose the comment I'm responding to could've been slightly more forgiving to GP, but that's another story.)



> Quite frequently though its outsourced development agencies that refuse to write it

It's also completely against their interest to write docs as it makes their replacement easier.

That's why you need someone competent on the buying side to insist on the docs.

A lot of companies outsource because they don't have this competency themselves. So it's inevitable that this sort of thing happens and companies get locked in and can't replace their contractors, because they don't have any docs.



Unfortunately it's also possible that e.g. the company switched from SharePoint to Confluence and lost half the entire knowledge base because it wasn't labeled the way they thought it was. Or that the docs were all purged because they were part of an abandoned project.


Just to be clear, after I (and a few others left), they moved everything entirely to the cloud.

Even with documentation on the hybrid setup, they'd need to get a new on-prem environment up and running (find a colo, buy machines, set up the network, blah blah).



> the first thing I do is update / actually write the docs to make them usable.

OK so the docs are in sync for a single point of time when you finish. Plus you get to have the context in your head (bus factor of 1, job security for you, bad for the org.)

How about if we just write clean infra configs/code and stick to well-known systems like Docker, Ansible, k8s, etc.

Then we can make this infra code available to an on-prem LLM and ask it questions as needed, without it drifting out of sync over time as your docs surely will.

Wrong documentation is worse than no documentation.



"Crap docs aren't always the fault of the guys implementing though, sometimes there are time constraints that prevent proper docs being written."

I can always guarantee a stream-of-consciousness OneNote that should have most of the important data, and a few docs about the most important parts. It's up to management if they want me to spend time turning that OneNote into actual robust documentation that is easily read.



Documentation? What for? It's self-documenting (to me, because I wrote it)!


Context: I build internal tools and platforms. Traffic on them varies, but some of them are quite active.

My nasty little secret is that for single-server databases I have zero fear of over-provisioning disk IOPS and running them on SQLite or making a single RDBMS server in a container. I've never actually run into an issue with this. It surprises me the number of internal tools I see that depend on large RDS installations yet have piddly requirements.



>making a single RDBMS server in a container

On what disk is the actual data written? How do you do backups, if you do?



In most setups like this, it’s going to be spinning rust with mdadm, and MySQL dumps that get created via cron and sent to another location.
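
A minimal sketch of that pattern (host names and paths are hypothetical; it assumes mysqldump and scp are available on the box):

    #!/usr/bin/env python3
    # Nightly dump-and-ship sketch for a single MySQL box, meant to run from cron.
    # Paths and the destination host are hypothetical placeholders.
    import datetime
    import gzip
    import subprocess

    stamp = datetime.date.today().isoformat()
    dump_path = f"/var/backups/mysql/all-databases-{stamp}.sql.gz"
    remote = "backup-host.example.com:/srv/backups/mysql/"

    # Stream mysqldump straight into gzip so the plaintext never hits disk.
    dump = subprocess.Popen(
        ["mysqldump", "--single-transaction", "--all-databases"],
        stdout=subprocess.PIPE,
    )
    with gzip.open(dump_path, "wb") as out:
        for chunk in iter(lambda: dump.stdout.read(1 << 20), b""):
            out.write(chunk)
    if dump.wait() != 0:
        raise SystemExit("mysqldump failed; not shipping a partial backup")

    # Ship the compressed dump off-box (scp standing in for any transfer tool).
    subprocess.run(["scp", dump_path, remote], check=True)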


The problem with a single instance is that while performance-wise it's best (at least on bare metal), there comes a moment when you simply have too much data and one machine can't handle it. In your scenario it may never come up, but many organizations face this problem sooner or later.


I agree; my point is that clusters are overused. Most applications simply don't need them, and it results in a lot of waste. Much of this has to do with engineers being tasked with an assortment of roles these days, so they obviously opt for the solution where the database and upgrades are managed for them. I've just found that managing a single container's upgrades isn't that big of an issue.


That’s made possible because of all the orchestration platforms such as Kubernetes being standardized, and as such you can get pretty close to a cloud experience while having all your infrastructure on-premise.


Yes, virtualization, overprovisioning and containerization have all played a role in allowing for efficient enough utilization of owned assets that the economics of cloud are perhaps no longer as attractive as they once were.


Same experience here. As a small organization, the quotes we got from cloud providers have always been prohibitively expensive compared to running things locally, even when we accounted for geographical redundancy, generous labor costs, etc. Plus, we get to keep know how and avoid lock-in, which are extremely important things in the long term.

Besides, running things locally can be refreshingly simple if you are just starting something and you don't need tons of extra stuff, which becomes accidental complexity between you, the problem, and a solution. This old post described that point quite well by comparing Unix to Taco Bell: http://widgetsandshit.com/teddziuba/2010/10/taco-bell-progra.... See HN discussion: https://news.ycombinator.com/item?id=10829512.

I am sure for some use-cases cloud services might be worth it, especially if you are a large organization and you get huge discounts. But I see lots of business types blindly advocating for clouds, without understanding costs and technical tradeoffs. Fortunately, the trend seems to be plateauing. I see an increasing demand for people with HPC, DB administration, and sysadmin skills.



> Plus, we get to keep know how and avoid lock-in, which are extremely important things in the long term.

So much this. The "keep know how" has been so greatly avoided over the past 10 years, I hope people with these skills start getting paid more as more companies realize the cost difference.



When I started working in the 1980s (as a teenager but getting paid) there was a sort of battle between the (genuinely cool and impressive) closed technology of IBM and the open world of open standards/interop like TCP/IP and Unix, SMTP, PCs, even Novell sort of, etc. There was a species of expert that knew the whole product offering of IBM, all the model numbers and recommended solution packages and so on. And the technology was good - I had an opportunity to program a 3093K(?) CM/VMS monster with APL and rexx and so on. Later on I had a job working with AS/400 and SNADS and token ring and all that, and it was interesting; thing is they couldn't keep up and the more open, less greedy, hobbyists and experts working on Linux and NFS and DNS etc. completely won the field.

For decades, open source, open standards, and interoperability dominated and one could pick the best thing for each part of the technology stack, and be pretty sure that the resultant systems would be good. Now however, the Amazon cloud stacks are like IBM in the 1980s - amazingly high quality, but not open; the cloud architects master the arcane set of product offerings and can design a bespoke AWS "solution" to any problem.

But where is the openness? Is this a pendulum that goes back and forth (and many IBM folks left IBM in the 1990s and built great open technologies on the internet) or was it a brief dawn of freedom that will be put down by the capital requirements of modern compute and networking stacks?

My money is on openness continuing to grow and more and more pieces of the stack being completely owned by openness (kernels anyone?) but one doesn't know.



Even without owning the infrastructure, running in the cloud without know-how is very dangerous.

I hear tell of a shop that was running on ephemeral instance based compute fleets (EC2 spot instances, iirc), with all their prod data in-memory. Guess what happened to their data when spot instance availability cratered due to an unusual demand spike? No more data, no more shop.

Don't even get me started on the number of privacy breaches because people don't know not to put customer information in public cloud storage buckets.



I was part of a relatively small org that wanted us to move to cloud dev machines. As soon as they saw the size of our existing development docker images that were 99.9% vendor tools in terms of disk space, they ran the numbers and told us that we were staying on-prem. I'm fairly sure just loading the dev images daily or weekly would be more expensive than just buying a server per employee.


Is there a bit of risk involved since the know-how has a will of its own and sometimes gets sick?

If I had a small business with very clever people I'd be very afraid of what happens if they're not available for a while.



Keep in mind, there is an in between..

I would have a hard time doing servers as cheaply as Hetzner, for example, including the routing and everything.



I do that. In fact I've been doing it for years, because every time I do the math, AWS is unreasonably expensive and my solo-founder SaaS would much rather keep the extra money.

I think there is an unreasonable fear of "doing the routing and everything". I run vpncloud, my server clusters are managed using ansible, and can be set up from either a list of static IPs or from a terraform-prepared configuration. The same code can be used to set up a cluster on bare-metal hetzner servers or on cloud VMs from DigitalOcean (for example).

I regularly compare this to AWS costs and it's not even close. Don't forget that the performance of those bare-metal machines is way higher than of overbooked VMs.



100% agree. People still think that maintaining infrastructure is very hard and requires a lot of people. What they disregard is that using cloud infrastructure also requires people.


I was talking more about the physical backbone connection, which Hetzner does for you.

We are using Hetzner Cloud.. but we are also scaling up and down a lot right now.



You usually just do colocation. The data center will give you a rack (or space for one), an upstream gateway to your ISP, and redundant power. You still have to manage a firewall and your internal network equipment, but it's not really that bad. I've used pfSense firewalls, configured by them for like $1500, with roaming VPN, high availability, point-to-point VPN, and as secure as reasonably possible. After that it's the same thing as the cloud, except it's physical servers.


I mean, yes.. but you pay for that, and colocation + server depreciation in the case I calculated was higher than just renting the servers.


Could you please explain what you mean by "physical backbone connection", as I can't think of a meaning that fits the context.

If you mean dealing with the physical dedicated servers that can be rented from Hetzner, that's what the person you replied to was talking about being not so difficult.

If you mean everything else at the data centre that makes having a server there worthwhile (networking, power, cooling, etc.) I don't think people were suggesting doing that themselves (unless you're a big enough company to actually be in the data centre business), but were talking about having direct control of physical servers in a data centre managed by someone like Hetzner.

(edit: and oops sorry I just realised I accidentally downvoted your comment instead of up, undone and rectified now)



With "routing" I meant the backbone connection, which is included in the Hetzner price.

I.e., if I add up power (including backup) + backbone connection rental + server depreciation, I cannot do it for the Hetzner price.

That was quite imprecise, sorry about that.



No worries, easy to not foresee every possible way in which strangers could interpret a comment!

But I think that people (at least jwr, and probably even nyc_data_geek saying "on prem") are talking about cloud (like AWS) vs. renting (or buying) servers that live in a data centre run by a company like Hetzner, which can be considered "on prem" if you're the kind of data centre client who has building access to send your own staff there to manage your servers (while still leaving everything else, possibly even legal ownership and therefore depreciation etc., to the data centre owner).

What you're thinking of - literally taking responsibility for running your own mini data centre - I think is hardly ever considered (at least in my experience), except by companies at the extremes of size. If you're as big as Facebook (not sure where the line is, but obviously including some companies not AS big as Meta but still huge) then it makes sense to run your own data centres. If you're a tiny business getting less than thousands of website visits a day, and the website (or whatever is being hosted) isn't so important that a day of downtime every now and then is a big deal, then it's not uncommon to host from the company's office itself (just using a spare old PC or second-hand cheap 1U server, maybe a cheap UPS, connected to the main internet connection that people in the office use, and probably managed by a single employee, or the company owner, who happens to be geeky enough to think it's one or both of simple or fun to set up a basic LAMP server, or even a Windows server for its oh-so-lovely GUI).



I think no one talked about having physical server on their own premises but colocating servers in a data center or renting servers in a data center.


When talking about Hetzner pricing, please don’t change the subject to AWS pricing. The two have nothing in common, and intuition derived from one does not transfer to the other.


> The two have nothing in common

If all you need are some cloud servers, or a basic load balancer, they are pretty much the same.

If you need a plethora of managed services and don't want to risk getting fired over your choice or specifics of how that service is actually rendered, they are nothing alike and you should go for AWS, or one of the other large alternatives (GCP, Azure etc.).

On the flip side, if you are using AWS or one of those large platforms as a glorified VPS host and you aren't doing this in an enterprise environment, outside of learning scenarios, you are probably doing something wrong and you should look at Hetzner, Contabo, or one of those other providers, though some can still be a bit pricey - DigitalOcean, Vultr, Scaleway etc.



> the two have nothing in common

Well, in my case at least, what they have in common is that I can choose to run my business on one or the other. So it's not about intuition, but rather facts in my case: I avoid spending a significant amount of money.

I (of course) do realize that if you design your software around higher-level AWS services, you can't easily switch. I avoided doing that.



> please don’t change the subject to AWS pricing

Why? The only reason I'm using Hetzner and not AWS for several of my own projects (even though I know AWS much better since this is what I use at work) is an enormous price difference in each aspect (compute, storage, traffic).



It's not an either/or. Many businesses both own and rent things.

If price is the only factor, your business model (or executives' decision-making) is questionable. Buy only the cheapest shit, spend your time building your own office chair rather than talking to a customer, you aren't making a premium product, and that means you're not differentiated.



I would imagine that cloud infrastructure has the ability to scale up fast, unlike self-owned infrastructure.

For example, how long does it take to rent another rack that you didn't plan for?

Not to mention that the cloud management platforms you have to deploy to manage these owned assets are not free.

I mean, how come even large consumers of electricity do not buy and own their own infrastructure to generate it?



Ordering that number of servers takes about one hour with Hetzner. If you truly want a complete rack of your own, maybe a few days, as they have to do it manually.

Most companies don't need to scale up full racks in seconds. Heck, even weeks would be OK for most of them to get new hardware delivered. The cloud planted the lie into everyone's head that most companies don't have predictable and stable load.



Most businesses could probably know server needs 6-12 months out. There's a small number of businesses in the world that actually need dynamic scaling.


What would be the cost/time of scaling down a rack on Hetzner?


The rental period is a month. You can also use Hetzner Cloud, which is still roughly 10x less expensive than AWS, and that does not take into account the vastly cheaper traffic.


One other appealing alternative for smaller startups is to run Docker on one burstable VM. This is a simple setup and allows you to go beyond the CPU limits and also scale up the VM.

Might be other alternatives than using Docker so if anyone has tips for something simpler or easier to maintain, appreciate a comment.



>I mean, how come even large consumers of electricity do not buy and own their own infrastructure to generate it?

They sure do? BASF has 3 power plants in Hamburg, Disney operates Reedy Creek Energy with at least 1 power plant, and I could list a fair bit more...

>For example, how long does it take to rent another rack that you didnt plan for?

I mean, you can also rent hardware a lot cheaper than on AWS. There certainly are providers where you can rent out a rack for a month within minutes.



Some universities also have their own power plants. It’s also becoming more common to at least supplement power on campus with solar arrays.


RDS pricing is deranged at the scales I've seen too. $60k/year for something I could run on just a slice of one of my on-prem $20k servers. This is something we would have run 10s of. $600k/year operational against sub-$100k capital cost pays DBAs, backups, etc with money to spare.

Sure, maybe if you are some sort of SaaS with a need for a small single DB that also needs to be resilient, backed up, rock-solid bulletproof.. it makes sense? But how many cases are there of this? If it's so fundamental to your product and needs such uptime & redundancy, what are the odds it's also reasonably small?



> Sure, maybe if you are some sort of SaaS with a need for a small single DB, that also needs to be resilient, backed up, rock solid bulletproof.. it makes sense? But how many cases are there of this?

Most software startups these days? The blog post is about work done at a startup after all. By the time your db is big enough to cost an unreasonable amount on RDS, you’re likely a big enough team to have options. If you’re a small startup, saving a couple hundred bucks a month by self managing your database is rarely a good choice. There’re more valuable things to work on.



>By the time your db is big enough to cost an unreasonable amount on RDS, you’re likely a big enough team to have options.

By the time your db is big enough to cost an unreasonable amount on RDS, you've likely got so much momentum that getting off is nearly impossible as you bleed cash.

You can buy a used server and find colocation space and still be pennies on the dollar for even the smallest database. If you're doing more than prototyping, you're probably wasting money.



In the small SaaS startup case, I’d say the production database is typically the most critical single piece of infra, so self hosting is just not a compelling proposition unless you have a strong technical reason where having super powerful database hardware is important, or a team with multiple people who have sysadmin or DBA experience. I think both of those cases are unusual.

I’ve been the guy managing a critical self-hosted database in a small team, and it’s such a distraction from focusing on the actual core product.

To me, the cost of RDS covers tons of risks and time sinks: having to document the db server setup so I’m not the only one on the team who actually knows how to operate it, setting up monitoring, foolproof backups so I don’t need to worry that they’re silently failing because a volume is full and I misconfigured the monitoring, PITR for when someone ships a bad migration, one click HA so the database itself is very unlikely to wake me at 3am, blue/green deploys to make major version upgrades totally painless, never having to think about hardware failures or borked dist-upgrades, and so on.

Each of those is ultimately either undifferentiated work to develop in-house RDS features that could have been better spent on product, or a risk of significant data loss, downtime, or firefighting. RDS looks like a pretty good deal, up to a point.



I like fiddling with databases, but I totally agree with this. Unless you really need a big database and are going to save 100k+ per year by going self managed then RDS or similar just saves you so much stress. We've been using it for the best part of 10 years and uptime and latency have consistently been excellent, and functionality is all rock solid. I never have to think about it, which is just what I want from something so core to the business.


I am good at databases (have been a DBA in the past), and 100% agree with this. RDS is easy to stand up and get all the things you mentioned, and not have to think about again. If we grow to the point where the overhead is more than a FT DBA, awesome. It means we are successful, and are fortunate to have options.


Unfortunately there are so many people and teams who think that simply running their databases on RDS means that they're backed up, highly available, and can be easily load balanced, upgraded, partitioned, migrated, and so on, which is simply not the case with the basic configuration.

RDS is a great choice for prototyping, and only for production if you know what you're doing when setting it up.

FWIW, this is common in all cloud deployments; people assume that running something "serverless" is a magical silver bullet.



Well…just using the defaults when creating an RDS Postgres in the console gives you an HA cluster with two read replicas, 7 days of backups restorable to any point in time, automatic minor version upgrades, and very easy major upgrades. So unless you start actively unchecking stuff those are not entirely invalid assumptions.


I agree, but I also classify some of these as "learn them once and you're all set".

Maybe it takes you a month the first time around and a week the 10th time around. The first product suffers, the other products not so much. Now it just takes a week of your time and does not require you to pay large AWS fees, which means you are not bleeding money.

I like to set up scrappy products that do not rack up large monthly fees. This means I can let them run unprofitable for longer and I don't have to seek an investor early, which would light a large fire under everyone's butts and start influencing timelines, because now they have the money and want a return ASAP.

I'll launch a week later - no biggie usually. I could have come up with the idea a month later, so I'm still 3 weeks early ;)

It doesn't work for all projects, obviously, but I've seen plenty of SaaS start out with a shopping spree, then pay monthly fees and purchase licenses for stuff that they could have set up for free if they put some (usually not a lot) effort into it. When times get rough, the shorter runway becomes a hard fact of life. Maybe they wouldn't have needed a VC and could have bootstrapped and also survived for longer.



Learning it all is what gave me an appreciation for RDS! I’ve self managed a number of Postgres and MySQL databases, including a 10TB Postgres cluster with all of the HA and backup niceties.

While I generally agree as far as initial setup time goes, I favor RDS because I can forget about it, whereas the hand-rolled version demands ongoing maintenance, and incurs a nonzero chance of simple mistakes that, if made, could result in a 100% unrecoverable data loss scenario.

I’m also mostly talking about typical, funded startups here, as opposed to indie/solo devs. If you’re flying solo launching a tiny proof of concept that may only ever have a few users, by all means run it yourself if you’d like, but if you’ve raised money to grow faster and are paying employees to iterate rapidly searching for PMF…just pay for RDS and make sure as much time as possible is spent on product features that provide actual business value. It starts at like $15/month. The cost of simply not being laser-focused on product is far greater.



> you've likely got so much momentum that getting off is nearly impossible as you bleed cash.

Databases are not particularly difficult to migrate between machines. Of all the cloud services to migrate, they might actually be the easiest, since the databases don't have different APIs that need to be rewritten for, and database replication is a well-established thing.

Getting off is quite the opposite of nearly impossible.



That’s just another way of saying the opportunity cost isn’t worth paying to do the migration.

Optionality and flexibility are extremely valuable, and that is why cloud compute continues to be popular, especially for rapidly/burstily growing businesses like startups.



I don't mean to pick on your specific comments, but I find these analyses almost always lack a crucial perspective: level of knowledge. This is the single biggest factor, and it's the hardest one to be honest about. No one wants to say "RDS is a good choice . . . because I don't know how nor have I ever self managed a database."

If you want a different opportunity cost, get people with different experience. If RDS is objectively expensive, objectively slow, but subjectively easy, change the subject.



> No one wants to say "RDS is a good choice . . . because I don't know how nor have I ever self managed a database."

I don't think that's accurate. I've self-managed databases, and I still think that RDS is compelling for small engineering teams.

There's a lot to get right when managing a database, and it's easy to screw something up. Perhaps none of the individual parts are super-complicated, but the cost of failure is high. Outsourcing that cost to AWS is pretty compelling.

At a certain team size, you'll end up with a section of the team that's dedicated to these sorts of careful processes. But the first place these issues come up is with the database, and if you can put off that bit of organizational scaling until later, then that's a great path to choose.



Lack of expertise in some particular technology is simply another opportunity cost. I can learn how to operate a production DB at scale (I have racked servers and run other production workloads) but as cofounder/CTO in a startup is that the best use of my time?

If the cost of a hosted DB is going to sink the company, then of course, I will figure it out and run it myself. But it’s not, for most startups. And therefore that knowledge isn’t providing much leverage.

Starting an AI company with deep expertise in training models - that is an example of knowledge providing huge leverage. DB tech is not in this bucket for most businesses.



I disagree here. This falls apart when you zoom out one step. I'm perfectly capable of managing a database. I'm also capable of maintaining load balancers, Redis, container orchestrators, Jenkins, Perforce, Grafana, Loki, Oncall, individually. But each of those has a high chance of being a distraction from what our software actually does.

It's about tradeoffs, and some tradeoffs are often more applicable than others - getting a ping at 7am on a Sunday because your EC2 instance filled its drive up with logs and your log rotation script failed because it didn't have a long enough retry is a problem I'm happy to outsource when I should be focusing on the actual app.



On the other hand cloud platforms can be hard to migrate off, which is very much taking away options.


People do not really understand the value of the former. Even dealing with financial options (buy/sell and underlying) which are a pure form of it, people either do not understand the value, or do so in a very abstract way they do not intuit.


Good point. And, since you brought up financials, you also see this when people use a majority of their savings to lump sum pay off a mortgage. They take an overweighted view of saving on interest and, IMO, underweight the flexibility of liquidity.


I have a small MySQL database that’s rather important, and RDS was a complete failure.

It would have cost a negligible amount. But the sheer amount of time I wasted before I gave up was honestly quite surprising. Let’s see:

- I wanted one simple extension. I could have compromised on this, but getting it to work on RDS was a nonstarter.

- I wanted RDS to _import the data_. Nope, RDS isn’t “SUPER,” so it rejects a bunch of stuff that mysqldump emits. Hacking around it with sed was not confidence-inspiring.

- The database uses GTIDs and needed to maintain replication to a non-AWS system. RDS nominally supports GTID, but the documented way to enable it at import time strongly suggests that whoever wrote the docs doesn’t actually understand the purpose of GTID, and it wasn’t clear that RDS could do it right. At least Azure’s docs suggested that I could have written code to target some strange APIs to program the thing correctly.

Time wasted: a surprising number of hours. I’d rather give someone a bit of money to manage the thing, but it’s still on a combination of plain cloud servers and bare metal. Oh well.
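
For reference, the sed-style workaround mentioned above usually boils down to filtering out the statements RDS rejects for lack of SUPER; a hedged sketch (the patterns are common examples rather than an exhaustive list, so verify against your own dump):

    # filter_dump.py -- strip statements from a mysqldump that RDS rejects
    # because they require SUPER. Patterns are illustrative, not exhaustive.
    import re
    import sys

    DROP_PATTERNS = [
        re.compile(r"SET @@GLOBAL\.GTID_PURGED"),
        re.compile(r"SET @@SESSION\.SQL_LOG_BIN"),
    ]
    DEFINER = re.compile(r"DEFINER=`[^`]+`@`[^`]+`")

    for line in sys.stdin:
        if any(p.search(line) for p in DROP_PATTERNS):
            continue                              # drop the offending statement
        sys.stdout.write(DEFINER.sub("", line))   # strip DEFINER clauses in place

    # Usage (hypothetical file names):
    #   python3 filter_dump.py < full.sql > rds-safe.sql

Dropping GTID_PURGED is, of course, exactly the kind of edit that feels risky when you still need GTID-based replication to systems outside AWS.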



- replication to non-AWS systems
- a "simple" extension
- problems importing data into RDS because of your custom stuff lurking in a mysqldump

Sounds like you are a walking massive edge case.



> Sure, maybe if you are some sort of SaaS with a need for a small single DB, that also needs to be resilient, backed up, rock solid bulletproof.. it makes sense? But how many cases are there of this?

Very small businesses with phone apps or web apps are often using it. There are cheaper options of course, but when there is no "prem" and there are 1-5 employees then it doesn't make much sense to hire for infra. You outsource all digital work to an agency who sets you up a cloud account so you have ownership, but they do all software dev and infra work.

> If its so fundamental to your product and needs such uptime & redundancy, what are the odds its also reasonably small?

Small businesses again, some of my clients could probably run off a Pentium 4 from 2008, but due to nature of the org and agency engagement it often needs to live in the cloud somewhere.

I am constantly beating the drum to reduce costs and use as little infra as needed though, so in a sense I agree, but the engagement is what it is.

Additionally, everyone wants to believe they will need to hyperscale, so even medium-scale businesses over-provision, and some agencies are happy to do that for them as they profit off the margin.



A lot of my clients are small businesses in that range or bigger.

AWS and the like are rarely a cost effective option, but it is something a lot of agencies like, largely because they are not paying the bills. The clients do not usually care because they are comfortable with a known brand and the costs are a small proportion of the overall costs.

A real small business will be fine just using a VPS provider or a rented server. This solves the problem of not having on premise hardware. They can then run everything on a single server, which is a lot simpler to set up, and a lot simpler to secure. That means the cost of paying someone to run it is a lot lower too as they are needed only occasionally.

They rarely need very resilient systems as the amount of money lost to downtime is relatively small - so even on AWS they are not going to be running in multiple availability zones etc.



Lots of cases. It doesn't even have to be a tiny database. Also, Aurora gives you the block-level cluster that you can't deploy on your own - it's way easier to work with than the usual replication.


Once you commit to more deeply Amazon flavored parts of AWS like Aurora, aren't you now fairly committed to hoping your scale never exceeds the cost-benefit tradeoff?


If my scale exceeds the cost-benefit tradeoff, then I will thank God/Allah/Buddha/the Spaghetti Monster.

These questions always sound flawed to me. It's like asking won't I regret moving to California and paying high taxes once I start making millions of dollars? Maybe? But that's an amazing problem to have and one that I may be much better equipped to solve.

If you are small, RDS is much cheaper, and many company-killing events, such as not testing your backups, are solved. If you are big and can afford a $60K/yr RDS bill, then you can make changes to move on-prem. Or you can open up Excel and do the math on whether your margins are meaningfully affected by moving on-prem.



Agree. "What if you're wildly successful and get huge?" Awesome, we'll solve the problem then. The other part is what if AWS was a part of becoming successful? IE, it freed my small team from having to worry all that much about a database and instead focused on features.


I assume that you do that math on all your new features too, right? The calculation of how much extra money they will bring in?

On some level, AWS/GCP/California relies on you doing this calculation for the things that you can do it on easily (the savings of moving away), while not doing this calculation on things where it's hard to do (new development). That way, you can pretend that your new features are a lot more valuable than the $Xk/year you will save by moving your infra.



>The calculation of how much extra money they will bring in?

Yes, I've done the math. The piece you are missing is that saving money on infra will bring in $0 new dollars. There is a floor to how much money I can save. There is no ceiling to how much money the right feature can bring in. Penny-pinching on infra, especially when the amount of money saved is less than the cost of an engineer, is almost always a waste of time while you are growing a company. If you are at the point where you are wasting 1x, 2x, 3x of an engineer's salary on superfluous infrastructure - then congratulations, you have survived the great filter for 99% of startups.

>That way, you can pretend that your new features are a lot more valuable than the $Xk/year you will save by moving your infra.

Finding product market fit is 1000x harder than moving from RDS to On-prem. If you haven't solved PMF, then no amount of $Xk/year in savings will save you from having to shut down your company.



I am well aware of the math on that. Also, switching to faster infra can be a surprising benefit to your revenue, by the way, if it makes your app feel nicer.

The thing is, most features, particularly later in the life of a company, don't have an easy-to-measure revenue impact, and I suspect that many features are actually worth $0 of revenue. However, they cost money to implement (both in engineering time and infra), making them very much net negative value propositions. This is why Facebook and Google can cut tons of staff and lose nothing off their revenue number.

Also, there's a bit of a gambling mentality here which is that a feature could be worth effectively infinite revenue (ie it could be the thing that gives you PMF), so it's always worth doing over things with known, bounded impact on your bottom line. However, improving your efficiency gives you more cracks at finding good features before you run out of money.



Aurora supports standard Postgres clients.

So moving to/from Aurora/RDS/own EC2/on-prem should be a matter of networking and changing connection strings in the clients.

Your operational requirements and processes (backup/restore, failover, DR etc) will change, but that's because you're making a deliberate decision weighing up those costs vs benefits.



Pro tip side note:

You can use DNS to mitigate the pain of changing those connection strings, decoupling client change management from backend change process, or if you had foresight, not having to change client connection strings at all.
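
Roughly what that indirection looks like (all of the names below are made up for illustration):

    # Hypothetical illustration: clients only ever see a stable alias you control,
    # and a low-TTL CNAME maps it to whatever the current backend endpoint is, e.g.
    #   db.internal.example.com.  60  IN  CNAME  mydb.abc123.us-east-1.rds.amazonaws.com.
    # Cutting over then means changing one DNS record (plus client reconnects)
    # instead of rolling a new connection string out to every service.
    import os

    db_host = os.environ.get("DB_HOST", "db.internal.example.com")
    dsn = f"postgresql://app@{db_host}:5432/appdb"
    print(dsn)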



Nope, nope, nope! When you change DNS entries, they will take effect at some point in the future when the cache expires and when your app decides to reconnect. (Possibly after a restart) At that point, why not be sure and change the config?

I mean, DNS change can work, but when you're doing that one-in-years change, why risk the extra failure modes.



If you’re paying list price at scale you are doing it very wrong.


Sure, but if you're paying anywhere near list price for your on-prem hardware at scale you're also doing it wrong. I've never seen a scenario where Amazon discounts exceed what you would get from a hardware or software vendor at the same scale.


Interesting how cloud services are sold like used cars.


It's more interesting how cloud services are sold like any other consumables or corporate services.

No one runs their own electricity supply (well until recently with renewables/storage), they buy it as a service, up to a pretty high scale before it becomes more economic to invest the capex and opex to run your own.



Or you're realistic about what you're doing. Will you ever need to scale more than 10x? And on the timescales where you do grow over 10x, would it be better to reconsider/re-architect everything anyway?

I mean, I'm looking after a 4 instance Aurora cluster which is great feature wise, is slightly overprovisioned for special events, and is more likely to shrink than grow 2x in the next decade. If we start experiencing any issues, there's lots of optimisations that can be still gained from better caching and that work will be cheaper than the instance size upgrade.



…no?

There’s still a defined cost to swapping your DB code over to a different backend. At the point where it becomes uneconomical, you’re also at a scale you can afford rewriting a module.

That’s why we have things like “hexagonal architecture”, which focus on isolating the storage protocol from the code. There’s an art to designing such that your prototype can scale with only minor rework — but that’s why we have senior engineers.



RDS is not so bulletproof as advertised, and the support is first arrogant then (maybe) helpful.

People pay for RDS because they want to believe in a fairy tale that it will keep potential problems away and that it worked well for other customers. But those mythical other customers also paid based on such belief. Plus, no one wants to admit that they pay money in such an irrational way. It's a bubble.



Plus AWS outright lied to us about zero-downtime upgrades.

Come time for a forced major upgrade shoved down our throats? Downtime, surprise, surprise.



The US DoD for sure.


Out of curiosity, who is your onprem provider?


> $600k/year operational against sub-$100k capital cost pays DBAs, backups, etc with money to spare.

One of these is not like the others (DBAs are not capex.)

Have you ever considered that if a company can get the same result for the same price ($100K opex for RDS vs same for human DBA), it actually makes much more sense to go the route that takes the human out of the loop?

The human shows up hungover, goes crazy, gropes Stacy from HR, etc.

RDS just hums along without all the liabilities.



Not only that, you can't just have one DBA. You need a team of them, otherwise that person is going to be on call 24/7, can never take a vacation, etc. You're probably looking at a minimum of 3.


And when you have performance issues you still need a DBA. Because RDS only runs your database. It is up to you to make it fast.


You'll need an engineer with database skills, not a dedicated DBA. I haven't seen a small company with a full time DBA in well over a decade. If you can learn a programming language, you can learn about indexes and basic tuning parameters (buffer pool, cache, etc.)


That's a huge instance with an enterprise license on top. Most large SaaS companies can run off of $5k / m or cheaper RDS deployments which isn't enough to pay someone. The amount of people running half a million a year RDS bills might not be that large. For most people RDS is worth it as soon as you have backup requirements and would have to implement them yourself.


> Most large SaaS companies can run off of $5k / m or cheaper RDS

Hard disagree. An r6i.12xl Multi-AZ with 7500 IOPS / 500 GiB io1 books at $10K/month on its own. Add a read replica, even Single-AZ at a smaller size, and you’re half that again. And this is without the infra required to run a load balancer / connection pooler.

I don’t know what your definition of “large” is, but the described would be adequate at best at the ~100K QPS level.

RDS is expensive as hell, because they know most people don’t want to take the time to read docs and understand how to implement a solid backup strategy. That, and they’ve somehow convinced everyone that you don’t have to tune RDS.



If you're not using GP3 storage that provides 12K minimum IOPS without requiring provisioned IOPS for >400GB storage, as well as 4 volume striping, then you're overpaying.

If you don't have a reserved instance, then you're giving up potentially a 50% discount on on-demand pricing.

An r6i.12xl is a huge instance.

There are other equivalents in the range of instances available (and you can change them as required, with downtime).



> GP3... as well as 4 volume striping

For MySQL and Postgres, RDS stripes across four volumes once you hit 400 GiB. Doesn't matter the type.

The latency variation on gp3 is abysmal [0], and the average [1] isn't great either. It's probably fine if you have low demands, or if your working set fits into memory and you can risk the performance hit when you get an uncached query.

12K IOPS sounds nice until you add latency into it. If you have 2 msec latency, then (ignoring various other overheads, and kernel or EBS command merging) the maximum a single thread can accomplish in one second is (1000 msec per second) / (2 msec per I/O) = 500 I/Os. Depending on your needs that may be fine, of course.
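
The same arithmetic as a tiny calculation (the 2 msec figure is the assumption from the paragraph above):

    # Single-thread I/O ceiling at a given per-I/O latency, and how much
    # concurrency it takes to actually use a 12K IOPS volume.
    latency_ms = 2.0                               # assumed per-I/O latency
    single_thread_iops = 1000 / latency_ms         # = 500 I/Os per second

    for queue_depth in (1, 4, 16, 24):
        ceiling = int(single_thread_iops * queue_depth)
        print(f"queue depth {queue_depth:2d}: ~{ceiling:,} IOPS ceiling")
    # At 2 msec you need ~24 outstanding I/Os just to saturate 12K provisioned IOPS.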

> If you don't have a reserved instance, then you're giving up potentially a 50% discount on on-demand pricing.

True, of course. Large customers also don't pay retail.

> An r6i.12xl is a huge instance.

I mean, it goes well past that to .32xl, so I wouldn't say it's huge. I work with DBs with 1 TiB of RAM, and I'm positive there are people here who think those are toys. The original comment I replied to said, "large SaaS," and a .12xl, as I said, would be roughly adequate for ~100K QPS, assuming no absurdly bad queries.

[0]: https://www.percona.com/blog/performance-of-various-ebs-stor...

[1]: https://silashansen.medium.com/looking-into-the-new-ebs-gp3-...



Definitely--I recommend this after you've reached the point where you're writing huge checks to AWS. Maybe this is just assumed but I've never seen anyone else add that nuance to the "just use RDS" advice. It's always just "RDS is worth it" full stop, as in this article.


To some extent that is probably true, because when you've built a business that needs a $500k/year database fully on RDS, it's already priced into your profits, and switching to a self-hosted database will seem unacceptably risky for something that works just fine.


> it’s already priced into your profits

Assuming you have any. You might not, because of AWS.



I mean, just use supabase instead. So much easier than RDS. Why even deal with AWS directly? Might as well have a Colo if you need AWS.


>Most large SaaS companies can run off of $5k / m or cheaper RDS deployments which isn't enough to pay someone.

After initial setup, managing the equivalent of a $5k/m RDS is not a full-time job. If you add to this that wages differ a lot around the world, $5k can take you very, very far in terms of paying someone.



This is because you are using SQL Server. Microsoft has intentionally made cloud pricing for SQL server prohibitively expensive for non-Azure cloud workloads by requiring per-core licensing that is extremely punitive for the way EC2 and RDS is architected. This has the effect of making RDS vastly more expensive than running the same workload on bare metal or Azure.

Frankly, this is anti-competitive, and the FTC should look into it, however, Microsoft has been anti-competitive and customer hostile for decades, so if you're still using their products, you must have accepted the abuse already.



The problem you have here is by the time you reach the size of this DB, you are on a special discount rate within AWS.


Discount rates are actually much better too on the bigger instances. Therefore the "sticker price" that people compare on the public site is nowhere close to a fair comparison.

We technically aren't supposed to talk about pricing publicly, but I'm just going to say that we run a few 8XL and 12XL RDS instances and we pay ~40% off the sticker price.

If you switch to Aurora engine the pricing is absurdly complex (its basically impossible to determine without a simulation calculator) but AWS is even more aggressive with discounting on Aurora, not to mention there are some legit amazing feature benefits by switching.

I'm still in agreement that you could do it cheaper yourself at a data center. But there are some serious tradeoffs made by doing it that way. One is complexity, and it certainly requires several new hiring decisions. Those have their own tangible costs, but there are a huge amount of intangible costs as well, like pure inconvenience, more people management, more hiring, split expertise, complexity to network systems, reduced elasticity of decisions, longer commitments, etc. It's harder to put a price on that.

When you account for the discounts at this scale, I think the cost gap between the two solutions is much smaller and these inconveniences and complexities by rolling it yourself are sometimes worth bridging that smaller gap in cost in order to gain those efficiencies.



The new Aurora pricing model helps, and is honestly the only reason we're able to use it. It caps costs: https://aws.amazon.com/blogs/aws/new-amazon-aurora-i-o-optim...


> but I'm just going to say that we run a few 8XL and 12Xl RDS instances and we pay ~40% off the sticker price.

Genuinely curious, how do you do that?

We pay a couple of million dollars per year and the biggest spend is RDS. The bulk of those are 8xl and 12xl as you mention and we have a lot of these. We do have savings plans, but those are nowhere near 40%.



Yeah 40% seems like a pipedream. I was at a Fortune 500 defense firm and we couldn't get any cloud provider to even offer us anything close to that discount if we agreed to move to them for 3-4 years minimum. That org ended up not migrating because it was significantly cheaper to buy land and build datacenters from scratch than to rent in the cloud.


There are basically no discounts in govcloud


Defense firms do a lot more than just government work. Also, there are definitely discounts in govcloud when Fortune 500 companies that operate 30+ datacenters start talking to govcloud providers about potentially migrating to their services.


At least according to: https://instances.vantage.sh/rds/?selected=db.r6g.16xlarge,d...

It looks like a reserved instance is 35% off sticker price? Add probably a discount and you'd be around 40% off.



Cloud was supposed to be a commodity. Instead it is priced like a burger at the ski hill.


If it is such a golden goose, then other competitors will come in and compete the price down.


Not really; the API lock-in and egregious egress fees will keep competitors at the door.

That, and trust is hard-earned over a long tail, which is harder if you are trying to compete on price.



I think trust is the biggest factor. If you willingly lock yourself into a vendor specific product, that is obviously your own choice.


Elsewhere today I recommended RDS, but was thinking of small startup cases that may lack infrastructure chops.

But you are totally right, it can be expensive. I worked with a startup that had some inefficient queries; normally it wouldn't matter, but with RDS it cost $3,000 a month for a tiny user base and not that much data (millions of rows at most).



That sounds like the app needs some serious surgery.


Also, it is often overlooked that you still need skilled people to run RDS. It's certainly not "2-clicks and forget" and "you don't need to pay anyone running your DB".

I haven't run a Postgres instance with proper backup and restore, but it doesn't seem like rocket science using barman or pgbackrest.



Data isn't cheap and never was. Paying the licensing fees on top makes it more expensive. It really depends on the circumstance: a managed database usually has extended support from the company providing it. You have to weigh a team's expertise to manage a solution on your own and ensure you spend ample time making it resilient. The other half is the cost of upgrading hardware; sometimes it is better to just pay a cloud provider if your business does not have enough income to buy hardware outright. There is always an upfront cost.

For small databases or test-environment databases, you can also leverage Kubernetes to host an operator for that tiny DB. When it comes to serious data that needs a beeline recovery strategy: RDS.

Really it should be a mix: self-hosted for the things you aren't afraid to break, hosted for the things that put you at high risk.



While I agree that RDS is expensive, you're making two false claims here:

1. Hiring someone full time to work on the database means migrating off RDS

2. Database work is only about spend reduction



I agree that RDS is stupidly expensive and not worth it provided that the company actually hires at least 2x full-time database owners who monitor, configure, scale and back up databases. Most startups will just save the money and let developers "own" their own databases or "be responsible for" uptime and backups.


For a couple hundred grand you can get a team of 20 fully trained people working full time in most parts of the world.


Even for small workloads it's a difficult choice. I ran a small but vital DB, and RDS was costing us like 60 bucks a month per env. That's $240/month/app.

DynamoDB as a replacement, pay per request, was essentially free.

I found Dynamo foreign and rather ugly to code for initially, but am happy with the performance and especially price at the end.
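For anyone weighing the same trade: the pay-per-request part really is one flag at table creation. A minimal boto3 sketch, with made-up table and key names:

    import boto3

    # On-demand (pay-per-request) DynamoDB table: nothing provisioned,
    # nothing billed at idle. Names are made-up placeholders.
    dynamodb = boto3.client("dynamodb")

    dynamodb.create_table(
        TableName="app-config",
        AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
        KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
        BillingMode="PAY_PER_REQUEST",
    )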



In another section, they mentioned they don't have a DBA, no app team owns the database, and the infra team is overwhelmed.

RDS makes perfect sense for them.



For big companies such as banks this cost comparison is not as straightforward. They have whole data centres just sitting there for disaster recovery. They periodically do switchovers to test DR. All of this expense goes away when they migrate to cloud.


> All of this expense goes away when they migrate to cloud.

They need to replicate everything in multiple availability zones, which is going to be more expensive than replicating data centres.

They still need to test that their cloud infrastructure works.



> All of this expense goes away when they migrate to cloud.

Just to pay someone else enough money to provide the same service and make a profit while doing it.



Well corporations pay printers to do their printing because they don't want to be in the business of printing. It's the same with infrastructure, a lot of corporations simply don't want to be in the data centre business.


That's how nearly every aspect of every business works; would you start a bakery by learning construction and building it yourself?


Construction is a one-time cost. IT infrastructure is in constant use.

It's like accounting and finance. Yeah a lot of companies use tax firms, but they all have finance and accounting in-house.



From what I’ve read, a common model for mmorpg companies is to use on-prem or colocated as their primary and then provision a cloud service for backup or overage.

Seems like a solid cost effective approach for when a company reaches a certain scale.



Lots of companies, like Grinding Gear Games and Square Enix, just rent whole servers for a tiny fraction of the price compared to what the price-gouging cloud providers would charge for the same resources. They get the best of both worlds. They can scale up their infrastructure in hours or even minutes, and they can move to any other commodity hardware in any other datacenter at the drop of a hat if they get screwed on pricing. Migrating from one server provider (such as IBM) to another (such as Hetzner) can take an experienced team 1-2 weeks at most. Given that pricing updates are usually announced 1-3 quarters ahead at a minimum, they have massive leverage over their providers because they can so easily switch. Meanwhile, if AWS decides to jack up their prices, you're pretty much screwed in the short term if you designed around their cloud services.


I'd add another criticism to the whole quote:

> Data is the most critical part of your infrastructure. You lose your network: that’s downtime. You lose your data: that’s a company ending event. The markup cost of using RDS (or any managed database) is worth it.

You need well-run, regularly tested, air gapped or otherwise immutable backups of your DB (and other critical biz data). Even if RDS was perfect, it still doesn't protect you from the things that backups protect you from.

After you have backups, the idea of paying enormous amounts for RDS in order to keep your company from ending is more far-fetched.
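For the "air gapped or otherwise immutable" part, one cheap option on AWS is S3 Object Lock on the backup bucket, so that even someone with admin credentials cannot delete recent dumps. A minimal boto3 sketch; the bucket name, region and retention period are made-up examples:

    import boto3

    # Backups written to this bucket cannot be deleted or overwritten
    # for 30 days, even by the account admin. Names are placeholders.
    s3 = boto3.client("s3", region_name="us-east-1")

    s3.create_bucket(Bucket="example-db-backups", ObjectLockEnabledForBucket=True)

    s3.put_object_lock_configuration(
        Bucket="example-db-backups",
        ObjectLockConfiguration={
            "ObjectLockEnabled": "Enabled",
            "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 30}},
        },
    )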



That's the cost of two people.


You don't get the higher end machines on AWS unless you're a big guy. We have Epyc 9684X on-prem. Cannot match that at the price on AWS. That's just about making the choices. Most companies are not DB-primary.


I think most people who’ve never experienced native NVMe for a DB are also unaware of just how blindingly fast it is. Even io2 Block Express isn’t the same.


Funny enough, the easiest way to experience this is probably to do some performance experimentation on the machine you code on. If it's a laptop made in the last few years, the performance you can get out of it, knowing that it's sipping on a 45W power brick with probably not-great cooling, will make you very skeptical when people talk about "scale".


Most databases expressly say don’t run storage over a network.


To be fair, most networked filesystems are nowhere near as good as EBS. That’s one AWS service that takes real work to replicate on-prem.

OTOH, as noted, EBS does not perform as well as native NVMe and is hilariously expensive if you try. And quite a few use cases are just fine on plain old NVMe.



That's because EBS is a network block device, not a network filesystem (that would be EFS). And with other network block devices you can get the same perf or better compared to EBS.


Yes. We have it 4x striped on those same machines. Burns like lightning.


Ha, I did just the same thing - and also optimized for an extremely fast per-thread CPU (which you never get from managed service providers).

The query times are incredible.



The only problem is it hides all of the horrible queries. Ah well, can’t have it all.


I have one of those. It’s so fast I don’t even know what to do with it.


In your case it sounds more viable to move to VMs instead of RDS, which some cloud providers also recommend.


> Picking AWS over Google Cloud

I know this is an unpopular opinion, but I think Google Cloud is amazing compared to AWS. I use Google Cloud Run and it works like a dream. I have never found an easier way to get a docker container running in the cloud. The services all have sensible names, there are fewer, more important services compared to the mess of AWS services, and the UI is more intuitive. The only downside I have found is the lack of community, resulting in fewer tutorials, difficulty finding experienced hires, and fewer third-party tools. I recommend trying it. I'd love to get the user base to an even dozen.

The reasoning the author cites is that AWS has more responsive customer service, and maybe I am missing out, but it would never even occur to me to speak to someone from a cloud provider. They mention having "regular cadence meetings with our AWS account manager" and I am not sure what could be discussed. I must be doing simpler stuff.



> "regular cadence meetings with our AWS account manager" and I am not sure what could be discusse.

Having been on a number of those calls: it's just a bunch of crap where they talk like a scripted bot reading from a corporate-buzzword bingo card over a slideshow. Their real intention is twofold: to sell you even more AWS complexity/services, and to provide "value" to their point of contact (which is a person working in your company).

We're paying north of 500K per year in AWS support (which is highway robbery), and in return you get a "team" of people supposedly dedicated to you, which sounds good in theory, but in reality you get a labyrinth of irresponsibility, stalling and frustration.

So even when you want to reach out to that team, you first have to go through L1 support, which I'm sure will be replaced by bots soon (and no value will be lost) and which is useful in 1 out of 10 cases. Then, if you're not satisfied with L1's answer(s), you try to escalate to your "dedicated" support team, and they schedule a call in three days' time, or if that is around Friday, that means Monday, etc.

Their goal is to stall so you figure out and fix stuff on your own, shielding their own better-quality teams. No wonder our top engineers just abandoned all AWS communication, and in cases where it's unavoidable they delegate it to junior people who still think they are getting something in return.



> We're paying north of 500K per year in AWS support (which is a highway robbery), and in return you get a "team" of people supposedly dedicated to you, which sounds good in theory but you get a labirinth of irresponsiblity, stalling and frustration in reality.

I’ve found a lot of the time the issues we run into are self-inflicted. When we call support for these, they have to reverse-engineer everything which takes time.

However when we can pinpoint the issue to AWS services, it has been really helpful to have them on the horn to confirm & help us come up with a fix/workaround. These issues come up more rarely, but are extremely frustrating. Support is almost mandated in these cases.

It’s worth mentioning that we operate at a scale where the support cost is a non-issue compared to overall engineering costs. There’s a balance, and we have an internal structure that catches most of the first type of issue nowadays.



This rings so true from experience it hurts.


This. This is the reality.

I am so tired of the support team having all the real metrics, especially on I/O and throttling, and not surfacing them to us somehow.

And cadence is really an opportunity for them to sell to you, the parent is completely right.



We are a reasonably large AWS customer, and our account manager sends out regular emails with NDA information on what's coming up; we have regular meetings with them about things as wide-ranging as database tuning and code development/deployment governance.

They often provide that consulting for free, and we know their biases. There's nothing hidden about the fact that they will push us to use AWS services.

On the other hand, they will also help us optimize those services and save money that is directly measurable.

GCP might have a better API and better "naming" of their services, but the breadth of AWS services, the incorporation of IAM across their services, governance and automation all make it worthwhile.

Cloud has come a long way from "it's so easy to spin up a VM/container/lambda".



> There's nothing hidden about the fact that they will push us to use AWS services.

Our account team don't even do that. We use a lot of AWS anyway and they know it, so they're happy to help with competitor offerings and integrating with our existing stack. Their main push on us has been to not waste money.



When I was at AWS, I watched SAs get promoted for saving customers money all the time.

AWS wants happy customers to stick around for a long time, not one month of goosed income



I can't say the same of the reserved instance team which is genuinely running a protection racket business.


Yep. Pay us less every month and stick around for a long time. Getting low prices makes it really difficult to move away.

If you still decided to move away, and want to take data with you, yeah... there is a cost. Heck there is a cost to delete the data you have with them (like S3 content).

It's a good way to do business.



In a previous role I got all of these things from GCP – they ran training for us, gave us early access to some alpha/beta stage products (under NDA), we got direct onboarding from engineers on those, they gave us consulting level support on some things and offered much more of it than we took up.


I don’t have as much experience with aws but I do hate gcp. The ui is slow and buggy. The way they want things to authenticate is half baked and only implemented in some libraries and it isn’t always clear what library supports it. The gcloud command line tool regularly just doesn’t work; it just hangs and never times out forcing you to kill it manually wondering if it did anything and you’ll mess something up running it again. The way they update client libraries by running code generation means there’s tons of commits that aren’t relevant to the library you’re actually using. Features are not available across all client libraries. Documentation contradicts itself or contradicts support recommendations. Core services like bigquery lack any emulator or Docker image to facilitate CI or testing without having to setup a separate project you have to pay for.


Oh, friend, you have not known UI pain until you've used portal.azure.com. That piece of junk requires actual page reloads to make any changes show up. That Refresh button is just like the close-door elevator button: it's there for you to blow off steam, but it for damn sure does not DO anything. I have boundless screenshots showing when their own UI actually pops up a dialog saying "ok, I did what you asked but it's not going to show up in the console for 10 minutes so check back later". If you forget to always reload the page, and accidentally click on something that it says exists but doesn't, you get the world's ugliest error message and only by squinting at it do you realize it's just the 404 page rendered as if the world has fallen over

I suspect the team that manages it was OKR-ed into using AJAX but come from a classic ASP background, so don't understand what all this "single page app" fad is all about and hope it blows over one day



I do use Azure a bit, so I know what you mean. Google's UI is significantly more buggy in Firefox, which I use. On Chrome/Edge it's a bit better.


AWS refactored their console into a modern web SPA and it is TERRIBLE.

It amazes me a company that makes that much money has such a crappy client.



Yeah this amazes me as well - the AWS web interface does work, but it's pretty low quality.

You'd think a company with 1.5M employees could find half a dozen decent front-end developers, but apparently not.



AWS is even worse, yet somehow people love them, maybe because they get to talk to a support "human" to hand-hold them through all the badness.


Totally agree, GCP is far easier to work with and get things up and running on for how my brain works, compared to AWS. Also, GCP names things in a way that tells me what they do; AWS names things like a teenage boy trying to be cool.


That's completely opposite to my experience. Do you have any examples of AWS naming that you think is "teenage boy trying to be cool"? I am genuinely curious.


BigQuery - Athena

Pub/Sub - Kinesis

Cloud CDN - CloudFront

Cloud Domains - Route 53

...



Pub/sub is more like SNS or EventBridge Bus to me


Perfect list, also:

Google Cloud Run - Lambda

Sure, I get the reference to the underlying algebraic representation of code (lambda calculus), but come on, "Lambda" tells us nothing about what it does.

Products (not brands, products) should be named in a way that means something to the customer afaic.



> Perfect list, also:

> Google Cloud Run - Lambda

ECS is the AWS equivalent of Cloud Run. GCP Cloud Functions are the equivalent of AWS Lambda.

ECS / Cloud Run = managed container service that autoscales

Lambda / Cloud Functions = serverless functions as a service



Thanks for the clarification, I hadn't appreciated the difference. It also somewhat reiterates my point, which is nice as well.


Have you named any successful product?


Yes, named a product and sold over 100,000 units of them. Naming products is hard but not that hard.


I thought you meant API and parameters. Blaming them for product names is weird to me.


AWS API and parameter names are stupidly long, CamelCased, and not even consistent half the time, like a leaky abstraction over their underlying implementation.


Do you remember any examples? I don't call the API directly and usually use the CLI/SDK/CDK, which work a lot better than gcloud. I did see some inconsistencies between services (e.g. updating params for SQS and SNS) and that could definitely be improved. But honestly, compared to the GCP mess, AWS is ten times better.


It's nice when things do what they say on the tin. That being said, it's hard to build a "brand" when you start out with a generic name.


How many popular products have you named and launched? Naming products is hard when you have to meet both usability and marketing objectives. This has never been as big of a problem for me as GCP's APIs, for example. Those are the true evil. Product names I care little for.


> How many popular products have you named and launched?

One, and you often times only need one.



why is that?


Why is it weird to blame them for product names? Because their purpose is slightly different. I can see where the negativity comes from and understand it, but a product name is a lot less important than a consistent API experience. AWS is the best among the big players by far; hats off and well done to their teams and leadership. I hope the others will finally learn and follow.


My issue isn't just with the names themselves; they are emblematic of AWS's overall mentality. They want to have the AWS(TM) solution to X business case, while other cloud providers feel more like utilities that give you building blocks. This obviously works for them and for many of their customers; I just personally don't care for it. It probably has to do with the level of complexity I am working at (which is not very complex).

Also, I don't think trying to emulate AWS's support and consistent API makes sense as a strategy for other cloud providers. They will never beat AWS at their own game, it is light years ahead. If cloud providers want to survive they need to fill a different niche and try different things.



> while other cloud providers feel more like utilities that give you building blocks.

Idk why you don't see AWS as a utility providing building blocks.

> I don't think trying to emulate AWS's support and consistent API makes sense as a strategy for other cloud providers.

Those are such essential things. It's very hard to imagine prioritising something else and succeeding in anything.



I have had the experience of an AWS account manager helping me by getting something fixed (working at a big client). But more commonly, I think the account manager’s job at AWS or any cloud or SAAS is to create a reality distortion field and distract you from how much they are charging you.


> I think the account manager’s job at AWS or any cloud or SAAS is to create a reality distortion field and distract you from how much they are charging you.

How do they do this jedi mind trick?



One way is to charge high prices from the get go, and then proactively contact you, and offer to help to reduce your bill by doing some cloud optimization magic ;)


Maybe your TAM is different, but ours regularly does presentations about cost breakdowns, future planning and possible reservations. There's nothing distracting there.


AWS enterprise support (basically first-line support that you pay for) is actually really, really good. They will look at your metrics/logs and share solid insights with you. For anything more, you can talk to a TAM, who can then reach out to the relevant engineering teams.


I share your thoughts. Honestly, it looks like an entire article endorsing AWS.


Heartily seconded. Also don't forget the docs: Google Cloud docs are generally fairly sane and often even useful, whereas my stomach churns whenever I have to dive into AWS's labyrinth of semi-outdated, nigh-unreadable crap.


To be fair, there are lots of GCP docs, but I cannot say they are as good as AWS's. Everything is CLI-based, and some things are broken or hello-world-useless. It takes time to go through multiple duplicate articles to find anything decent. I have never had this issue with AWS.

GCP SDK docs must be mentioned separately, as they are bizarre auto-generated nonsense. Have you seen them? How can you even say that GCP docs are good after that?



Very few things are CLI-only; most have multiple ways to do things, and they have separate guide and reference sections that can easily be found. Compare that to AWS, where your best bet is to hope Google indexed the right page for them.


> few things are cli only

wdym? As far as I can see, it's either CLI or Terraform. The GCP SDK is complete garbage, at least for Python compared to AWS boto3. I have personally made a web UI for the AWS CLI man pages as a fun project and can index everything myself if needed. Googling works fine. If you are not happy with it, then ChatGPT to the rescue. I honestly do not see any problem at all.



We're relatively small GCP users (low six figures) and have monthly cadence meetings with our Google account manager. They're very accommodating, and will help with contacts, events and marketing.


> I have never found an easier way to get a docker container running in the cloud

I don't have a ton of Azure or cloud experience, but I run an Unraid server locally, which has a decent Docker GUI.

Getting a docker container running in Azure is so complicated. I gave up after an hour of poking around.



Azure is a complete disaster, deserves its own garbage-category, and gives people PTSD. I don't think AWS/CGP should ever be compared to it at all.


Funnily enough, I have the opposite opinion.

AWS has "fun" features like the ability to just lose track of some resource and still be billed for it. It's in here... somewhere. Not sure which region or account. I'll find it one day.

GCP is made by Google, also known as children that forgot to take their ADHD medication. Any minute now they'll just casually announce that they're cancelling the cloud because they're bored of it.

Azure is the only one I've seen with a sane management interface, where you can actually see everything everywhere all at once. Search, filter, query-across-resources, etc... all work reasonably well.



I have yet to meet a person IRL who believes Azure has a "sane management interface". In my experience it was horribly inconvenient, filled with weird anti-UX solutions that were completely unnecessary. It may show you everything at once, or at least try to, but that's a horrible idea for a complex system. Unsurprisingly, it never worked properly, with various widgets hanging or erroring out. It was impossible to see wtf was going on, what state things were in, or how to do anything about it. Azure will always be an example of a web UI done horribly wrong. This does not surprise me at all, since Microsoft products are known for this. Every time I need to extend my kids' Xbox subscriptions I have to pull my hair out to figure out how to do it in their web mess.

How you can even compare it to AWS is a mystery to me. There are pages showing all your resources; I'm not sure why you think that's a problem. Could it be a problem from a long time ago?



You're lucky if Azure works without errors half the time...


Oh I disagree - we migrated from Azure to AWS, and running a container on Fargate is significantly more work than Azure Container Apps [0]. Container Apps was basically "here's a container, now go".

[0] https://azure.microsoft.com/en-gb/products/container-apps



Heh, your comment almost echoes the positive thing I was going to say, as well as highlighting half of why I loathe Azure with every fiber of my being.

https://learn.microsoft.com/en-us/azure/container-instances/... is the one I was going to plug, because coming from a kubernetes background it seems to damn near be the PodSpec and thus both expresses a lot of my needs and also is very familiar https://learn.microsoft.com/en-us/azure/templates/microsoft....

Your link does seem to be a lot more "container, plus all the surrounding stuff" in line with the "apps" part, whereas mine more closely matches my actual experience of what you said: container, go

The "what the fucking hell is wrong with you people?" part is that their naming is just all over the place, and changes constantly, and is almost designed to be misleading in any sane conversation. I quite literally couldn't have guessed whether Container Apps was a prior name of Container Instances, a super set of it, subset, other? And one will observe that while I said Container Instances, and the URL says Container Instances, the ARM is Container Groups. Are they the same? different? old? who fucking knows. It's horrific



Oh yeah. This and resource groups are the only two things that azure did well. Everything else is a disaster.


GCP support is atrocious. I've worked at one of their largest clients and we literally had to get executives into the loop (on both sides) to get things done sometimes. Multiple times they broke some functionality we depended on (one time they fixed it weeks later except it was still broken) or gave us bad advice that cost a lot of money (which they at least refunded if we did all the paperwork to document it). It was so bad that my team viewed even contacting GCP as an impediment and distraction to actually solving a problem they caused.

I also worked at a smaller company using GCP. GCP refused to do a small quota increase (which AWS just does via a web form) unless I got on a call with my sales representative and listened to a 30 minute upsell pitch.



If you are big enough to have regular meetings with AWS you are big enough to have meetings with GCP.

I’ve had technicians at both GCP and Azure debug code and spend hours on developing services.



> I’ve had technicians at both GCP and Azure debug code and spend hours on developing services.

Almost every time Google pulled in a specialist engineer working on a service/product we had issues with it was very very clear the engineer had no desire to be on that call or to help us. In other words they'd get no benefit from helping us and it was taking away from things that would help their career at Google. Sometimes they didn't even show up to the first call and only did to the second after an escalation up the management chain.



Also much prefer GCP but gotta say their support is hot steaming **. I wasted so much time for absolutely nothing with them.


GCP's SDK and documentation are a mess compared to AWS's. And looking at the source code, I don't see how it can get better any time soon. AWS seems to have proper design in mind and uses fewer abstractions, giving you the freedom to build what you need. AWS CDK is great for IaC.

The only weird part I experienced with AWS is their SNS API. Maybe due to legacy reasons, but what a bizarre mess when you try doing it cross-account. This one is odd.

I have been trying GCP for a while and the DevX was horrible. The only part that more or less works is the CLI, but the naming there is inconsistent and not as well done as in AWS. But it's relative and subjective, so I guess someone likes it. I have experienced official GCP guides that are broken, untested or utterly braindead hello-world-useless. And they are numerous and spread out, so it takes time to find anything decent.

No dark mode is an extra punch. Seriously. Tried to make it myself with an extension, but their page is an Angular hell of millions of embedded divs. No thank you.

And since you mentioned Cloud Run: it takes 3 seconds to deploy a Lambda version in AWS and a minute or more for a GCP Cloud Function.



> I have never found an easier way to get a docker container running in the cloud

We started using Azure Container Apps (ACA) and it seems simple enough.

Create ACA, point to GitHub repo, it runs.

Push an update to GitHub and it redeploys.



Azure Container Apps (ACA) and AWS AppRunner are also heavily "inspired" by Google Cloud Run.


So?


The author leads infrastructure at Cresta. Cresta is a customer service automation company. His first point is about how happy he is to have picked AWS and their human-based customer service, versus Google's robot-based customer service.

I'm not saying there's anything wrong, and I'm oversimplifying a bit, but I still find this amusing.



Haha very good catch. I prefer GCP but I will admit any day of the week that their support is bad. Makes sense that they would value good support highly.


We used to use AWS and GCP at my previous company. GCP support was fine, and I never saw anything from AWS support that GCP didn't also do. I've heard horror stories about both, including some security support horror stories from AWS that are quite troubling.


Utter insanity. So much cost and complexity, and for what? Startups don’t think about costs or runway anymore, all they care about is “modern infrastructure”.

The argument for RDS seems to be “we can’t automate backups”. What on earth?



Is spending time to make it reliable worth it vs working on your actual product? Databases are THE most critical things your company has.


I see this argument a lot. Then most startups use that time to create rushed half-assed features instead of spending a week on their db that'll end up saving hundreds of thousands of dollars. Forever.

For me that's short-sighted.



All that infra doesn’t integrate itself. Everywhere I’ve worked that had this kind of stack employed at least one if not a team of DevOps people to maintain it all, full time, the year round. Automating a database backup and testing it works takes half a day unless you’re doing something weird


Setting up a multi-AZ DB with automatic failover, incremental backups and PiTR, plus automated runbooks and monitoring: all of that doesn't take half a day, not even with RDS.
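To be fair, the RDS-managed half of that list is only a few parameters on a single API call; it's the runbooks, monitoring and restore testing around it that eat the time. A minimal boto3 sketch, with identifiers, sizes and credentials as made-up placeholders rather than a production config:

    import boto3

    # Multi-AZ Postgres with automated backups / point-in-time recovery.
    # Everything below is a placeholder, not a recommended configuration.
    rds = boto3.client("rds")

    rds.create_db_instance(
        DBInstanceIdentifier="example-primary",
        Engine="postgres",
        DBInstanceClass="db.r6g.2xlarge",
        AllocatedStorage=500,
        MultiAZ=True,                  # synchronous standby, automatic failover
        BackupRetentionPeriod=14,      # enables automated backups and PiTR
        MasterUsername="postgres",
        MasterUserPassword="change-me",  # use a secrets manager in real life
    )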


No, but again, that sounds like a lot of complexity your average startup does not need. Multi-az? Why?


Because their Enterprise client requires it on their due diligence paperwork.


Which makes little sense anyway as in practice the real problems you have are from region/connectivity issues, not AZ failures.


> Automating a database backup and testing it works takes half a day unless you’re doing something weird

True story bro

I'm sure that's possible if you're storing the backup on the same server you're restoring on and everything is on top-of-the-line NVMe storage. Otherwise your backup has just started to run and will need another few days to finish. And that's only if you're running a single master.

You're massively underestimating the challenge to get that kind of automation done in a stable manner - and the maintenance required to keep it working over the years.



I’ve implemented such a process for companies multiple times, bro. I know what I’m talking about.


And that's the problem. "It's easy for me because I've done it a dozen times so it's easy for everyone" is a very common fallacy.


This is an oversimplification, but! Dumping Postgres to a file is one command. scp'ing the file to a different server makes two commands. (Granted, you need to set up SSH keys there too.) I have implemented backups this way (there's a sketch at the end of this comment).

With sqlite you only need the scp part.

You can even push your backup file to an S3 bucket... with one command!

Honestly, this argument mystifies me.

Of course you can make it as complicated as you want to, too. I've also worked on replicating anonymized data from a production OLTP database to a data warehouse. That's a lot more work.
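For reference, the simple flavor sketched above fits in a handful of lines. This assumes pg_dump is on the PATH, boto3 is installed, and the database, path and bucket names are made up:

    import datetime
    import subprocess

    import boto3

    # Dump Postgres to a file, then push it to an S3 bucket.
    # Database, path and bucket names are made-up placeholders.
    stamp = datetime.date.today().isoformat()
    dump_path = f"/var/backups/appdb-{stamp}.dump"

    subprocess.run(
        ["pg_dump", "--format=custom", "--file", dump_path, "appdb"],
        check=True,
    )

    boto3.client("s3").upload_file(dump_path, "example-db-backups", f"appdb/{stamp}.dump")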



And that works right until you get to publish an incident report like this:

https://about.gitlab.com/blog/2017/02/01/gitlab-dot-com-data...



What happened to having people trained by external trainers for what you need? That’s much cheaper than having everything externally “managed” and still having to integrate all of it. The number of services listed in TFA is just ridiculous.


I've done it before, too. For a toy project it's easy, as you said. It's not once you're at scale. It's hilarious that people are downvoting my comment. I guess there are a lot of juniors suffering from Dunning-Kruger syndrome around right now.


I worked at a place with its own colo where they ran several multi TB MySQL database servers. We did weekly backups and it could take days. Our backups were stored on external USB disks. The I/O performance was abysmal. Taking a filesystem snapshot and copying it to USB could take days. The disks would occasionally lock up and someone would have to power cycle them. Total clown show.

I would rather pay for RDS. Databases are the one thing you don't want to screw up.



A startup sized company using this many tools? They're for sure doing something weird (and that's not a compliment :) )

Totally on your side with this one - but alas, people associate value with complexity.



So investing in a critical part of my business is the bad thing to do?


> The argument for RDS seems to be “we can’t automate backups”. What on earth?

I can automate backups, and I'm extremely happy that, for some extra cost with RDS, I don't have to.

Also, at some size, automating database backups becomes non-trivial. I mean, I can manage a replica (which needs to be updated at specific times after the writer), then regularly stop replication for a snapshot, which is then encrypted and shipped to storage; then manage the lifecycle of that storage, then set up monitoring for all of that, then... Or I can set one parameter on the Aurora cluster and have all of that happen automatically.
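As a concrete illustration of that "one parameter" (the cluster identifier below is a made-up placeholder):

    import boto3

    # Enable continuous, automated backups (and hence point-in-time
    # restore) on an existing Aurora cluster. Identifier is a placeholder.
    rds = boto3.client("rds")

    rds.modify_db_cluster(
        DBClusterIdentifier="example-aurora-cluster",
        BackupRetentionPeriod=14,  # days of restorable history to keep
        ApplyImmediately=True,
    )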



The argument for RDS (and other services along those lines) is "we can't do it as good, for less".

And, when factoring in all costs and considering all things the service takes care of, it seems like a reasonable assumption that in a free market a team that specializes in optimizing this entire operation will sell you a db service at a better net rate than you would be able to achieve on your own.

Which might still turn out to be false, but I don't think it's obvious why.



Everyone who says they can run a database better than Amazon is probably lying, or has a story about how they had to miss a family event because of an outage.

The point isn't that you can't do it; the point is that it's less work for extremely high standards. It is not easy to configure multi-region failover without an entire network team and database team, unless you don't give a shit about it actually working. Oh yeah, and wait until you see how much SOC 2 costs if you roll your own database.



There are other providers with better value for service within AWS or GCP, like Crunchy.


I agree but also I'm not entirely sure how much of this is avoidable. Even the most simple web applications are full of what feels like needless complexity, but I think actually a lot of it is surprisingly essential. That said, there is definitely a huge amount of "I'm using this because I'm told that we should" over "I'm using this because we actually need it"


As the famous quote goes, "If I'd had more time, I would've written a shorter letter".

