(Comments)

Original link: https://news.ycombinator.com/item?id=40627558

This post discusses the author's experience coding in Python and Go, highlighting how the two languages differ in error handling. In Python, an exception that is ignored entirely still gets logged by default, whereas in Go, ignoring errors during critical operations such as database transactions can lead to unintended consequences. The discussion also covers a specific incident in which ChatGPT helped the author translate database models from Prisma/TypeScript to Python/SQLAlchemy but introduced a bug. The author acknowledges that using AI tools to generate code raises legitimate concerns and stresses the importance of thoroughly understanding the code and maintaining ownership of it. Despite the bug, the author frames the experience as evidence of the growing pains necessary for a successful startup. The post, however, lacks a clear takeaway, focusing instead on the story of the blogger having ChatGPT write some code, discovering the bug later, and relying on customer emails to notice it. Overall, it can be read as a cautionary tale about the risks and challenges of shipping AI-generated code.


Original


No, a lack of monitoring cost you $10K. Your app was throwing a database exception and nobody was alerted that this was not only happening, but happening continuously and in large volumes. Such an alert would have made this a 5-minute investigation rather than 5 days.

If you haven't fixed that alerting deficiency, then you haven't really fixed anything.
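For scale: wiring up basic exception reporting is a few lines of work. A minimal sketch, assuming Sentry's Python SDK (the DSN is a placeholder, and any error tracker works similarly):

    import sentry_sdk

    # one call at startup; from then on, every unhandled exception
    # (like a duplicate-key IntegrityError) is captured and can page someone
    sentry_sdk.init(
        dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
        traces_sample_rate=0.1,
    )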



Right? The log message would have said the id isn't unique, and then it would have taken much less time to debug this problem.

Programming when everything works is easy, it's handling the problems that makes it hard.



This is getting more and more common: companies and founders don't think about the infrastructure, since they believe their cloud provider of choice is going to do it for them with its magic.

As soon as you expect paying customers in your system you need to have someone with the knowledge and experience to deal with infrastructure. That means logging, monitoring, alerting, security etc.

DevOps.. amateurs.



I think deploying and then going to sleep is the red flag here. They should have deployed the change at 9am or something and had the workday to monitor issues.



While I agree, it wouldn't have helped. The first n sign-ups per commit worked. So the problem didn't manifest during the day, only when they stopped committing.

Now granted, if they'd carried on doing stuff (but not redeployed production) it may have shown up in, say, mid-afternoon, or not, depending on volume.

And of course the general guideline of deploying to production only early in the day, and never on a Friday, is still valid.



You can deploy and go to sleep if you have monitoring and alerting and someone getting paged. It shouldn't be a human monitoring for issues anyway, so the only reason to choose 9am over bedtime should be that you don't want to risk a late night page, not that someone will actually be checking up actively on the deployment.



For a big launch, you can have your entire team looking at issues in realtime, rather than a small number of people responding to pages in the middle of the night. I think if it's worth it to you there's a big difference.



That is, if you know what to monitor. When you launch an entirely new feature, you often don't know all the ways it will break.

And it's not trivial to set up a correct alert for this one. A simple HTTP 5xx threshold wouldn't work: if it's set high enough not to wake you whenever your cloud provider restarts Postgres, it's too high to catch this. You need per-endpoint failure-rate alerts or something else more clever.
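One shape this could take, sketched as FastAPI middleware (the thresholds and the alert() hook are made up, and a real deployment would push these counters into its metrics system rather than keep them in process memory):

    from collections import defaultdict
    from fastapi import FastAPI, Request

    app = FastAPI()
    hits = defaultdict(int)
    fails = defaultdict(int)

    def alert(msg: str) -> None:
        ...  # hypothetical hook into PagerDuty/Slack/etc.

    @app.middleware("http")
    async def per_endpoint_failure_rate(request: Request, call_next):
        path = request.url.path
        hits[path] += 1
        response = await call_next(request)
        if response.status_code >= 500:
            fails[path] += 1
            # per-endpoint rate: one broken endpoint can't hide behind
            # a healthy global 5xx average
            if hits[path] >= 20 and fails[path] / hits[path] > 0.25:
                alert(f"{path}: {fails[path]}/{hits[path]} requests failing")
        return response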



Sure, not having tests is bad. Doing things with AI without triple-checking is dangerous.

But not having error logging/alerts on your DB? That's the crazy part.

This is a new product. It's not legacy code from 20 years ago, when people thought it was a neat idea to throw stuff at the DB raw and use DB errors for data validation, making alerts hard because there are so many expected errors.



Yeah, if each instance created only one ID, then any integration test creating more than one user would have failed. There was no testing or logging on a system with live users, all while doing a refactor between two dynamic languages.



Agreed. I'm chuckling because I was wrestling with this exact same bug on a FastAPI project last night. I caught it because I have a habit of submitting API endpoints multiple times with the same data to see how the database reacts. I got a key collision when I tried to submit the endpoint the second time and figured out that the setup didn't create a new UUID each time.

Unit tests are good, yes. Monitoring is also good. But just taking 30 seconds to do some manual testing will catch a LOT of unexpected behavior.



I generally follow this process:

1. Get it working: write the code for the desired behavior, not worrying about making it beautiful, testable, whatever.

2. Get it working well: manually testing and finding edge cases, refactoring to get it testable and writing tests to solidify behavior.

3. Get it working fast: optimizing it to be as fast as I need it to be (can sometimes skip this step). No existing tests should change here, only new ones added.



TBH, if the backend were written in Go, this probably wouldn’t have happened to the extent it did. Somewhere in a log a descriptive error would have shown up.

One of the reasons I use Go whenever possible is that it removes a lot of the classic Python footguns. If you are going to rewrite your backend from Javascript, why would you rewrite it in another untyped, error-prone language?



In python it's harder to ignore errors than in Go.

In go, I've definitely seen:

   tx, err := db.Begin()
   if err != nil { return err }
   defer tx.Commit() // silently ignores the error on committing, which is the important one
That would have masked this error so it didn't get logged by the application.

In Python, if you ignore an exception entirely, like the error above, you instead get the exception logged by default.

Python's exceptions also include line numbers, whereas Go errors by default wouldn't show you _which_ object has a conflict, even if you logged them.

In general, python's logs are way better than Go's, and exceptions make it way harder to ignore errors entirely than Go's strategy.
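Concretely, this is the zero-effort behavior in question (a minimal sketch; the file name in the traceback is whatever you saved the script as):

    def handler():
        raise KeyError("duplicate key")   # stand-in for the real failure

    handler()
    # Traceback (most recent call last):
    #   File "example.py", line 4, in <module>
    #     handler()
    #   File "example.py", line 2, in handler
    #     raise KeyError("duplicate key")
    # KeyError: 'duplicate key'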



> I've definitely seen

What did they say was the thinking behind it? defer tx.Rollback() would make sense, but defer tx.Commit() is nonsensical, regardless of whether or not the error is handled. It seems apparent that the problem there isn't forgetting to check an error, but that someone got their logic all mixed up, confusing rollback with commit.



"defer tx.Commit() is nonsensical"

It's not pure nonsense, it works in the happy path, and it matches the pattern of how people often handle file IO in go.

    f, err := os.OpenFile(...)
    defer f.Close()
... which is another place most gophers ignore errors incorrectly. Just like the "defer tx.Commit()" example, it's collocating the idea of setup and cleanup together.

Those two patterns are so similar, python handles them in the same way:

    with db.begin() as conn: # implicit transaction, gets automatically committed

    with open(...) as f: # implicit file open + close pair, automatically closed
You're of course right that Go requires more boilerplate to do the right thing, but the wrong code has no compiler errors, works in the happy path, and fits how people think about the problem in other languages with sane RAII constructs, so of course people will write it.


> it's collocating the idea of setup and cleanup together.

Okay, sure, but Rollback is the cleanup function. You always want to rollback – you only sometimes want to commit. I suppose this confirms that someone got their logic mixed up.



> You always want to rollback

This phrase highlights the confusion. If you learned SQL before Go, you want to rollback only on error.

Every time I write `defer tx.Rollback()`, I cringe and have to remind myself that yes, it's actually ok to call a method called `Rollback` after successfully writing data.



If you learned SQL before Go, you'd know the pattern works just fine and is arguably a good sanity check.
    BEGIN;
    INSERT INTO ...
    COMMIT;
    ROLLBACK;
It is not some kind of Go-ism. The Go database/sql package actually executes ROLLBACK in the SQL engine. Check out the error returned by it.

Perhaps you mean learning SQL in the context of languages that consider a failed rollback an exception? In that case one needs to be careful not to rollback, or else be stuck handling the exception, which programmers seem to hate doing.



Unrelated but I’ve never really understood what to do with an error that gets returned by Close() aside from logging it. I’ve also never experienced that error being returned and don’t really understand what could even cause that.



A write is not guaranteed to be written if close fails. You almost certainly want to do something with the error state to ensure that the data gets safely stored in that case.

If you are only reading, then sure, who cares?



Yeah, each has its benefits. I run all my Go projects through golangci-lint, which is required for merge to master/main, so not checking the error value would not have made it to prod.

I suppose there are probably similar checkers for Python that would have caught the passing of a scalar value instead of a function.

Perhaps this is an argument for mandatory linting in CI.



The blog post is 404ing, here's a Web archive link

https://web.archive.org/web/20240610032818/https://asim.bear...

The author has added an important edit:

> I want to preface this by saying yes the practices here are very bad and embarrassing (and we've since added robust unit/integration tests and alerting/logging), could/should have been avoided, were human errors beyond anything, and very obvious in hindsight.

>

> This was from a different time under large time constraints at the very earliest stages (first few weeks) of a company. I'm mostly just sharing this as a funny story with unique circumstances surrounding bug reproducibility in prod (due again to our own stupidity) Please read with that in mind



Could they have deleted it because of all the negativity?

They did make a silly mistake, but we are humans, and humans, be it individually or collectively, do make silly mistakes.



If they had made that mistake by writing the code themselves and just misunderstood something or overlooked the problem, fine. But making this mistake by copy-pasting from ChatGPT without proper review is just terrible.



I don’t find the source of the error being a careless human writing original code without proper review vs a careless human copy/pasting code without proper review to be significantly different.



If you code for a hobby/fun, yeah, sure, it's a silly mistake.

If you're earning past six figures, are part of a team of programmers, call yourself a professional / engineer, and have technical management above you like a VP of Engineering, yadda yadda... then it's closer to a systematic failure of the company's engineering practices than a "mistake."

There is a reason we call it software engineering, not software fuckarounding (or, cough, "DevOps Engineer".)

Software engineering practices assume people are going to make mistakes, and implement procedures to reduce the chances of those mistakes making it into production, and to reduce their impact if they do.



I agree, but in fairness, engineering mistakes do happen all the time, in every organisation. A good engineering culture enables mistakes to be acknowledged and reviewed in an emotionally neutral manner, ideally leading to a learning experience.

Being on the receiving end of an internet pile-on of "OMG you idiots everyone knows the first thing you do when setting up a flerble cluster is spend a week installing grazoono monitoring!" is not conducive towards building a good engineering culture.



> If you're earning past six figures, are part of a team of programmers...

Compensation is in no way correlated with good engineering practices.

They might be paid much because they're developing something which people are willing to pay for, it doesn't have to be "real engineering".



A lazy attitude towards proper role management, and poor engineering practices. More common in small companies or small teams managing their own service (db and app).

Really all you need is logging and potentially temporary read access to the db if you need some info that you can't derive from the logs.



What phist wrote. Color-code the backgrounds of your servers with different colors, so anyone who connects to 'take console' on any system is hit by a blinding electric green/blue/red/yellow or another striking color.

I assume that all systems already have descriptive names: App_DEV_Server1, App_PROD_Server5, etc.

It also helps if they're in separate IP groups/VLANs (of course they already are, right??).

If you are running Windows, it's a good idea to use BGINFO.exe by SysInternals (or Winternals as we old people still call it), and display the most relevant info (showing Dev/Prod/UAT/etc.) with big-big-big letters.



I spotted the error instantly. With all due respect to your team - this has nothing to do with ChatGPT and everything to do with using a programming model that your team does not have sufficient expertise in. Even if this error managed to slip by code review, it would have been caught with virtually any monitoring solution, many of which take less than 5 minutes to set up.



To be fair, if I wasn't looking for this bug I never would have spotted it. That being said, you're entirely right that any monitoring or even the most basic manual testing should have instantly caught this.



Those kinds of issues are why I ALWAYS write an integration test that calls the same insert multiple times.

I did step into that particular trap more than once (passing the result, rather than the function)
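Something like this, sketched against FastAPI's TestClient (the myapp module and /signup endpoint are hypothetical):

    from fastapi.testclient import TestClient
    from myapp import app  # hypothetical application module

    client = TestClient(app)

    def test_same_insert_twice():
        # the second insert is the one that exposes a default
        # that was only evaluated once
        for email in ("a@example.com", "b@example.com"):
            resp = client.post("/signup", json={"email": email})
            assert resp.status_code == 200, resp.text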



This. And it seems the team wasn't able to do basic troubleshooting from either the database or application log. This was a simple error; what will happen when transient errors (such as implicit locks on tables, etc.) occur? These guys shouldn't be writing code - at all.



Interestingly, you know who else spotted the error? ChatGPT-4o. Annoyingly you can't share a chat with an image in it, but pasting in the image of the bad code, and prompting "whats wrong with the code" got ChatGPT to tell me that:

* UUID Generation in Primary Key: The default parameter should use the callable uuid.uuid4 directly instead of str(uuid.uuid4()). SQLAlchemy will call the function to generate the value.

* Date Default Value: server_default=text("(now())") might not work as expected. Use func.now() for server-side defaults in SQLAlchemy.

* Import Statements: Ensure uuid and text from sqlalchemy are imported.

* Column Definitions: Consider using DateTime(timezone=True) for datetime columns to handle time zones.

It then provided me with corrected code that does

    id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()), unique=True, nullable=False)
where the addition of lambda: fixes the problem.
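For anyone who wants to see the failure mode concretely, here is a minimal, self-contained reproduction against in-memory SQLite (assuming SQLAlchemy 1.4+); with the original default, the second insert dies with a unique-constraint violation, and swapping in the lambda makes it pass:

    import uuid
    from sqlalchemy import Column, String, create_engine
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class User(Base):
        __tablename__ = "users"
        # the bug: str(uuid.uuid4()) runs ONCE, when the class is defined
        id = Column(String, primary_key=True, default=str(uuid.uuid4()))

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add(User())
        session.commit()
        session.add(User())
        session.commit()  # IntegrityError: UNIQUE constraint failed: users.id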


ChatGPT-4o might spot it when asked about the code directly, but this was a conversion from JS to Python; in my experience, it's very common for ChatGPT/Copilot or any other AI to hallucinate or make mistakes while trying to stay as close as possible to the original code.

The other common issue is that if the original code has things ChatGPT doesn't like (misspellings, slightly wrong formatting), it will fix them automatically, or if it really thinks you should have added a particular field, it will add the one you didn't.



Having no real experience with Python, I would assume uuid.uuid4() was some schema definition (like in Prisma), so honestly the fact that this bug exists is not surprising at all and I would have made the same mistake myself. But yeah, one `kubectl logs` would have caught it immediately.

...also from next.js and prisma to python? ...what?



It's not some innocent mistake. The title is purposefully clickbait / keyword-y, implying that it was chatgpt that made the 'mistake' for SEO and to generate panicked clicks.

"We made a programming error in our use of an LLM, didn't do any QA, and it cost us $10k" doesn't generate the C-suite "oh shit what if ChatGPT fucks up, what's our exposure!?" reaction. There's a million middle and upper management posting this article on LinkedIn, guaranteed.

It's like the Mr. Beast open-mouth-surprised expression thumbnail nonsense; you feel incredibly compelled to click it.

While we're on the subject: LLMs can't make "mistakes." They are not deterministic.

They cannot reason, think, or do logic.

They are very fancy word salad generators that use a lot of statistical probabilities. By definition they're not capable of "mistakes" because nothing they generate is remotely guaranteed to be correct or accurate.

Edit: The mods boosted the post; it got downvoted into oblivion, for obvious reasons, and then skyrocketed instantly in rank, which means they boosted it: https://hnrankings.info/40627558/

Hilarious that a post which is insanely clickbait (which the rules say should result in a title rewrite) got boosted by the mods.

I'm sure it's a complete coincidence that the story was apparently authored by someone at a Ycombinator company: https://news.ycombinator.com/item?id=40629998



> It's like the Mr. Beast open-mouth-surprised expression thumbnail nonsense; you feel incredibly compelled to click it.

Sponsorblock is great for combating that. (Although, I consciously avoid channels that mostly do clickbait anyway.)



The only part of your post that I do not agree with is:

> It's like the Mr. Beast open-mouth-surprised expression thumbnail nonsense; you feel incredibly compelled to click it.

I feel incredibly compelled to ignore it.



>By definition they're not capable of "mistakes" because nothing they generate is remotely guaranteed to be correct or accurate.

This makes no sense. Only things that are guaranteed to be correct or accurate can make mistakes? Everyone knows what "mistake" means in this context. Nobody cares what your preferred definition of mistake is.



A mistake is usually seen as something that happens when someone (or metaphorically also, something) makes an error, but is capable of solving similar problems through understanding.

Hard to put in words.

But that's roughly what is concerning about the "AI makes mistakes" narratives.

It implies they are caused by a (fixable) fault in reasoning or memory.

LLM "AI" will always respond that it made a "mistake" when you correct it.

It is trained to do so, and humans often behave similarly.

It is hard to come up with a good definition for "mistake", yes.

That does not change that using this in case of LLM hallucinations is misleading.



> By definition they're not capable of "mistakes" because nothing they generate is remotely guaranteed to be correct or accurate.

By that logic, nothing is capable of making mistakes :D.

> Hilarious that a post which is insanely clickbait (which the rules say should result in a title rewrite) got boosted by the mods.

You have a distorted view of what clickbait is and the rules of this site. I suggest you go calm down and try to stop hating on a technology which is just that: a technology! Like any other, it can be misused, but think about why exactly you feel so passionate about this particular technology.



I have seen the same mistake made in code created by humans. Many times, especially in React/TypeScript/JavaScript, someone will forget to use a lambda.

I felt the blog post failed to articulate the root cause of the issue and went straight to blaming ChatGPT.

When you rush and make large, non-peer-reviewed commits to main, it is going to happen.

The real issue is that when you rush, take shortcuts, and don't adequately test and peer-review code, errors will occur.

I would have imagined that a test that tried a few different signup options would have found the issue immediately.



My mental model for ChatGPT is that it’s an entry-level engineer that will never be promoted to a terminal level and will eventually be let go.

However, this engineer can type infinitely fast, which means it might be useful if used very carefully.

Anyway, letting such a person near financially important code would lead to similar issues, and in both cases, I’d question the judgment of the person that decided to deploy the code at all, let alone without much testing.



This is kind of how it works with Real Engineering™ and other licensed professions - it's the non-licensed people doing most of the grunt work, the PE/architect/licensed professional reviews and signs off on it. But then, by virtue of their signature, they're still on the hook for any problems.



This sort of issue seems common in a few places. E.g. Vue's component props can have defaults, and woe betide you if you use a literal object or array as a default, instead of a function that returns an object or array.

I'm surprised there was no lint rule for this case.



Topical or not, blaming ChatGPT is only scratching the surface.

To be truly reflective, OP needs to dive into the real reason their code had this issue. It wasn’t using GPT, it was not having the controls in place.



I understand how the mistake was made, it seems relatively easy to slip by even when writing code without ChatGPT.

But what I don't understand is how this wasn't caught after the first failure? Does this company not have any logging? Shouldn't the fact the backend is attempting to reuse UUIDs be immediately obvious from observing the error?



They didn't even know there was an error until the customers came ringing. You always want to know about errors before your customers do; logging, alerting, any monitoring at all would have helped them here.



Also facepalms here: UUIDs as strings and UUIDv4.

UUIDs are just 128-bit values. They might be conventionally encoded for humans as hex, but storing them as 36-byte (plus a few more for length) strings is a pointless waste of both space and performance.
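For what it's worth, doing it properly is a one-liner in SQLAlchemy with the Postgres dialect's native uuid type (a sketch, with a hypothetical model name; note the callable default, per the bug discussed elsewhere in this thread):

    import uuid
    from sqlalchemy import Column
    from sqlalchemy.dialects.postgresql import UUID
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class User(Base):
        __tablename__ = "users"
        # stored as a native 16-byte uuid, surfaced as uuid.UUID objects
        id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)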



They don't have logs and commit directly to production 10/20 times a day.

I don't think 128-bit vs 36-byte performance is the main concern right now.



It is most likely not a performance issue right now, but every pessimisation compounds and catches up with you eventually.

36 bytes vs 16 bytes today; tomorrow you need an array of them, and now it isn't cache-aligned and has more than twice the overhead.

Most likely, instead of manipulating a 36-byte fixed-length string, it is handled as a dynamic string, adding extra runtime memory allocations, most likely consuming at least 64 bytes per allocation. Etc, etc.

Do this all over the codebase, and now you know why all the modern software is a sloth on what was a supercomputer 30 years ago.



Probably not, but this is something that would have taken all of ten seconds to get right. And the size and performance impact is multiplied at every tier of their application.

It’s also not just the size itself. Despite being fixed-size in practice, these are variable-sized strings in application code which now means gajillions of pointless allocations and indirection for everything. There are a ton of knock-on performance consequences here, all on the most heavily-used columns in your data model.

Worst of all, should they actually succeed, this is going to be absolutely excruciating to fix.



IFF they’re using MySQL (doubtful), it’s a common mistake due to there not being a native type, and needing to know to use BINARY(16) and casting the string back and forth in code.

But in either case – MySQL or Postgres – they’ve still made the classic mistake of using a UUID as a PK, which will tank performance one way or another. They’ll notice right around when it starts to matter.



IMO this is the real issue.

I guarantee you that they _will_ have another production bug like this sometime in the future (every fast-paced project will). You'd hope the next one won't take 5 days to identify.



it wasn't 5 days of only working on this one problem though, it was over 5 calendar days. even if something is gonna take me one day to implement, it's gonna take three days to get enough focus time between all the other meetings and fires to put out in order to actually get a day's worth of coding done



Yeah, I agree with this here. I think it's totally reasonable that something as specific as multiple Stripe subscriptions wouldn't be exercised by normal unit testing; as mentioned in the post, this wouldn't have been an easy error to reproduce via an acceptance test; and I think the focus on ChatGPT is overblown (by both the OP and everyone else) and mistakenly passing a String instead of a Callable to a function that accepts either happens all the time. My gut instinct is that not using an ORM would have prevented this particular issue, but that may just be my bias against ORMs speaking; one could easily imagine a similar bug occurring in a non-database context. My real conclusion is that all the folks crowing that they would have definitely caught this bug are either much better engineers than I am, or (more likely) are just a bit deluded about their own abilities.

I am also very confused about the apparent lack of logging or recourse to logging. It's been a while, but if I recall correctly, ECS should automatically propagate the Duplicate Key exceptions (which were presumably occurring) to CloudWatch without a bunch of additional configuration - was that not happening? If it was, did no one think to go check what types of exceptions were happening overnight?



I’ve seen two types of people approach problems:

Type 1 tries to find the error message and figure out what it really, really means by breaking down the error message and system.

Type 2 does trial and error on random related things until the problem goes away.

I hate to say that I've seen way more Type 2 engineers than Type 1, but maybe I'm working at the wrong companies.



The way you want to build and lead an engineering team is so that the individual engineer’s approach to the problem doesn’t matter. The idea is to establish deployment safety and ops mechanisms so Type 2 isn’t even possible and your Sr. Engineers coach the jr. engineers in how to use these mechanisms effectively through training and runbooks. The idea of COE is to figure out what mechanisms (availability tests, metrics/alarms, logs, unit tests) were missing so that you can fill those gaps.



Everything could be understandable if this were someone's first small personal project.

Here we are talking about a YC company backed with CAD $1.65 MILLION.



> Our project was originally full stack NextJS but we wanted to first migrate everything to Python/FastAPI

This is the eye opener for me, how is a startup justifying a re-write when they don't even have customers?



Dear god, I thought I was taking crazy pills. After saying the same thing (a rewrite this early is insane) I was scanning the comments and no one else was pointing this out. I have no clue what would drive someone to rewrite this early (with or without customers) for what is effectively a lateral move (node to python). If you had hundreds of customers and wanted to rewrite in Go or similar then maybe (I still question even that).



> a language that they lack experience in

Perhaps also the tooling because any remotely decent IDE should show an error there, let alone the potential warnings of some code analysis software.



Who needs static analysis when you can put your trust in a magic robot?

(This is one thing that baffles me about the “let’s use LLMs to code” movement; a lot of the proponents don’t seem to be just adding it as a tool (I don’t think it’s a terribly _useful_ tool, but whatever, tastes differ), but using it as the only tool, discarding 50 years worth of progress.)



> This is the eye opener for me, how is a startup justifying a re-write when they don't even have customers?

In my case (with a real project I'm working on now), it'd be due to realizing that C# is a great language and has a good runtime and web frameworks, but at the same time drags down development velocity and has some pain points which just keep mounting, such as needing to create bunches of different DTO objects yet AutoMapper refusing to work with my particular versions of everything and project configuration, as well as both Entity Framework and the JSON serializer/deserializer giving me more trouble than it's worth.

Could the pain points be addressed through gradual work, which oftentimes involves various hacks and deep dives in the docs, as well as upgrading a bunch of packages and rewriting configuration along the way? Sure. But I'm human and the human desire is to grab a metaphorical can of gasoline, burn everything down and make the second system better (of course, it might not actually be better, just have different pain points, while not even doing everything the first system did, nor do it correctly).

Then again, even in my professional career, I get the same feeling whenever I look at any "legacy" or just cumbersome system and it does take an active, persistent effort on my part to not give in to the part of my brain that is screaming for a rewrite. Sometimes rewrites actually go great (or architectural changes, such as introducing containers), more often than not everything goes down in a ball of flames and/or endless amounts of work.

I'm glad that I don't give in, outside of the cases where I know with a high degree of confidence that it would improve things for people, either how the system runs, or the developer experience for others.



You don't need to make DTOs when you don't have to, and using AutoMapper is considered a bad practice and is heavily discouraged (if you do have to use a tool like that, there are alternatives like Mapperly which are zero-cost to use and will give you build-time information on what doesn't map, without having to run the application).

Hell, most simple applications could do with just a single layer (schema registration in EF Core is mapping), or at most two: one for the DB and one for response contracts.

Just do it the simplest way you can. I understand that culture in some companies might be a problem, and it's been historically an issue plaguing .NET, spilling over, originally, from Java enterprise world. But I promise you there are teams which do not do this kind of nonsense.

Things really have improved since .NET Framework days, EF Core productivity wise, while similar in its strong areas, is pretty much an entirely new solution everywhere else.



> You don't need to make DTOs when you don't have to, and using AutoMapper is considered a bad practice and is heavily discouraged (if you do have to use a tool like that, there are alternatives like Mapperly which are zero-cost to use and will give you build-time information on what doesn't map, without having to run the application).

The thing is, that you'll probably have entities mapped against the database schema with data that must only conditionally be shown to the users. For example, when an admin user requests OrderDetails then you'll most likely want to show all of the fields, but when an external user makes that request, you'll only want to show some of the fields (and not leak that those fields even exist).

DTOs have always felt like the right way to do that, however this also means that for every distinct type of user you might have more than one object per DB table. Furthermore, if you generate the EF entity mappings from the schema (say, if you handle migrations with a separate tool that has SQL scripts in it), then you won't make separate entities for the same table either. Ergo, it must be handled downstream somewhere.

Plus, sometimes you can't return the EF entities for serialization into JSON anyways, since you might need to introduce some additional parsing logic, to get them into a shape that the front end wants (e.g. if you have a status display field or something, the current value of which is calculated based on 5-10 database fields or other stuff). Unless it's a DB view that you select things from as-is, though if you don't select data based on that criteria, you can get away by doing it in the back end.

Not to say that some of those can't be worked around, but I can't easily handwave those use cases away either. In Java, MapStruct works and does so pretty well: https://mapstruct.org/ I'd rather do something like that, than ask ChatGPT to transpose stuff from DDL or whatever, or waste time manually doing that.

I'll probably look into Mapperly next, thanks! The actual .NET runtime is good and tools like Rider make it quite pleasant.



C# differs quite a bit from Java particularly in surrounding libraries, and there is a learning curve to things...the expectation that it's another Java ends up misleading many people :)

I'm sure MapStruct would also require you to handle differences in data presentation, in a similar way you would have to do with Automapper (or Mapperly). .NET generally puts more emphasis on "boilerplate-free happy path + predictable behavior" so you don't have autowire, but also don't have to troubleshoot autowire issues, and M.E.DI is straightforward to use, as an example. In terms of JSON (with System.Text.Json), you can annotate schema (if it's code-first) with attributes and nullability, so that the request for OrderDetails returns only what is available per given access rights scope. In either case different scopes of access to the same data and presentation of such is a complex topic.

Single-layer case might be a bit extreme - I did use it in a microservice-based architecture as PTSD coping strategy after being hard burned by a poor team environment that insisted on misusing DDD and heavy layering for logic that fits into a single Program.cs, doing a huge disservice to the platform.

Another popular mapping library is Mapster: https://github.com/MapsterMapper/Mapster, it is more focused on convenience compared to Mapperly at some performance tradeoff, but is still decently fast (unlike AutoMapper which is just terrible).

For fast DTO declaration, you can also use positional records e.g. record User(string Name, DateOnly DoB); but you may already be aware of those, noting this for completeness mostly.

Overall, it's a tradeoff between following suboptimal practices of the past and taking much harder stance on enforcing simplicity that may clash with the culture in a specific team.



No, ChatGPT made you the money that your app generated since you had no ability to implement it otherwise/without ChatGPT. Your inability to code, debug, log, monitor cost you the $10k. ChatGPT is net positive in this story.



Looking at this team's project at github.com/reworkd, it clearly shows the maturity of the product as well as the team. Emoji-driven development. Emojis for all commit messages. Monkeys, bananas, rockets, fireworks, you name it, they have it in their commit messages.



Just wondering what the end game of these AI startups is. YC would reject many other reasonably sound ideas I hear, but this space is highly speculative, and I don't see any such downstream-ChatGPT-wrapper startup reaching a billion-dollar IPO.



The emojis seem fine.

Instead of `bug: fix blah`, it's `:bug:: fix blah`, which honestly seems clearer and easier to parse at a glance.

edit: hacker news doesn't support unicode emojis



It was already implemented; it seems they had the ability there:

> Our project was originally full stack NextJS but we wanted to first migrate everything to Python/FastAPI.

> What happened was that as part of our backend migration, we were translating database models from Prisma/Typescript into Python/SQLAlchemy. This was really tedious. We found that ChatGPT did a pretty exceptional job doing this translation and so we used it for almost the entire migration.

ChatGPT wasn't a net positive if they wouldn't have attempted this migration at all without it.

Possibly they had better error logging in the other stack, possibly they didn't, possibly they needed it less because they were actually writing the code for it themselves and knew how it worked.

("Write all the code a second time before turning on monetization" is itself an interesting decision, of course.)



They had 8 AWS tasks running 5 instances each with code written in TypeScript and Python, with frameworks like next.js, with $40 revenue and only a few weeks dev time?

What the actual fuck hahahaha

This is made worse by their edit saying the reason the code is crap is time constraints, when they spent their time refactoring across languages and spinning up a distributed system FOR NO REASON. That is self-imposed harm, juggling features and ridiculous technical complexity. What were they thinking?

Edit: a YC summer '23 company whose product is still behind a waitlist in summer '24, presumably because of a rewrite to Rust.



Because they had half a million dollars to start with, 1.2 million dollars on top, and then free AWS credits to burn.



If I give them a trampoline they shouldn't spend all day jumping on it, just because they can. Especially if they're busy with features.

They literally had 1 instance of the backend per $1 of revenue, and the reason the bug wasn't seen straight away was that they had 40 backend instances, each with a single UUID that could be used for users before it broke with non-unique id errors.



They even said as much in TFA - "[...] overkill, yes we know, but to be fair we had AWS credits".

Fully acknowledging the irony I am about to invoke - this is why I hate startup culture. Not startups, but this ridiculous culture of "well the VC gave us a million bucks and that bought us $100,000 in AWS credits, so let's just use it."

As someone who has built my company fully on my own dime (and the dimes of two colleagues), it's easy enough to burn piles of money in AWS (or any other cloud) when you're making an attempt at being judicious. Spinning up eight backends (edit: running five instances each, no less!) just because you have money, despite the fact that you know you don't need that much compute, is just insane. If for no other reason than you're just throwing your own credits away.



A ten dollar Hetzner with dokku would do fine at this stage.

But then the whole startup culture is generally speaking, a culture of waste, pride and vanity.



"Note: I want to preface this by saying yes the practices here are bad and could have been avoided. This was from a different time under large time constraints. Please read with that in mind"

These "constraints" are why I'm terrified of subscribing to software



I have immense respect for the OP for writing up the story, and even more so for giving this preface. It's really useful to know what mistakes other people make, but can be quite embarrassing to tell others about mistakes you've made. Thanks, OP.



Having worked with some legacy subscription code, I can say it can be quite nasty.

We had race conditions where we would charge users twice.

This has made me paranoid: any time I see a timeout or an error related to money, I assume it went through and come back to check later.



A local cinema's shitty website charged me and, immediately after, my wifi disconnected for a few seconds. This somehow crashed their server entirely for several minutes, failing to send me the tickets even when it got back up.

It took me several threatening emails to make them understand they had already taken my money and I wasn't going to try again until I got a refund. Now I'm paranoid any time I purchase online at mediocre shops.



out of curiosity, what's the technical solution? My guess: add an idempotency key to the request and a message queue? Then when you try to consume it, you check whether that request was made previously.
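Sketched out, that guess might look something like this (the Payment model and provider_charge call are hypothetical, and a real version must also handle the race where two concurrent retries both miss the lookup and one of them fails the unique constraint):

    from sqlalchemy import Column, Integer, String
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class Payment(Base):
        __tablename__ = "payments"
        id = Column(Integer, primary_key=True)
        key = Column(String, unique=True, nullable=False)  # the idempotency key
        amount_cents = Column(Integer, nullable=False)

    def provider_charge(amount_cents: int) -> None:
        ...  # stand-in for the real card-processor call

    def charge(session: Session, idempotency_key: str, amount_cents: int) -> Payment:
        existing = session.query(Payment).filter_by(key=idempotency_key).one_or_none()
        if existing:
            return existing            # retry of a request we already processed
        payment = Payment(key=idempotency_key, amount_cents=amount_cents)
        session.add(payment)
        session.commit()               # a concurrent retry fails here on UNIQUE(key)
        provider_charge(amount_cents)  # only charge after the key is recorded
        return payment

(Stripe's API, for what it's worth, supports exactly this via an Idempotency-Key request header.)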



I've written less than 1000 lines of Python in total probably, but I correctly spotted the problem.

Python has this misfeature whereby it didn't correctly crib Common Lisp's evaluation strategy for the expressions that give default values to optional function arguments.

When you have an argument like foo=obj.whatever() the obj.whatever() is evaluated (would you believe it!) at the time the definition of the function is being processed, not at the time when the function is being called and the value is needed.

I suspect this was done on purpose, for efficiency. Python has another misfeature: it has no literal syntax for certain common objects like lists. There is [1, 2, 3], but that is a constructor and not a literal: it has to create a new list every time it is evaluated and stuff it with 1, 2, 3. (Unless a clever compiler can prove that this can be optimized away without harm.)

The designer didn't want a parameter like list=[] to have to construct a new, empty list object each time the argument is omitted. In Lisp '(1 2 3) and '() are true literals. Whenever they are referenced, they denote the same object. The programmer has a choice here: they can use (list 1 2 3) as the default value expression or '(1 2 3). The former is like [1, 2, 3]: it yields a new object each time that is mutable; the other will (almost certainly) yield the same object and cannot be reliably, portably modified.

Hey, modern popular languages have most of the features of Lisp, so you're not missing anything.



> When you have an argument like foo=obj.whatever(), the obj.whatever() is evaluated at the time the definition of the function is being processed, not at the time when the function is being called.

This can't be correct, surely? What if .whatever() relies on internal state that changes after obj is initialized (or after the function surrounding foo is declared, not sure what you're saying)?



It is correct, it's one of the most surprising things about Python and it causes a number of mistakes, even for experts.

The easiest way to see this is by running something like this and seeing what gets printed out and when:

    print("1. start")
    
    def function(arg=print("2. func definition")):  # the default is evaluated here, once
        print("4. func call")
    
    print("3. after definition")

    function()
    function()
    function()
You should see that the print statement in the default position is called once, when the function definition itself is being evaluated, not when the function gets called.


My canonical example is that you want a function to have an argument that defaults to an empty list:
    def func(arg=[]):   # one shared list, created once at definition time
        arg.append(1)
        print(arg)
    func()  # [1]
    func()  # [1, 1]
    func()  # [1, 1, 1]
shoving a print into a function definition is weird and not something you'd do normally. But someone who doesn't know this footgun is going to write a function that defaults to an empty list, and then tear their hair out when things are broken.


That's true, this is the example that most people will run into in practice. But the print example is useful because it shows what's going on more explicitly: the default argument gets evaluated as part of the function's definition. I think this helps people get a better intuition for how Python's interpreter works when it comes to evaluating function definitions.



I find this fascinating because while I don't write much python, this is the behaviour I would assume to be correct based on everything else I've seen. I wouldn't expect it to be evaluated on every call. Definitely shows that your previous familiarity can trip you up (or not)



> it's one of the most surprising things about Python and it causes a number of mistakes, even for experts.

Except every python 101 text seems to go over it, and people seem to have suddenly forgotten about it

ChatGPT driven development maybe?



This has been a source of bugs since long before ChatGPT. I suspect Python tutorials mention it so often because it's such a surprising feature and causes so much confusion. I've been using Python for fifteen years, and it's the sort of thing that I'll still absent-mindedly forget if I'm not careful.



Ironically, in C++ - which does it the other way around - the tutorials mention that often because it is also such a surprising feature (i.e. people do not expect evaluation to occur every time there).

Thing is, there's no obviously correct behavior here, and there are valid arguments to be made either way. Which is why many languages dodge the bullet by only allowing for compile-time constants as defaults in that context (and if you want to evaluate something at runtime, you can always use an optional and do an explicit check inside the body, or provide an overload).



If the default argument is an object, it's reused between invocations. Hence setting default parameters to an empty list / empty dict is flagged by static analysis suites.



There is an obj visible at the point of the function definition. When that function definition takes place, obj.whatever() is called, and the value stashed away. Thus the call obj.whatever() has to work at that time. The stashed value is retrieved whenever the corresponding function argument is missing.



Part of the skill in using LLMs is knowing when and how to use them, how to set the 'temperature' (how 'creative' it will be in its response), and how to write a prompt that is less prone to illusory responses.

My eyes were opened one relaxing morning when sipping my coffee and pondering how to tidy up a database column by migrating from string to enum.

I asked ChatGPT for its thoughts and its response seemed perfunctory and on point, until at one particular line, tucked in the otherwise sensible migration file [1], it casually recommended deleting all users whose value for that attribute wasn't among those specified by the enum. I spat my coffee out and learned a very valuable lesson that morning!

[1] https://imgur.com/a/ejIdCH6
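For contrast, a non-destructive version of that migration, sketched with Alembic and hypothetical table/column/enum names; the point is that stray legacy values get folded into a fallback rather than having their rows deleted:

    import sqlalchemy as sa
    from alembic import op

    status_enum = sa.Enum("active", "inactive", name="user_status")

    def upgrade():
        status_enum.create(op.get_bind(), checkfirst=True)
        # fold unknown legacy values into a fallback instead of DELETEing users
        op.execute(
            "UPDATE users SET status = 'inactive' "
            "WHERE status NOT IN ('active', 'inactive')"
        )
        op.alter_column(
            "users",
            "status",
            type_=status_enum,
            postgresql_using="status::user_status",
        )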



I’ve found it’s particularly useful at questions like “how do I do X idiomatically in Rust?”

Idioms are, very conveniently, questions about the most common shape of X among the broader community, so it tends to do quite well with that.

I appreciate when it spits out example code, but I never copy/paste from it, I always rewrite anything myself to ensure I don’t slip up and accidentally… well, what it tried to sneak in to you…

That’s quite the scary anecdote



On one hand, thanks for being honest about a story of how this bug came to be.

On the other hand, I don't think it helps to advertise the fact that the company introduced a major bug by copy-pasting ChatGPT code around, and that they spent a week unable to even debug why it was failing.

I don’t know much about this startup, but this blog post had the opposite effect of all of the other high quality post-mortem posts I’ve read lately. Normally the goal of these posts is to demonstrate your team’s rigor and engineering skills, not reveal that ChatGPT is writing your code and your engineers can’t/won’t debug it for a very long time despite knowing that it’s costing them signups.



It read like no one really knew what they were doing. "We just let it generate the code and everything seemed to work" is certainly not a good way to market your company.



This is getting more common. I have already had people try to tell me how something works from a ChatGPT summary. It would have led us to take a completely different direction... 5 minutes of reading the actual docs and I found out they were wrong.

Now, at a new company, I have caught several people copy-pasting GPT code that is just horrendous.

It seems like this is where the industry is headed. The only thing I have found GPT to be good at is solving interview questions, although it still uses phantom functions about 50% of the time. The future is bumming me out.



Like people who post "here's what ChatGPTx said" instead of their own answer. Quite literally, what is the point?

However, I don't think it's really bad for the technical industries long term. It probably does mean that some companies with loose internal quality control and enough shiftless employees pasting enough GPT spew without oversight will go to the wall because their software became unmaintainable and not useful, but this already happens. It's probably not hugely worse than the flood of bootcamp victims who write fizzbuzz in Python, get recruited by a credulous, cheap, or desperate company, and proceed to trash the place if not adequately supervised. If you can't detect bad work being committed, that's mostly on the company, not ChatGPT. Yes, it may make it a bit harder, but it was oversight you should already have been prepared to have, doubly so if you can't trust employee output. It also probably implies strong QA, which is already a prerequisite of a solid product company.

Normal interest rates coming back will cut away companies that waste everyone's time and money by overloading themselves on endlessly compounding technical debt.



No, the idea is that historically-normal interest rates around the 5-10% mark won't be conducive to free VC cash being sprayed around for start-ups to wank themselves silly over "piv-iterating" endlessly over spamming complete nonsense and using headcount and office shininess as a substitute for useful and robust products.

Yes, it makes the barrier higher even for good products and helps entrench incumbents, but short of a transnational revolution, the macroeconomic system is what it is, and you can only choose to find the good things in it or give up entirely.



> Like people who post "here's what ChatGPTx said" instead of their own answer. Quite literally, what is the point?

Yeah, I've seen this and I hate it. If I wanted to know what ChatGPT said, I'd just ask it myself.



> It read like no one really knew what they were doing. "We just let [devs] generate the code and everything seemed to work" is certainly not a good way to [whatever].

Except, have you met startup devs? This is by and large the "move fast then unbreak things" approach.



It’s the natural outcome of SV types denigrating the value of education.

Forget knowing anything, just come up with a nice pitch deck and let the LLM write the stack.

Not wholly surprised these people are YC backed. I’ve got the impression YC don’t place much weight on technical competence, assuming you can just hire the talent if you know how to sell.

Well, now replace “hire some talent” with “get a GPT subscription and YOLO”, and you get the foundation these companies of tomorrow are going to be built on.

Which hey, maybe that’s right and they know something I don’t.



It's possible to move fast the same way but break fewer things than this. For example, in this case, they said that they introduced tests to mitigate this. I can assure you that introducing tests takes more time than a Google search to check, in like 2 minutes, what each line really does.



The idea of moving fast is to have extensive logs and alerts so you fix all errors fast as they appear, without "wasting time" on long, expensive tests in a phase where things change every day.

5 days to find out you have "duplicate key" errors in the db is the opposite of fast.



It's very humbling coming out of startup-land and working with big tech engineers, and realizing their tooling runs circles around everybody else's and enables them to be much more precise with their work and scale, though it isn't without trade-offs.



Yeah but a lot of that is just the accrual of improvements that is possible with a lot of resources over a long period of time.

People working in "big tech" aren't fundamentally better at building reliable tools and systems; the time and resource constraints are entirely different.



The big tech tooling probably cost tens of millions of dollars to create, and probably had a couple $10k mistakes on the way to getting it written and running.



Eh I imagine they looked over the code as well, doing code review -- and at first glance, the code looks reasonable. I certainly wasn't able to catch the bug even though I tried to find it (and I was given a tiny collection of lines and the knowledge that there's a bug there!).

If anything, I think this says something about how dangerous ChatGPT and similar tools are: reading code is harder than writing code, and when you use ChatGPT, your role stops being that of a programmer and becomes that of a code reviewer. Worse, LLMs are excellent at producing plausible output (I mean that's literally all they do), which means the bugs will look like plausibly correct code as well.

I don't think this is indicative of people who don't know what they're doing. I think this is indicative of people using "AI" tools to help with programming at all.



I'm a some-time Django developer and... I caught the bug instantly. Once I saw it was model/ORM code it was the first thing I looked for.

I say that not to brag because (a) default args is a known python footgun area already and (b) I'd hope most developers with any real Django or SQLAlchemy experience would have caught this pretty quick. I guess I'm just suggesting that maybe domain experience is actually worth something?



I've since moved on to primarily working with Java, so it's been a few years since working with Django on a daily basis and the default still jumped out to me immediately. Experience and domain knowledge is so important, especially when you need to evaluate ChatGPT's code for quality and correctness.

Also, where were their tests in the first place? Or am I expecting too much there?



This is an error that should probably have been caught just based upon the color of the text when it was typed/pasted into the source code. If the uuid() call was in quotes, it would have appeared as text. When you're blindly using so much copy/pasted code (regardless of the source), it's really easy to miss errors like this.

But our existing tools are already built to help us avoid this.

Back in the day, I used a tool from a group in Google called "error-prone". It was great at catching things like this (and likely NPEs in Java). It would examine code before compiling to find common errors like this. I wish we had more "quick" check tools for more languages.



> This is an error that should probably have been caught just based upon the color of the text when it was typed/pasted into the source code. If the uuid() call was in quotes, it would have appeared as text.

It's not in quotes. It's a function call.

The issue is that the function call happens once, when you define the class, rather than happening each time you instantiate the class.



> I don't think this is indicative of people who don't know what they're doing. I think this is indicative of people using "AI" tools to help with programming at all.

I think using AI tools to write production code is probably indicative of people who don't really know what they are doing.

The best way not to have subtle bugs is to think deeply about your code, not subcontract it out -- whether that is to people far away who both cannot afford to think as deeply about your code and aren't as invested in it, or to an AI that is often right and doesn't know the difference between correct and incorrect.

It's just a profound abrogation of good development principles to behave this way. And where is the benefit in doing this repeatedly? You're just going to end up with a codebase nobody really owns on a cognitive level.

At least when you look at a StackOverflow answer you see the discussion around it from other real people offering critiques!

ETA in advance: and yes, I understand all the comparison points about using third party libraries, and all the left-pad stuff (don't get me started on NPM). But the point stands: the best way not to have bugs is to own your code. To my mind, anyone who is using ChatGPT in this way -- to write whole pieces of business logic, not just to get inspiration -- is failing at their one job. If it's to be yours, it has to come from the brain of someone who is yours too. This is an embarrassing and damaging admission and there is no way around it.

ETA (2): code review, as a practice, only works when you and the people who wrote the code have a shared understanding of the context and the goal of the code and are roughly equally invested in getting code through review. Because all the niche cases are illuminated by those discussions and avoided in advance. The less time you've spent on this preamble, the less effective the code review will be. It's a matter of trust and culture as much as it's a matter of comparing requirements with finished code.



> And where is the benefit in doing this repeatedly? You're just going to end up with a codebase nobody really owns on a cognitive level.

You could say the same about the output of a compiler. No one owns that at a cognitive level. They own it at a higher level - the source code.

Same thing here. You own the output of the AI at a cognitive level, because you own the prompts that created it.



>No one owns that at a cognitive level

Notwithstanding the fact that compilers did not fall out of the sky and very much have people that own them at the cognitive level, I think this is still a different situation.

With a compiler you can expect a more or less one to one translation between source code and the operation of the resulting binary with some optimizations. When some compiler optimization causes undesired behavior, this too is a very difficult problem to solve.

Intentionally 10xing this type of problem by introducing a fuzzy translation between human language and source code then 1000xing it by repeating it all over the codebase just seems like a bad decision.



Right. I mean... I sometimes think that Webpack is a malign, inscrutable intelligence! :-)

But at least it's supposed to be deterministic. And there's a chance someone else will be able to explain the inner workings in a way I can repeatably test.



Yes, and when compilers fail, it's a very complex problem to solve, one that usually requires many hours from an experienced dev. Luckily:

(1) Compilers are reproducible (or at least repeatable), so you can share your problem with others, and they can help.

(2) For common languages, there are multiple compilers and multiple optimization options, which (and that's _very important_) produce identically-behaving programs - so you can try compiling the same program with different settings, and if they differ, you know the compiler is bad.

(3) Compilers are very reliable, and bugs where the compiler succeeds but generates invalid code are even rarer - in many years of my career, I've only seen a handful of them.

Compare this to LLMs, which are non-reproducible, each giving a different answer (and that's by design), and which have a huge appear-to-succeed-but-produce-bad-output error rate, way above 1%. If you had a compiler that bad, you'd throw it away in disgust and write in assembly language.



> I think using AI tools to write production code is probably indicative of people who don't really know what they are doing.

People said the same to me for using Microsoft IntelliSense 20 years ago. AI tools for programming are absolutely the future.


This is why I typically only use LLMs in programming as a semi-intelligent doc delver with requests like, “give me an example of usage of X API with Y language on Z platform”.

Not only does it goof up less frequently on small focused snippets like this, it also requires me to pick the example apart and pay close enough attention to it that goofups don’t slip by as easily and it gets committed to memory more readily than with copypasting or LLM-backed autocomplete.



I don't know Python or SQLAlchemy that great, though I do have the benefit of it being cut down to a small amount of code and being told there was a bug there. That said, I didn't see the actual bug, but I did mentally flag that as something I ought to look up how it actually behaved. It's suspicious that some columns used `default` with what looks like Python code while others used `server_default` with what appears to be strings that look more like database engine code. If I was actually responsible for this, I'd want to dig into why there is that difference and where and when that code actually runs.
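
A sketch of that distinction, with hypothetical column names: `default` is a Python-side value or callable that SQLAlchemy evaluates on each INSERT it issues, while `server_default` is SQL baked into the table DDL and evaluated by the database itself.

    from datetime import datetime, timezone

    from sqlalchemy import Column, DateTime, Integer, text
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class Event(Base):
        __tablename__ = "events"
        id = Column(Integer, primary_key=True)

        # Python-side: SQLAlchemy calls this on every ORM INSERT it issues.
        created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))

        # Database-side (Postgres syntax): baked into the CREATE TABLE, so it
        # also covers rows inserted outside the ORM.
        received_at = Column(DateTime, server_default=text("now()"))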

It's also the case that "code review" covers a lot of things, from quickly skimming the code and saying eh, it's probably fine, to deeply reading and ensuring that you fully understand the behavior in all possible cases of every line of code. The latter is much more effective, but probably not nearly as common as it ought to be.



I totally disagree with this. You might as well argue that we shouldn't use code-completion of any kind because you might accidentally pick the wrong dependency or import. Or perhaps we shouldn't use any third-party libraries at all because you can use them to write reasonable-looking but incorrect code? Heck, why even bother using a programming language at all since we don't "own" how it's interpreted or compiled? Ultimately I agree that using third-party tools saves time at the cost of potentially introducing some types of bugs. (Note that said tools may also help you avoid other types of bugs!) But it's clearly a tradeoff (and one where we've collectively disagreed with you the vast, vast majority of the time) and boiling that down to AI=bad misses the forest for the trees.



It is possible to use autocompletion correctly.

It is possible to use libraries correctly.

It is not possible to use AI correctly. It is only possible to correct its inevitable mistakes.



I don't really care about them marketing their company, but, Jesus, seriously, that's how software is going to be written now. TBH, I'm not really sure if it's that much different from how it was, but it sounds just... fabulous.



> It read like no one really knew what they were doing.

Then no one knows what they are doing. I really don't know any company that doesn't make what could be considered rookie mistakes by some armchair "developer" here on HN.



> It read like no one really knew what they were doing.

In my experience, hardly anyone in software does know what they're doing, for sufficiently rigorous values of "know what you're doing." We all read about other people's stupid mistakes, and think "haha, I would never have done that, because I know about XYZ!" And then we go off and happily make some equally stupid mistake because we don't know about ABC.



I dunno. I tend to annoy people when taking on jobs by telling them what I am concerned about and do not understand, and then sharing with them the extent to which I have managed to allay my own concerns through research.

I turn down a lot of jobs I don't feel confident with; maybe more than I should.

An LLM never will.



That's an alright takeaway: the team made a rookie mistake and then they made a PR mistake by oversharing.

Otherwise, I think this comment thread is a classic example why company engineering blogs choose to be boring. Better ten articles that have some useful information, than a single article that allows the commentariat to pile on and ruin your reputation.



I think it’s an unfair takeaway. I have over a decade of experience and still had to stare at the line to find the bug. If that makes them incompetent, I stand with them. It’s a bug I’ve seen people make in other contexts, not just chatbots.

The AI angle is probably why people are piling on. There's a latent fear that AI will take our jobs, and this is a great way to drive home that we're still needed. For now.

The one thing I will say is that it probably wouldn’t take me days to track it down. But that’s only because I have lots of experience dealing with bugs the way that The Wolf deals with backs of cars. When you’re trying to run a startup on top of everything else, it can be easy to miss.

I’m happy they gave us a glimpse of early stage growing pains, and I don’t think this was a PR fumble. It shows that lots of people want what they’re making, which is roughly the only thing that matters.



> Better ten articles that have some useful information, than a single article that allows the commentariat to pile on and ruin your reputation.

Pile-on aside, the problem with this blog article is that it doesn't really have much of a useful takeaway.

They didn't even really talk about the offending line in detail. They didn't really talk about what they did to fix their engineering pipelines. It was just a story about how they let ChatGPT write some code, the code was buggy, and the bug was hard to spot because they relied on customers e-mailing them about it in a way that only happened while they were sleeping.

It's not really a postmortem, it's a story about fast and loose startup times. Which could be interesting in itself, except it's being presented more as an engineering postmortem blog minus the actionable lessons.

That's why everyone is confused about why this company posted this as a lesson: The lesson is obvious and, frankly, better left as a quiet story for the founders to chuckle about to their friends.



Eh, I think it speaks fairly well for them.

On the one hand it does seem like a fairly inexperienced organization with some pretty undercooked release and testing processes, but on the other hand all that stuff is ultimately fixable. This is a relatively harmless way of learning that lesson. Admitting a problem is the first step toward fixing it.

A culture of ass-covering is much harder to fix, and will definitely get in the way of addressing these types of issues.



More importantly, what was the motivation behind a rewrite from TypeScript to Python? From the article:
  Our project was originally full stack NextJS but we wanted to first migrate everything to Python/FastAPI.
Seems like this entire mess could've been avoided if they had stuck with their existing codebase, which seemed to have been satisfying their business requirements.


Or if you really want to rewrite your back end, why not just use Express? It would be wildly quicker to rewrite than switching languages. That along with the article makes me question the competency of the company. They got customers, sure, but in the long run these decisions will pile up.



You likely ended up doing more work than necessary, compared to some other options.

The lock-in here is the added developer time and complexity vs. just paying premium.



Simply put - if you want to get the best out of the framework, you need to host it on Vercel. Otherwise, there are better options for frameworks. No need to "fight it".

You will find many GitHub issues that are not considered precisely because they would make the framework "better" or easier to use on other clouds.



Making things worse in a free offering so the same company can profit from its premium offering is the pinnacle of capitalism. It reminds me of a recurring joke I have with a friend while playing Call of Duty: that they will get greedier and soon sell not only character skins but also shaders/textures for the maps. Oh, so you want to see something better than placeholder textures? We have the DLC just for you!



Environment destruction gameplay but there's always another ad under the ads, except when it's a lootbox.

I can imagine worse, too! They haven't even really started turning that knob yet.



The video game metaphor stretches pretty far.

Madden has a monopoly license for NFL content. For a decade the biggest complaint was how they gatekept rosters behind the yearly re-release. Eventually they allowed roster sharing, but they put it behind the most god-awful inept UI you could possibly imagine, such that casual gamers practically wouldn't bother with it.

Then Madden came out with Madden Ultimate Team (like trading-card MTX) and has been neglecting non-MUT modes ever since. They don't explicitly regress the rest of their game, they just commit resources to that effect.

It's like malicious compliance. They don't embrace, extend, extinguish, but they get a similar effect just with resourcing, layoffs, whatever.



There is no explicit vendor lock-in, but the framework is designed heavily around Vercel-specific features.

The SST team actually has an open-next project[1] that does a ton of work to shim most of the Vercel-specific features into an AWS deployment. I don't think it has full parity, and it's a third-party adapter from a competing host. The fact that it's needed at all is a sign of how closely tied Next and Vercel are.

[1] https://github.com/sst/open-next



My guess, when I read it, was this would permit them to independently scale the backend on some commodity capacity provider, and then their Nextjs frontend becomes just another React app. OP didn’t mention what their product was, but if it’s AI-adjacent then a python backend doesn’t sound like a terrible idea.



Is there a trend there of moving from Next to FastAPI? I would be surprised.

Perhaps they are doing some AI thing and want to have python everywhere.



This is a pretty common mistake with sqlalchemy whether you’re using ChatGPT or not. I learned the same lesson years ago, although I caught it while testing. I write plenty of python and I just don’t often pass functions in as parameters. In this case you need to!

For something like this where you’re generating a unique id and probably need it in every model, it’s better to write a new Base model that includes things like your unique id, created/changed at timestamps, etc. and then subclass it for every model. Basically, write your model boilerplate once and inherit it. This way you can only fuck this up once and you’re bound to catch it early before you make this mistake in a later addition like subscription management.
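
A rough sketch of that pattern, keeping the string-UUID keys (names are hypothetical):

    import uuid
    from datetime import datetime, timezone

    from sqlalchemy import Column, DateTime, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class BoilerplateBase(Base):
        __abstract__ = True  # no table is created for this class itself

        id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()))
        created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))

    class User(BoilerplateBase):
        __tablename__ = "users"
        email = Column(String, unique=True)

    class Subscription(BoilerplateBase):
        __tablename__ = "subscriptions"
        user_id = Column(String)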



Ah, but it’s not a characteristic of SQLAlchemy tho. It’s how Python evaluates statements. Both Peewee and the Django ORM work on the same principle with default values.

The intent is to pass a callable, not to call a function and populate an argument with what it returns.
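
The same rule is visible in plain Python with no ORM at all - default expressions are evaluated once, when the `def` (or the class body) executes, not on each call:

    import time

    def stamp_once(ts=time.time()):  # evaluated once, at definition time
        return ts

    def stamp_each(ts_fn=time.time):  # the callable itself is the default
        return ts_fn()

    a, b = stamp_once(), stamp_once()
    assert a == b  # frozen forever at import time

    c = stamp_each()
    time.sleep(0.01)
    assert stamp_each() != c  # fresh value per call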



Correct, it's not specific to sqlalchemy - I'm just saying I notice this a lot with sqlalchemy. Probably because it was the first significant bug I had to figure out how to fix when I introduced it in one of my first apps. I guess we never forget the first time we shot ourselves in the foot.



Yea, more of an issue of how this python library can cause misunderstandings, and ChatGPT failing in the same misunderstanding that would have been made by an engineer who lacks experience in that library.

This mistake would have happened even if they did not use ChatGPT.



I wonder are there linters to detect those types of mistakes for SQLAlchemy. Even though I'm aware of such pitfalls, it's nice if linters can catch them cause I'm not confident to cache them all the time during code review.

Some linters like pyright can identify dangerous defaults in function call, like `def hello(x=[]): pass` (mutable values shouldn't be a default). Linter plugins for widely-used and critical libraries like SQLAlchemy are nice to have.
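
flake8-bugbear does cover the general pattern: B006 flags mutable defaults, and B008 flags function calls in argument defaults. A sketch of the class of bug being flagged, and the conventional sentinel fix:

    # B006: the list is created once, at def time, and shared across calls.
    def append_bad(item, bucket=[]):
        bucket.append(item)
        return bucket

    # The fix: a None sentinel, with the real default built per call.
    def append_good(item, bucket=None):
        if bucket is None:
            bucket = []
        bucket.append(item)
        return bucket

    assert append_bad(1) == [1]
    assert append_bad(2) == [1, 2]  # surprise: state leaked between calls
    assert append_good(1) == [1]
    assert append_good(2) == [2]    # independent lists

A SQLAlchemy-aware check would additionally need to know that a freshly computed value like `uuid4()` inside `Column(default=...)` is almost always meant to be the callable `uuid4` instead - exactly the kind of library-specific plugin being asked for.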



The mutable-default-arguments issue is easy for a third-party to linter to catch because it doesn't require any specific knowledge about primary keys and databases. Are there static typing plugins for other common packages that would catch issues like this?



Honestly same can be said about a lot of frameworks. You will pry my vanilla JS debugged with print statements hand-coded in vi from these hands only when they're cold and dead.



The bad thing is what they did, not that they disclosed it.

I agree that this is probably to their disadvantage, but I would much rather have people admitting their faults than hiding them. If everyone did this the world would be better.

Of course the best solution is to not have faults but that is like saying that the solution to being poor is to have lots of money. It's much easier to say than do.



The bad thing is their engineering culture and not anything technical. We all make mistakes, the question is how we fix them. Look a the last sentences of the post:

> Yes we should have done more testing. Yes we shouldn't have copy pasted code. Yes we shouldn't have pushed directly to main. Regardless, I don't regret the experience.

None of those are unconditionally bad! Every project I've worked on could use more testing; we've all copy-pasted code at least occasionally, and pushing to main is fine in some circumstances.

The real problem is that they went live, but their tooling (or knowledge how to use it) was so bad it took 5 days to solve the simple issue; and meanwhile, they kept pushing new code ("10-20 commits/day") while their customers were suffering. This is what really causes the reputation hit.



These criticisms about engineering PR are too heavy handed. Great engineers solve problems and describe problems without finger pointing to place blame. In fact I think that the worst engineers I’ve worked with are the ones most often reaching for someone to place it on.



The fact that they couldn't find it by looking at error logs is weird to me.

This is an entirely forgivable error but should have been found the first time they got an email about it:

"Oh, look, the error logs have a duplicate key exception for the primary key, how do we generate primary keys.... (facepalm)"

Funnily enough, I saw the error in their snippet as soon as I read it, but dismissed it thinking there was some new-fangled Python feature which allowed that to work - like the signature declaring that default= accepts only callables, so the function itself gets passed? I haven't kept up with modern Python, that sounded cool, and I figured the bug couldn't be THAT simple.



Guess: the logs were on an EC2 instance that was thrown away regularly, and the overnight reports didn't give reproduction steps or timestamps; so when they checked, it "works fine".

There's value in having your backtrace surfaced to end users rather than swallowing an exception and displaying "didn't work".



It was on some temporary AWS service like Lambda or something? ("We had eight ECS tasks on AWS, all running five instances of our backend.") But regardless, logs should be somewhere persistent.

If they weren't, that should be the first thing you fix.



Yeah this is not a good thing to advertise.

- They were under large time constraints, but decided a full rewrite to a completely different stack was a good idea.

- They copy-pasted a whole bunch of code, tested it manually once locally, once in production, and called it a day.

- The debugging procedure for an issue so significant it made them dread waking up involved... testing it once and moving on. Every day.

The bug is pretty easy to miss, but should also be trivial to diagnose the moment you look at the error message, and trivial to reproduce if you just try more than once.



> trivial to reproduce if you just try more than once

A lot more than once: they had 40 instances of their app, and the bug was only triggered by getting two requests on the same instance.

A bunch of developers including me once spent a whole weekend trying to reproduce a bug that was affecting production and/or guess from the logs where to look for it. Monday morning, team lead called a meeting, asked for everything we could find out, and… Opened the app in six tabs simultaneously and pressed the button in question in one of the tabs. And it froze! Knowing how to reproduce on our computers, we found and fixed the bug in the next 30 minutes.



I'd rather they admit a mistake and learn a lesson from it, even if it isn't a good thing to advertise. That said, I agree you are identifying a more important issue here, but I think you are being a bit too subtle about it. The real lesson they should have learned from this ordeal is to never push code directly into production --- period. The article never mentions using a testbed or sandbox beforehand, and I kinda feel like they learned a lesson, but it may in fact be the wrong lesson to learn here.



I don't see how a testbed/sandbox would have helped, unless they'd also have a dedicated QA person _and_ configured their sandbox to have dramatically fewer instances.

Because I can see "create a new subscription" in the manual test plan, but not "create 5x new subscription".



> configured their sandbox so have dramatically fewer instances

May be a wise thing to do anyway. I would even advocate for the extreme: one instance in the sandbox. I've seen quite a variety of bugs that would be easier to detect and debug with one instance - from values accidentally persisted between requests (OP is one example of this very broad class of bugs), to a thundering-herd bug caught in staging because it had fewer instances and more worker threads per instance. But I can't remember even one "distributed" bug caught in staging.

> they'd also have a dedicated QA person

This is a new feature, and it's user-facing, business-critical and inherently fragile (external service integration, though it broke for another reason). I hope multiple people manually run such things end to end before deploying to prod, even if none of them are dedicated QA?



In a way, it does give you an opportunity to think about what you appreciate in a detailed postmortem - not just a single cause, but human and organizational factors too, and an attempt to figure out explicit mitigations. I’ll admit the informality and the breezy tone here made me go “woah, they’re a bit cavalier…”



CEO thoughts: "Oh, post-mortems are always well received. I should write one for that really basic bug we had and how we took 5 days to find it, and forget to mention how we fixed it or how we have changed our process so that it never happens again."

Also the CEO: "Remember to be defensive in reddit comments, saying how we are a small $1M-backed startup and how it's normal to make this kind of rookie mistake when moving fast."



They spent 5 days. The bug type is pretty common and could easily be done by a developer. (It's a similar class to the singleton default argument issue that many people complain about) Meh, I don't mind the cautionary tale and don't think chatgpt was even relevant.

It's actually a tricky bug, because usual tests wouldn't catch it (db wiped for good isolation) and many ways of manual testing would restart the service (and reset the value) on any change and prevent you from seeing it. Ideally there would be a lint that catches this situation for you.



TBH, while I definitely could see this being an easy bug to write, something is definitely wrong if it took 5 days to identify the root cause of this bug.

That is, I'm struggling to understand how a dive into the logs wouldn't show that all of these inserts were failing with duplicate key constraint violations. At that point at least I'd think you'd be able to narrow down the bug to a problem with key generation, at which point you're 90% of the way there.

I also don't agree that "usual tests wouldn't catch it (db wiped for good isolation)". I'd think that you'd have at least one test case that inserted multiple users within that single test.
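
For instance, a single test that creates two rows in the same process trips over the shared default immediately. A sketch against a model that mirrors the bug (hypothetical names):

    import uuid

    from sqlalchemy import Column, String, create_engine
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class User(Base):
        __tablename__ = "users"
        # The reported bug: uuid4() evaluated once, at import time.
        id = Column(String, primary_key=True, default=str(uuid.uuid4()))
        email = Column(String)

    def test_can_create_two_users():
        engine = create_engine("sqlite://")  # throwaway in-memory DB
        Base.metadata.create_all(engine)
        with Session(engine) as session:
            session.add(User(email="a@example.com"))
            session.add(User(email="b@example.com"))
            session.commit()  # IntegrityError: duplicate primary key

The test fails with the same duplicate-key error the customers were hitting, and passes once `default` is the callable rather than its result.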



The bug was in multiple subscriptions, not just users. And I can't think of one non-contrived reason to create multiple subscriptions in a single test. Even when testing the visibility/access of subscriptions between users you need 2 users, but only one subscription.



create a subscription for a test user. delete it. Make sure you can create another subscription for the same user.

create subscriptions with and without overlapping effective windows

Those seem like very basic tests that would have highlighted the underlying issue



Or add some debug logging? 5 days into a revenue-blocking bug, if I couldn't repro manually or via tests, I would have logged the hell out of this code. No code path or metric would be spared.



I'd argue that a suite of tests that exercise all reasonably likely scenarios is table stakes. And would have caught this particular bug.

I'm not talking about 100% branch coverage, but 100% coverage of all happy paths and all unhappy paths that a user might reasonably bump into.

OK maybe not 100% of the scenarios the entire system expresses, but pretty darn close for the business critical flows (signups, orders, checkouts, whatever).



Sure, hindsight is 20/20, but a bunch of these comments are replying to the assertion "And I can't think of one non-contrived reason to do it" (have a single test case with multiple subscriptions). That's the assertion I think is totally weird - I can think of tons of non-contrived reasons to have multiple subscriptions in a single test case.

I wouldn't pillory someone if they left out a test case like this, but neither would I assert that a test case like this is for some reason unthinkable or some outlandish edge case.



Have a look at the Stripe API. You don't delete subscriptions. You change them to a free plan instead, or cancel and then resume later. That flow would not result in deletion of the entry. You can also update the billing date, so no overlapping subscriptions are needed. Neither test would result in the described bug.



What? This doesn't make any sense:

1. First, if you look at the code they posted, they had the same bug on line 45 where they create new Stripe customers.

2. The issue is not multiple subscriptions per user (again, if you look at the code, you'll see each Subscription has one foreign key user_id column). The problem is if you had multiple subscriptions (each from different users) created from the same backend instance then they'd get the same PK.



Not every user needs a stripe customer. I'm creating the stripe entries only on subscription in my app.

Your second point is true, but I don't see what it changes. Most automated unit/integration testing would just wipe the database between tests and needing two subscribed users in a single test is not that likely.



Yeah, we have quite literally caught bugs like this in 5 minutes in prod not bc we made a mistake, but bc a customer’s internal API made a schema change without telling us and our database constraints and logging protected us.

But it took about 5 minutes, and 4 of those minutes were waiting for Kibana to load.



I read the part where they said they pored through "hundreds of Sentry logs" and immediately was like "no you didn't."

This is not an error that would be difficult to spot in an error aggregator, it would throw some sort of constraint error with a reasonable error message.



> db wiped for good isolation

Why? In fact, not having good isolation would have caught this bug. Generate random emails for each test. Why would you test on a completely new db as if that is what will happen in the real world?



It's extremely common. You want to know that objects/rows from one test don't bleed into another by accident. It allows you to write stricter and simpler assertions - like "there's one active user in the database" after a few changes, rather than "this specific user is in the database and active, and those other ones are not anymore".

Leaking information between test runs can actually make things pass by accident.
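
One common way to get that isolation without rebuilding the database is to run each test inside a transaction that is rolled back afterwards. A minimal pytest sketch, assuming a declarative `Base` like the ones above (tests that call `commit()` themselves need the fuller savepoint recipe from the SQLAlchemy docs):

    import pytest
    from sqlalchemy import create_engine
    from sqlalchemy.orm import Session

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)  # schema built once for the run

    @pytest.fixture()
    def db_session():
        connection = engine.connect()
        transaction = connection.begin()
        session = Session(bind=connection)
        yield session
        # Roll everything back so rows never bleed into the next test.
        session.close()
        transaction.rollback()
        connection.close()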



They shouldn't. But they do. We're not perfectly spherical developers and we all make mistakes. Sometimes it's also extremely tricky to figure out what state is leaking, especially if it's an access-race issue that happens only for some tests, and very rarely. If you haven't seen that happening, you just need to work on larger projects.



I've worked at several large companies. In our e2e tests we did not create isolated dbs per test. If the test failed because of other tests running that's a bug in the test and that person would get automatically @mentioned on slack and they would have to fix the build.



Load testing - yes, but unfortunately it's not that common (even though it should be). Acceptance testing - again, maybe, if they use 20 or so subscriptions in one batch, which may not be the case.



> and could easily be done by a developer

...who didn't know how the ORM they were using worked. That's what makes them look so bad here: nobody knew how it worked, not even at the surface level of knowing what the SQL actually generated by the tool looks like.



In their defense, I find SQLAlchemy syntax quite horrible, and I always have to look up everything. It also got a 2.0 release recently which changes some syntax (good luck guessing which version ChatGPT will use), and makes the process even more annoying.



SQLAlchemy syntax is ridiculously obvious and straightforward as long as you're not doing anything weird.

The takeaway here is that they weren't mature enough to realize they were, in fact, doing something "weird". I.e. Using UUIDs for PKs, because hey "Netflix does it so we have to too! Oh and we need an engineering blog to advertise it".

Edit. More clarity about why the UUID is my point of blame: If they had used a surrogate and sequential Integer PK for their tables, they would never have to tell SQLAlchemy what the default would be, it's implied because it's the standard and non-weird behavior that doesn't include a footgun.
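
In SQLAlchemy terms (hypothetical table): the integer surrogate key has no `default=` to get wrong, because the database hands out the next value itself on every INSERT.

    from sqlalchemy import Column, Integer, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class Account(Base):
        __tablename__ = "accounts"
        id = Column(Integer, primary_key=True)  # autoincrement, no footgun
        name = Column(String)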



Unfortunately, UUID as PK is an extremely common pattern these days, because devs love to believe that they’ll need a distributed DB, and also that you can’t possibly use integers with such a setup. The former is rarely true, the latter is blatantly false.
