AI-assisted development has compressed the time from idea to working prototype dramatically. A technical founder can now ship something that looks and feels like a real product in days rather than months. That’s genuinely useful.
The problem is that the gap between “it works in a demo” and “it works in production” hasn’t compressed at all. The same failure modes that have always existed (such as thin error handling, fragile data models, no observability) are still there. They’re just arriving faster, in codebases that nobody has read all the way through.
This is what I mean when I talk about production hardening. It’s not a rebrand of “code review.” It’s a specific set of problems that reliably show up in quickly-built applications, and a systematic way of finding and fixing them before they cost you.
The six things most likely to break
1. Error handling that stops at the happy path
AI-generated code tends to handle the case it was asked about. The 404. The successful login. The form submission that works. What it rarely handles well: the database connection that times out, the third-party API that returns a 500, the file upload that’s three times larger than expected.
The fix is not to add try/except everywhere. It’s to identify the external dependencies in your application — that is, anything that can fail independently of your own code — and make sure each one has an explicit failure mode that doesn’t take down the request or expose a stack trace to the user.
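As a minimal sketch of that boundary, here is one way to wrap a third-party call so that a timeout or upstream 500 becomes a single, explicit failure mode. The `client` object and the `/rates/` endpoint are hypothetical stand-ins for whatever HTTP client and API you actually use:

```python
import logging

logger = logging.getLogger(__name__)

class UpstreamError(Exception):
    """Raised when an external dependency fails; caught once at the request boundary."""

def fetch_exchange_rate(client, currency: str) -> float:
    # `client` is a hypothetical HTTP client; any third-party call works the same way.
    try:
        response = client.get(f"/rates/{currency}", timeout=5)
    except TimeoutError as exc:
        logger.warning("rate service timed out for %s", currency)
        raise UpstreamError("rate service unavailable") from exc
    if response.status_code >= 500:
        logger.warning("rate service returned %s", response.status_code)
        raise UpstreamError("rate service unavailable")
    return response.json()["rate"]
```

The point is that the caller sees one well-named exception with a logged cause, rather than a raw stack trace leaking to the user.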
2. Data integrity without constraints
Relational databases are good at enforcing data integrity. Foreign key constraints, not-null constraints, unique constraints: these are guarantees that live at the database level and protect you regardless of what the application layer does.
Vibe-coded applications frequently skip these. The data model holds together only as long as the application code behaves. When it doesn’t (a race condition, a bug in a background job, a direct database operation during debugging), you get corrupt data that’s expensive to clean up and hard to trust.
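To make this concrete, here is a small sketch using SQLite via Python’s standard library (the table names are illustrative). The duplicate email and the order pointing at a non-existent user are both rejected by the database itself, no matter what the application layer does:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    )
""")
conn.execute("""
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id)
    )
""")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

# Both of these are refused at the database level:
try:
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")  # duplicate email
except sqlite3.IntegrityError:
    pass

try:
    conn.execute("INSERT INTO orders (user_id) VALUES (999)")  # no such user
except sqlite3.IntegrityError:
    pass
```

The same constraints exist in PostgreSQL and MySQL; the syntax barely changes.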
3. No observability
If you don’t know what’s happening in your application, you’re flying blind. This means structured logging (not print()), error tracking that surfaces exceptions with context, and basic performance metrics so you know when something is slow before your users tell you.
This doesn’t require an expensive monitoring stack. A logging configuration that writes structured JSON, Sentry on the free tier, and response time logging on your routes covers the majority of what you need at early scale.
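A structured-logging setup of the kind described above can be sketched in a few lines with the standard library alone. The formatter and the response-time decorator below are illustrative, not a prescribed implementation:

```python
import json
import logging
import time

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line instead of free-form text."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def timed(handler_fn):
    """Crude response-time logging for a route handler."""
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return handler_fn(*args, **kwargs)
        finally:
            logger.info("handled %s in %.1f ms", handler_fn.__name__,
                        (time.monotonic() - start) * 1000)
    return wrapper
```

JSON lines are trivially searchable and feed straight into whatever log aggregation you adopt later.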
4. Authentication surface area
Auth is one of the areas where AI-generated code is most likely to be subtly wrong.
Not obviously broken. It will probably work. But it may miss things: sessions that don’t expire, password reset tokens that don’t invalidate after use, endpoints that check authentication but not authorisation.
A focused auth review covers the login and registration flows, the session lifecycle, the password reset flow, and a sample of protected endpoints. It takes a few hours and it’s almost always worth doing.
5. The data model that’s started to fight back
Early-stage applications often have a data model that was designed for the first feature and then extended, repeatedly, without stepping back. The result is a schema that technically works but requires increasingly convoluted queries, has columns whose purpose is unclear, and would need a significant migration to support the next major feature.
This is worth addressing before it calcifies. Refactoring a data model when you have one hundred users is painful. Refactoring it when you have ten thousand is very expensive.
6. No deployment process
Shipping directly to production, environment variables stored in a shared document, no staging environment, database migrations run by hand. These are common in early-stage applications and each one is a liability.
The fix is a repeatable deployment process: environment-specific configuration, migrations that run as part of the deployment, and at minimum a staging environment where you can verify a change before it hits production.
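The environment-specific configuration piece can be as small as this sketch, which reads from environment variables and fails loudly at startup rather than mid-request. The variable names (`DATABASE_URL`, `APP_DEBUG`) are illustrative:

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    database_url: str
    debug: bool

def load_config(env: Mapping[str, str] = os.environ) -> Config:
    # Fail at startup if required configuration is missing,
    # instead of failing on the first request that needs it.
    try:
        database_url = env["DATABASE_URL"]
    except KeyError:
        raise RuntimeError("DATABASE_URL is not set") from None
    return Config(
        database_url=database_url,
        debug=env.get("APP_DEBUG", "false").lower() == "true",
    )
```

The same code then runs unchanged in staging and production; only the environment differs, which is the whole point.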
What to do about it
If your application was built quickly and is approaching the point where reliability matters — you’re onboarding real users, you’re raising a round, you’re handing it to a team — a production hardening review is the right next step.
It’s not a rewrite. It’s a prioritised list of what’s actually a problem, what can wait, and a plan for fixing the things that matter. Most applications get there with a month or two of targeted work.
If that’s where you are, get in touch.