Spring Cleaning Your Vibe Coded Apps
I've been building a fitness tracker app on and off for about a year. It started as a weekend vibe coding project — Next.js 14, Prisma, PostgreSQL, deployed on Vercel. Over the months I bolted on Peloton API sync, Tonal API integration, OCR-based screenshot import via Tesseract.js, goals with quarterly milestones, dark mode, the works. Classic vibe coded app: functional, personal, messy.
It works. I use it every day. And until last week, I hadn't really looked at the code in months.
Then I decided to point a current-generation AI agent at it — not to add features, but to audit and refactor what was already there. The results were humbling, educational, and genuinely useful. Five PRs later, the app is measurably better in ways I wouldn't have prioritized on my own.
This post is about what I found, how I structured the work, and why I think every vibe coded app deserves a spring cleaning.
The Setup
The app is a standard Next.js stack: App Router, Prisma ORM, PostgreSQL on Neon, NextAuth v5 beta for Google login, Tailwind CSS. It has API routes for CRUD, two third-party API integrations (Peloton and Tonal), and a Tesseract.js-powered OCR pipeline for importing workout screenshots. About 4,000 lines of application code across 30-ish files.
The agent doing the audit was Claude Opus 4.6 running through Coder's agentic development environment — same setup I use for this blog. Full filesystem access, shell access, GitHub CLI pre-authenticated. I gave it a simple prompt: audit the codebase systematically, find issues, fix them in phased PRs.
The Ground Rules
I didn't want one massive PR that changed everything. I've been burned by that before — hard to review, hard to revert, hard to know what broke what.
Instead we agreed on a structure:
- Chunked PRs, one phase at a time
- I merge each one via gh pr merge --squash --delete-branch
- Vercel auto-deploys on push to main
- I test the live site between phases before greenlighting the next one
This turned out to be the right call. Phase 4 introduced a subtle data-layer bug that would have been invisible in a 2,000-line mega-PR.
Phase 1: Quick Wins
The first pass found the kind of stuff that accumulates in any project you don't actively maintain:
- No PrismaClient singleton. Every API route was instantiating its own PrismaClient, and each instance opens its own connection pool. In development this causes the "too many connections" warning. In production on a serverless platform like Vercel, it's wasteful at best.
- Missing database indexes. The WorkoutSession table had no indexes on date, source, or pelotonId, all columns used in every query.
- Array mutation bug. A sort() call was mutating state directly instead of spreading first. React compares state by reference, so it doesn't detect in-place mutations.
- Hardcoded year. One code path used new Date().getFullYear() correctly, but a default value elsewhere was hardcoded to 2025.
- Dead code. Unused imports, unreachable functions, a commented-out migration route.
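The array mutation fix can be sketched in a few lines. This is a minimal illustration rather than the app's actual code: the WorkoutSession shape is trimmed down and the helper name sortByDateDesc is hypothetical.

```typescript
// Hypothetical, trimmed-down shape; the real WorkoutSession has more fields.
interface WorkoutSession {
  id: string;
  date: string; // ISO date, e.g. "2025-03-14"
}

// Returns a new array sorted newest-first and leaves the input untouched.
function sortByDateDesc(sessions: readonly WorkoutSession[]): WorkoutSession[] {
  // Spread first: Array.prototype.sort sorts in place.
  return [...sessions].sort((a, b) => b.date.localeCompare(a.date));
}

const stateSessions: readonly WorkoutSession[] = [
  { id: "a", date: "2025-01-05" },
  { id: "b", date: "2025-03-14" },
];
const sortedSessions = sortByDateDesc(stateSessions);
```

Newer runtimes also offer Array.prototype.toSorted(), which returns a sorted copy without the spread.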
None of these were individually catastrophic. Together they represented the kind of entropy that makes a codebase progressively harder to work with.
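For the PrismaClient singleton, a common approach in Next.js projects is to cache the client on globalThis so module reloads reuse it. The sketch below uses a stand-in class (FakeDbClient, hypothetical) so it runs without a database. In a real project the class would be PrismaClient from @prisma/client, and the helper name getDbClient is also an assumption.

```typescript
// Stand-in for PrismaClient so the sketch runs without a database.
class FakeDbClient {
  static instancesCreated = 0;
  constructor() {
    FakeDbClient.instancesCreated += 1;
  }
}

// Cache the client on globalThis so repeated module evaluation reuses it.
const globalForDb = globalThis as unknown as { dbClient?: FakeDbClient };

function getDbClient(): FakeDbClient {
  if (!globalForDb.dbClient) {
    globalForDb.dbClient = new FakeDbClient();
  }
  return globalForDb.dbClient;
}

// Every caller gets the same instance.
const clientA = getDbClient();
const clientB = getDbClient();
```

Each route then imports the shared helper instead of calling the constructor itself.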
Phase 2: Security Hardening
This was the one that made me uncomfortable. The app was behind Google OAuth, so I'd never thought hard about defense in depth. The agent found:
- No auth checks on API routes. NextAuth middleware was handling the gate, but individual API routes had no secondary validation. If middleware ever failed or was misconfigured, every route was wide open.
- No input validation. API routes trusted whatever the client sent. No length limits, no type checking, no sanitization.
- Verbose error messages. Stack traces and internal details were leaking to the client in error responses.
- No security headers. No CSP, no X-Frame-Options, no Referrer-Policy.
- Database migration route accessible in production. A /api/migrate endpoint existed with no environment check.
The fix added blocking getSession() auth checks to all API routes, email allowlisting via ALLOWED_EMAIL, security headers in next.config.ts, Zod-based input validation on all write endpoints, and a production block on the migrate route.
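The security headers part of that fix might look roughly like this. The header values are illustrative assumptions, not the policy the app actually ships, and a real Content-Security-Policy for a Next.js app usually needs more tuning than default-src 'self'.

```typescript
// Illustrative values only; the actual policy is not shown in this post.
const securityHeaders = [
  { key: "Content-Security-Policy", value: "default-src 'self'" },
  { key: "X-Frame-Options", value: "DENY" },
  { key: "Referrer-Policy", value: "strict-origin-when-cross-origin" },
];

// Shape accepted by the headers option of a Next.js config:
// a list of rules, each with a source pattern and a list of headers.
const nextConfig = {
  async headers() {
    return [{ source: "/(.*)", headers: securityHeaders }];
  },
};
```

In next.config.ts this object would be the default export.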
And then it broke the entire app.
NextAuth v5 beta's auth() returns null in Route Handlers on Vercel — even for authenticated users. It's a cookie context limitation. The blocking 401 checks we just added were rejecting every API call, including from logged-in users. The dashboard loaded but couldn't fetch any data.
The hotfix was immediate: replace the blocking auth checks with a non-blocking checkAuth() that logs a warning but doesn't return 401. Middleware remains the real auth gate. The defense-in-depth intent is still there — if someone bypasses middleware, the logs will show it — but the app doesn't break when NextAuth's Route Handler session resolution is flaky.
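A non-blocking check along those lines could be sketched as follows. The signature is an assumption: the post names checkAuth() but not its parameters, and SessionLike is a minimal stand-in for the NextAuth session type.

```typescript
// Minimal stand-in for the session object; the real type comes from NextAuth.
interface SessionLike {
  user?: { email?: string | null };
}

// Non-blocking check: warns about a missing or unexpected session and
// reports the result, but never produces a 401 itself.
function checkAuth(
  session: SessionLike | null,
  routeName: string,
  allowedEmail: string | undefined,
  warn: (message: string) => void = console.warn,
): boolean {
  const email = session?.user?.email ?? null;
  if (email === null) {
    warn(`[auth] no session resolved in ${routeName}; relying on middleware`);
    return false;
  }
  if (allowedEmail !== undefined && email !== allowedEmail) {
    warn(`[auth] unexpected user in ${routeName}`);
    return false;
  }
  return true;
}
```

A route handler would call it with the result of auth() and the ALLOWED_EMAIL value, then carry on regardless of the return value.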
This is worth being honest about. The agent's security recommendation was textbook correct: every API route should verify authentication independently. But it didn't account for a known limitation of NextAuth v5 beta's Vercel deployment. The fix was fast, but for a few minutes the app was completely down.
Phase 3: Component Refactor
The main dashboard component — WorkoutDashboard.tsx — was 1,316 lines. It handled state management, API calls, OCR processing, goal calculations, settings, and all the UI rendering. Classic vibe code: everything in one file because it was easier to keep building than to stop and organize.
The agent extracted:
- src/types/workout.ts: shared TypeScript interfaces
- src/utils/goalsApi.ts: goal API helpers
- src/utils/tonalOCR.ts: OCR parsing functions
- src/components/SettingsModal.tsx: settings UI
- src/components/DashboardHeader.tsx: header with sync buttons and year selector
The main component went from 1,316 to 734 lines. Ten other files that were importing types from WorkoutDashboard got updated to import from @/types/workout. Re-exports were added for backward compatibility so nothing broke.
This is the phase where the AI agent's strength really showed. Refactoring a 1,300-line component requires understanding every dependency, every import chain, every prop flow. It's exactly the kind of tedious, high-attention work that humans procrastinate on and agents handle methodically.
Phase 4: Performance
This phase had the most interesting findings — and the most interesting bug.
Batch sync queries. The Peloton and Tonal sync routes were checking whether each workout already existed in the database one at a time. For a full sync of 100+ workouts, that's 100+ individual findFirst queries. The fix batch-fetches all already-synced IDs for the current page in a single query.
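Here is a rough sketch of the batch lookup, using an in-memory stand-in for the database so it runs on its own. FakeWorkoutStore, findExistingIds, and selectNewWorkouts are hypothetical names. In the app, the single lookup would presumably be one Prisma findMany with an in filter on the ID column.

```typescript
interface RemoteWorkout {
  pelotonId: string;
}

// Stand-in for the database layer. It counts queries so the saving is visible.
class FakeWorkoutStore {
  queryCount = 0;
  private readonly storedIds: Set<string>;

  constructor(storedIds: Set<string>) {
    this.storedIds = storedIds;
  }

  // One query returns every ID from the page that is already stored.
  async findExistingIds(ids: string[]): Promise<Set<string>> {
    this.queryCount += 1;
    return new Set(ids.filter((id) => this.storedIds.has(id)));
  }
}

// One lookup per page, then an in-memory filter instead of a query per workout.
async function selectNewWorkouts(
  store: FakeWorkoutStore,
  page: RemoteWorkout[],
): Promise<RemoteWorkout[]> {
  const existing = await store.findExistingIds(page.map((w) => w.pelotonId));
  return page.filter((w) => !existing.has(w.pelotonId));
}
```

For a page of 100 workouts this issues one query instead of 100.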
Year filtering. The workouts API was returning every workout in the database, and the client was filtering by year. Added a ?year= query parameter so the database does the filtering. Also discovered that Next.js was caching the API response — added export const dynamic = 'force-dynamic' to prevent stale data.
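The server-side half of that change comes down to parsing the query parameter and turning it into a date range. This is a sketch under assumptions: the helper names are hypothetical, and the use of UTC year boundaries is my choice for the example rather than something the app is known to do.

```typescript
// Parses ?year= from a request URL, falling back to a default when the
// parameter is missing or not a four-digit number.
function parseYearParam(requestUrl: string, fallbackYear: number): number {
  const raw = new URL(requestUrl).searchParams.get("year");
  if (raw === null || !/^\d{4}$/.test(raw)) {
    return fallbackYear;
  }
  return Number(raw);
}

// Half-open UTC range: [Jan 1 of year, Jan 1 of the next year).
function yearRange(year: number): { gte: Date; lt: Date } {
  return {
    gte: new Date(Date.UTC(year, 0, 1)),
    lt: new Date(Date.UTC(year + 1, 0, 1)),
  };
}
```

The gte/lt pair mirrors the shape of a Prisma date filter, so the result could be passed as the date condition of a where clause.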
Dashboard memoization. Added useMemo on filtered sessions and current goal, useCallback on event handlers, and an AbortController on the data-loading effect with proper cleanup.
The debug query that crashed production. While investigating why year filtering wasn't working, the agent added a prisma.$queryRawUnsafe call to log the year distribution of workouts in the database. Reasonable idea. Except $queryRawUnsafe isn't available in Prisma's edge runtime on Vercel, so it crashed the entire /api/workouts endpoint. Another self-inflicted outage during the spring cleaning itself. The fix was just deleting the debug line, but it's a reminder that even diagnostic code can break things if you don't test it in the actual deployment environment.
The Tonal API bug. This one was hiding in plain sight. The Tonal API response format had changed at some point — it returns a raw JSON array (not { data: [...] }), and uses different field names (activityId instead of id, workoutPreview as a nested object instead of flat fields). The sync route was silently failing to map any Tonal data. It had probably been broken for months. I wouldn't have found this without someone methodically reading through the API integration code.
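One way to guard against this kind of silent failure is to normalize the response shape explicitly and throw on anything unexpected. In the sketch below, activityId and workoutPreview come from the description above. The fields inside workoutPreview (title, durationSeconds) and the function names are assumptions for illustration only.

```typescript
// activityId and workoutPreview are described in the post; the fields inside
// workoutPreview are guesses for illustration.
interface TonalActivity {
  activityId: string;
  workoutPreview?: { title?: string; durationSeconds?: number };
}

interface MappedWorkout {
  tonalId: string;
  title: string;
  durationSeconds: number | null;
}

// Accepts a bare array or a { data: [...] } wrapper; anything else is an
// error instead of an empty result.
function extractActivities(payload: unknown): TonalActivity[] {
  if (Array.isArray(payload)) return payload as TonalActivity[];
  if (typeof payload === "object" && payload !== null && "data" in payload) {
    const data = (payload as { data: unknown }).data;
    if (Array.isArray(data)) return data as TonalActivity[];
  }
  throw new Error("Unexpected Tonal response shape");
}

// Reads the ID from activityId and the details from the nested preview.
function mapActivity(activity: TonalActivity): MappedWorkout {
  return {
    tonalId: activity.activityId,
    title: activity.workoutPreview?.title ?? "Untitled workout",
    durationSeconds: activity.workoutPreview?.durationSeconds ?? null,
  };
}
```

Failing loudly here would have turned months of silently empty syncs into an error on the first run.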
The year display bug. Two components — MonthlySummary and ProgressChart — had their own internal const currentYear = new Date().getFullYear() instead of using the year prop from the parent. When you switched to view 2025 data, the charts still showed 2026 labels. Another one that was hiding in plain sight.
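The fix amounts to deriving every label from the year the parent passes in. A tiny sketch, with a hypothetical helper name and label format:

```typescript
const MONTH_NAMES = [
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

// Labels are built from the year argument (the parent's selection),
// never from new Date().getFullYear().
function buildMonthLabels(year: number): string[] {
  return MONTH_NAMES.map((month) => `${month} ${year}`);
}
```

With this, switching the selector to 2025 yields 2025 labels no matter what the current calendar year is.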
Phase 5: Accessibility
The final phase was a full accessibility audit. Three modals with no ARIA attributes, no keyboard handling, and no focus traps. Twelve form inputs without programmatic label associations. Seven icon-only buttons with no accessible names. Button groups acting as radio selectors with no role or aria-checked semantics.
The fixes were surgical:
- All 3 modals got role="dialog", aria-modal, aria-labelledby, Escape key handlers, and Tab/Shift+Tab focus traps
- All 12 form inputs got htmlFor/id label pairing
- All 7 icon-only buttons got aria-label
- 5 settings button groups got role="radiogroup" and aria-checked
- An alert() call got replaced with the existing error banner pattern
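The core of a Tab/Shift+Tab focus trap is index wrapping over the modal's focusable elements. The sketch below isolates that logic as a pure function. The name nextFocusIndex is hypothetical, and the DOM wiring is left out.

```typescript
// Returns the index of the element that should receive focus next when Tab
// (or Shift+Tab) is pressed inside a modal with `count` focusable elements.
function nextFocusIndex(current: number, count: number, shiftKey: boolean): number {
  if (count <= 0) return -1; // nothing focusable in the modal
  if (current < 0) return shiftKey ? count - 1 : 0; // focus was outside the list
  const step = shiftKey ? -1 : 1;
  // Adding count before the modulo keeps the result non-negative.
  return (current + step + count) % count;
}
```

In a component, a keydown handler would collect the focusable elements, find the index of the active one, prevent the default Tab behavior, and focus the element at the returned index.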
This is another category where AI agents excel. Accessibility auditing requires checking every interactive element against a known set of rules. It's comprehensive, repetitive, and easy to miss things when you're doing it manually. The agent found every instance across 6 files in one pass.
What the Agent Found That I Wouldn't Have
Looking back across all five phases, there's a pattern. The issues fall into three categories:
Things I knew were wrong but hadn't prioritized. The giant component, the missing indexes, the lack of input validation. These were in my mental backlog but never rose to the top because the app worked fine.
Things I didn't know were wrong. The Tonal API field name changes, the year display bug, the Cloudflare-style "it works but it's silently broken" issues. These required reading code I hadn't touched in months with fresh eyes.
Things I wouldn't have thought to check. The accessibility audit, the security headers, the force-dynamic export. These require domain knowledge that I have in theory but don't apply consistently to side projects.
The agent brought all three — the discipline to do what I'd been putting off, the fresh perspective to catch what I'd stopped seeing, and the domain knowledge to check what I'd forgotten to consider.
The Numbers
| Phase | PR | Files Changed | Key Metric |
|---|---|---|---|
| Quick Wins | #12 | 8 | PrismaClient singleton, 3 DB indexes added |
| Security | #13 | 6 | Auth checks on all routes, input validation, security headers |
| Component Refactor | #14 | 14 | 1,316 → 734 lines in main component |
| Performance | #15 | 9 | N+1 queries eliminated, Tonal API bug fixed |
| Accessibility | #16 | 6 | 3 modals, 12 inputs, 7 buttons, 5 radiogroups fixed |
5 PRs. 5 phases. ~40 files touched. Two self-inflicted outages along the way.
Why This Matters for Vibe Coding
Vibe coding is great for building things fast. I built this fitness tracker in a weekend and have been using it daily for a year. That's a genuine success story. But vibe coded apps accumulate debt faster than traditionally developed ones because the builder (me, you, anyone) is optimizing for velocity, not maintainability.
The models available today — not last year's models, but the ones shipping right now — are good enough to audit your old vibe coded projects and find real issues. Not theoretical concerns. Real bugs, real security holes, real performance problems that were hiding in code you stopped actively reading months ago.
The cost is low. This entire audit — five phases across a week of part-time work — probably consumed $15-20 in API tokens. The alternative was letting those issues compound until something actually broke in production, or until the codebase became so tangled that adding features felt painful.
How to Do Your Own Spring Cleaning
If you have vibe coded apps running in production (or even just apps you built quickly and stopped maintaining), here's the playbook:
- Start with a fresh clone and a current model. Don't use the model that built the app. Use whatever's newest. The gap between models 6-12 months apart is significant for code comprehension tasks.
- Phase the work. Don't try to fix everything in one PR. Group changes by category: quick wins, security, architecture, performance, accessibility. Merge and test between each phase.
- Let the agent find things you forgot about. The most valuable findings in my audit weren't the ones I already knew about. They were the silent failures: API response formats that changed, display bugs in components I wasn't looking at, auth assumptions that were never tested.
- Check the boring stuff. Security headers, input validation, database indexes, accessible names on buttons. These are the things that never make it onto a feature backlog but determine whether your app is actually solid.
- Don't skip the build. After every change, build the project. Type errors and import issues surface immediately. Every one of our five PRs passed next build before merging.
The Meta Lesson
The fitness tracker works the same way it did before the audit. A user wouldn't notice any difference. But the codebase is meaningfully better — more secure, better organized, more performant, more accessible. The Tonal sync actually works now. The year selector actually works now.
It wasn't a clean sweep, though. We broke the app twice during the spring cleaning — once with auth checks that didn't account for NextAuth v5's Vercel behavior, once with a debug query that crashed the API endpoint. Both were fixed within minutes, but they happened. If I'd been doing this on a higher-traffic app, those minutes would have mattered.
The lesson isn't "let the agent do everything and trust the output." It's "let the agent find things you missed, but test every change in the real environment before moving on." The phased PR approach saved us here. If all five phases had been one PR, the auth breakage would have been tangled up with the component refactor and the performance changes, and debugging would have been miserable.
Every vibe coded app has this layer of accumulated entropy. The models available today are good enough to find it. They're also capable of introducing new problems while fixing old ones — just like a human would. The structure around the work matters as much as the work itself.