# Reviewing AI-Generated Code
A checklist and mental model for reviewing code you didn't write — what to look for when your coding agent hands back a diff.
The agent writes the code. You own it. That means every line it produces is your responsibility — and you need a systematic way to review it.
## The Trust Gradient
Not all agent output deserves the same scrutiny. Calibrate review depth by risk:
| Risk Level | Examples | Review Approach |
|---|---|---|
| Low | CSS tweaks, adding a test, formatting | Scan the diff, verify it builds |
| Medium | New component, refactoring, API changes | Read every line, test manually |
| High | Auth logic, data mutations, build config | Read every line, trace the control flow, verify edge cases |
The mistake is treating everything as low-risk. The agent will happily modify your build pipeline with the same confidence it uses to fix a typo.
## What Agents Get Wrong
These failure modes appear consistently across projects and models:
### Wrong Framework Version
The agent’s training data includes multiple versions of every framework. It will confidently generate Astro 4 patterns when you need Astro 6, or React class components when you use hooks. Check imports and API calls against your actual framework version.
### Dependency Creep
Ask for one feature, get three new npm packages. Agents default to installing libraries for things the standard library already handles. Before accepting a new dependency: check if the feature exists natively, check the package size, and check when it was last maintained.
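Date formatting is a frequent offender: agents reach for moment or dayjs when the built-in `Intl` API already covers the common cases. A sketch of the zero-dependency alternative (the locale and options here are illustrative):

```javascript
// Intl.DateTimeFormat ships with Node and every modern browser —
// no date library needed for straightforward formatting.
function formatDate(date) {
  return new Intl.DateTimeFormat("en-US", {
    year: "numeric",
    month: "long",
    day: "numeric",
  }).format(date);
}

// formatDate(new Date(2024, 0, 15)) → "January 15, 2024"
```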
### Over-Engineering
A request for “a breadcrumb component” returns a recursive navigation framework with configuration objects and abstract base classes. The agent optimizes for generality; your project needs specificity. If the solution is more complex than the problem, push back.
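To make the contrast concrete, here is roughly what the smallest breadcrumb that works might look like: one function, one map, no configuration objects. This is a hypothetical sketch (the `items` shape is an assumption), not a prescribed implementation:

```javascript
// Renders [{ label, href }, ...] as linked segments, with the last
// item marked as the current page instead of linked.
function breadcrumb(items) {
  return items
    .map(({ label, href }, i) =>
      i === items.length - 1
        ? `<span aria-current="page">${label}</span>`
        : `<a href="${href}">${label}</a>`
    )
    .join(" / ");
}
```

If the agent's version of this needs a README to explain, that is the signal to push back.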
### Inconsistent Patterns
The agent doesn’t remember your conventions between sessions unless CLAUDE.md tells it. It might use camelCase in one file and snake_case in another, or mix async patterns within the same module. Check for consistency with existing code.
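A contrived illustration of the drift (both functions are hypothetical):

```javascript
// Session one produced camelCase...
function getUserName(user) {
  return user.name;
}

// ...session two, without a CLAUDE.md convention to anchor it,
// drifted to snake_case for the same kind of helper.
function get_user_email(user) {
  return user.email;
}
```

Neither style is wrong in isolation; the mix is what erodes the codebase.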
### Silent Assumptions
The agent makes decisions without flagging them. It might choose a specific caching strategy, pick a default timeout value, or assume a particular database schema. These assumptions are embedded in the code without comments. Read for implicit decisions, not just explicit logic.
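For instance, an innocuous-looking module can embed several unstated decisions. The values and names below are hypothetical; the comments mark what a reviewer should be asking:

```javascript
// Each line below is a decision the agent made silently.
const CACHE_TTL_MS = 60_000;     // why 60 seconds? no one chose this
const FETCH_TIMEOUT_MS = 5_000;  // assumes a fast, reliable network

function cacheKey(user) {
  return `user:${user.id}`;      // assumes ids never collide across tenants
}
```

None of these are bugs yet, but any of them can become one the moment the assumption stops holding.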
## The Review Checklist
Run through this for every non-trivial diff:
**Does it build?**
```sh
npm run build  # or your equivalent
```

Never merge agent output you haven’t built locally. “It looks right” is not verification.
**Does it match the request?** Compare what you asked for against what you got. Agents frequently add features you didn’t request, refactor code you didn’t mention, or “improve” things that worked fine.
**Does it follow project conventions?**
- Correct framework/library versions
- Consistent naming patterns
- Same file organization as existing code
- No new dependencies without justification
**Is it the right complexity?** Count the files changed. If you asked for a simple feature and the diff touches 12 files, something went wrong. The right solution is usually the smallest one that works.
**Are there security concerns?**
- User input sanitized?
- No hardcoded secrets?
- No eval() or equivalent?
- API endpoints validated?
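A crude automated scan can catch the obvious offenders before you read the diff by hand. This is a cheap first pass, not a substitute for review; the patterns below are illustrative and far from exhaustive:

```javascript
// Regexes for common red flags: dynamic code execution and
// secrets committed as string literals.
const RED_FLAGS = [
  /\beval\s*\(/,
  /new Function\s*\(/,
  /(api[_-]?key|secret|password)\s*[:=]\s*["'][^"']+["']/i,
];

// Returns how many red-flag patterns the source text trips.
function scanForRedFlags(source) {
  return RED_FLAGS.filter((re) => re.test(source)).length;
}
```

Anything this flags deserves a close manual read; anything it misses still might.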
**Does it handle the edge cases that matter?** The agent often adds error handling for impossible states while missing realistic edge cases. Focus on: what happens with empty data, null values, network failures, and concurrent access.
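For example, for a hypothetical `average` helper, the realistic edge cases are an empty array and null input; those are the branches worth a test, not exotic states that can never occur:

```javascript
// Handle the inputs that actually show up: empty arrays and null.
function average(values) {
  if (!Array.isArray(values) || values.length === 0) return null;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// average([2, 4, 6]) → 4
// average([])        → null  (the case agents often miss)
```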
## The “Read the Diff” Habit
The most important practice: read every diff before accepting it. Not skim — read.
```sh
git diff --staged   # what you're about to commit
git diff HEAD~1     # what just landed
```

This sounds obvious. In practice, after hours of productive agent sessions, the temptation to “just accept and move on” is strong. That’s exactly when bugs slip through.
> **Warning:** The agent’s confidence is not correlated with correctness. It will present broken code with the same certainty as working code. Your review is the only quality gate.
## After the Review
If you find a problem the agent should have avoided, add a rule to CLAUDE.md. This is how the constraint system grows — through real failures, not hypothetical ones. Every bug that makes it past review is a missing rule.
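For example, after catching a dependency-creep bug in review, the new rule might read something like this (the section name and wording are illustrative, not a prescribed format):

```markdown
## Dependencies
- Do not add new npm packages without asking first.
- Prefer built-in platform APIs (fetch, Intl, node:fs) over libraries.
```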