What does AI code review actually catch, and what does it miss?

Based on six months of production use, AI code review consistently catches security vulnerabilities (missing input validation, SQL injection risks, hardcoded secrets), error handling gaps, performance antipatterns like N+1 queries, and TypeScript type safety issues. It consistently misses business logic correctness, architectural concerns, and anything requiring domain context — those still require human review.

How much does this AI code review setup cost to run?

The custom Claude Sonnet setup costs approximately $8–15 per month based on a volume of around 50 PRs per month, with each review costing $0.20–0.50 for typical PRs. Large PRs of 1,000+ lines changed can cost $2–5 to review at full length, which is why a priority router was implemented to selectively use Claude Haiku for lower-risk files, cutting per-review cost by 65%.

How do you reduce AI code review false positives?

Without tuning, AI code review generates 30–40% false positives: technically accurate findings that aren't actionable in a given codebase, which leads to alert fatigue. Building a suppression list, which documents intentional patterns and adds them to the prompt as examples to ignore, reduced false positives from 35% to under 10%.

Why is structured JSON output important for the review prompt?

The review prompt requests a JSON response with a findings array, where each finding includes file, line, severity, category, description, and a suggestion. Structured output is critical because unstructured review text is hard to parse and display cleanly in GitHub PR comments, and it also makes it straightforward to apply the tiered severity system (critical, major, minor) that controls how each finding is surfaced to the team.

How is the security-focused review different from the general AI review?

The security review is a separate workflow pass triggered only on PRs that touch authentication, authorization, API endpoints, or database queries. It uses Claude Opus (a more capable model) and a prompt focused exclusively on OWASP Top 10 vulnerabilities, with findings also sent to a Slack channel for broader visibility beyond the PR itself.

AI-Powered Code Review with GitHub Actions: My Setup

I added AI code review to my GitHub Actions workflow six months ago, and it's changed how I work — not by replacing human review, but by catching categories of issues I'd typically miss in self-review: edge cases, security antipatterns, and performance issues.

The Landscape of AI Code Review Tools

In 2025, the AI code review market has several mature options: CodeRabbit, Sourcegraph Cody, GitHub Copilot Code Review, Amazon CodeGuru, and custom setups using the LLM APIs directly. I use a custom setup with Claude Sonnet — it costs about $8-15/month based on my PR volume.

What AI Code Review Actually Catches

Based on six months of production use, AI code review consistently catches: security vulnerabilities (missing input validation, SQL injection risks, hardcoded secrets), error handling gaps, performance antipatterns (N+1 queries, missing database indexes), and TypeScript type safety issues. It consistently misses: business logic correctness, architectural concerns, and anything requiring domain context.

GitHub Actions Integration Architecture

The workflow: a pull_request event triggers a GitHub Actions workflow that fetches the PR diff using the GitHub API, sends the diff to the LLM with a code review prompt, and posts the response as a PR comment using octokit. The whole workflow runs in under 90 seconds for typical PRs and costs $0.20-0.50 per review.
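A minimal sketch of those three steps in JavaScript, assuming an authenticated Octokit instance is passed in. `reviewPullRequest` and `callModel` are hypothetical names; `callModel` stands in for whatever LLM client call you use and is assumed to resolve to the comment text. The `pulls.get` request with `mediaType: { format: "diff" }` is the same one the workflow file in this post makes.

```javascript
// Sketch of the fetch -> review -> comment flow.
// `octokit` is assumed to be an authenticated Octokit instance;
// `callModel` is a hypothetical async function wrapping your LLM client.
async function reviewPullRequest({ octokit, owner, repo, pullNumber, callModel }) {
  // Ask for the PR as a unified diff; with this media type the
  // response `data` is the raw diff text, not a JSON object.
  const { data: diff } = await octokit.rest.pulls.get({
    owner,
    repo,
    pull_number: pullNumber,
    mediaType: { format: "diff" },
  });

  // Send the diff to the model; assumed to resolve to the comment body.
  const body = await callModel(diff);

  // Top-level PR comments are created through the Issues API,
  // since every pull request is also an issue.
  await octokit.rest.issues.createComment({
    owner,
    repo,
    issue_number: pullNumber,
    body,
  });
  return body;
}
```

In practice a formatting step sits between the model call and `createComment`, turning the JSON findings into readable Markdown.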

AI Code Review — GitHub Actions Workflow

  PR Opened / Updated
        │
        ▼
  ┌───────────────────────────────────────┐
  │  GitHub Action Triggered              │
  │  On: pull_request (opened, sync)      │
  └──────────────┬────────────────────────┘
                 │
                 ▼
  ┌───────────────────────────────────────┐
  │  Fetch PR Diff (GitHub API)           │
  │  Filter: skip generated files,        │
  │           skip lock files             │
  └──────────────┬────────────────────────┘
                 │
         ┌───────┴────────┐
         ▼                ▼
  api/ middleware/   components/ utils/
  (Claude Sonnet)    (Claude Haiku)
         │                │
         └───────┬────────┘
                 ▼
  ┌───────────────────────────────────────┐
  │  LLM Review with structured prompt    │
  │  Returns JSON: [{                     │
  │    file, line, severity,              │
  │    category, description, suggestion  │
  │  }]                                   │
  └──────────────┬────────────────────────┘
                 │
                 ▼
  ┌───────────────────────────────────────┐
  │  Post PR Comment (octokit)            │
  │  CRITICAL → Block merge label         │
  │  MAJOR    → Requires acknowledgment   │
  │  MINOR    → Informational only        │
  └───────────────────────────────────────┘
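The filter step in the diagram (skip generated files and lock files) can be sketched as a small diff splitter. The skip patterns are illustrative assumptions (common lock files and build-output folders), and `splitDiffByFile` and `filterReviewableFiles` are hypothetical helper names; adjust the patterns to what your repository actually generates.

```javascript
// Hypothetical skip rules: lock files and common generated-output locations.
const SKIP_PATTERNS = [
  /(^|\/)package-lock\.json$/,
  /(^|\/)yarn\.lock$/,
  /(^|\/)pnpm-lock\.yaml$/,
  /(^|\/)(dist|build)\//,
  /\.min\.js$/,
  /\.generated\.[a-z]+$/,
];

// Split a unified diff into one chunk per file, keyed by the
// post-change path taken from the "b/" side of the diff header.
function splitDiffByFile(diff) {
  const files = [];
  let current = null;
  for (const line of diff.split("\n")) {
    const header = line.match(/^diff --git a\/(.+?) b\/(.+)$/);
    if (header) {
      current = { path: header[2], lines: [line] };
      files.push(current);
    } else if (current) {
      current.lines.push(line);
    }
  }
  return files.map((f) => ({ path: f.path, patch: f.lines.join("\n") }));
}

// Keep only the files worth sending to the model.
function filterReviewableFiles(diff) {
  return splitDiffByFile(diff).filter(
    (f) => !SKIP_PATTERNS.some((re) => re.test(f.path))
  );
}
```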

The most important prompt engineering decision for AI code review is being specific about what you want the model to focus on. I use a tiered severity system in my prompt: critical (security vulnerabilities — always block the PR), major (logic errors — require acknowledgment), and minor (style suggestions — informational only).
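One way to turn those tiers into actions, sketched under the assumption that findings arrive in the JSON shape from the diagram. `triageFindings` and the `ai-review: blocking` label name are hypothetical.

```javascript
// The three tiers from the prompt, mapped to how each finding is surfaced.
const SEVERITY_ACTIONS = {
  critical: "block",     // security vulnerabilities: always block the PR
  major: "acknowledge",  // logic errors: require acknowledgment
  minor: "info",         // style suggestions: informational only
};
// Hypothetical label name; GitHub does not define it.
const BLOCKING_LABEL = "ai-review: blocking";

// Sort findings shaped like { file, line, severity, category,
// description, suggestion } into the three tiers.
function triageFindings(findings) {
  const byAction = { block: [], acknowledge: [], info: [] };
  for (const f of findings) {
    // Unknown severities fall back to the strictest non-blocking tier.
    const action = SEVERITY_ACTIONS[f.severity] ?? "acknowledge";
    byAction[action].push(f);
  }
  return {
    labels: byAction.block.length > 0 ? [BLOCKING_LABEL] : [],
    blocking: byAction.block,
    needsAcknowledgment: byAction.acknowledge,
    informational: byAction.info,
  };
}
```

A label on its own does not stop a merge on GitHub; one option is a required status check that fails while the blocking label is present. The label itself can be applied with Octokit's `issues.addLabels`.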

The Review Prompt That Works

After many iterations, my review prompt has converged on four parts: a role definition, a structured output format (JSON with a findings array), explicit guidance on what to ignore, and a reminder to cite the specific code that triggered each finding. Structured output is critical, because unstructured review text is hard to parse and display in GitHub comments.
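A sketch of a prompt with those four parts, plus a tolerant parser for the reply. The prompt wording, `buildReviewPrompt`, and `parseFindings` are illustrative assumptions, not a verbatim copy of my prompt. The parser slices from the first `{` to the last `}` so a sentence or code fence around the JSON does not break parsing, and it drops findings with missing fields or an unknown severity.

```javascript
const SEVERITIES = new Set(["critical", "major", "minor"]);

// Illustrative prompt with the four parts: role, JSON output format,
// what to ignore, and a rule to cite the triggering code.
function buildReviewPrompt(ignoreList = []) {
  const ignore = ignoreList.length
    ? ignoreList.map((item) => `- ${item}`).join("\n")
    : "- (nothing listed)";
  return [
    "You are a senior engineer reviewing a pull request diff.",
    'Respond with JSON only: {"findings": [{"file": string, "line": number,',
    '"severity": "critical" | "major" | "minor", "category": string,',
    '"description": string, "suggestion": string}]}.',
    "In each description, quote the exact code that triggered the finding.",
    "Do not report any of the following:",
    ignore,
  ].join("\n");
}

// Parse the reply. JSON.parse still throws on malformed JSON,
// so the caller can decide whether to retry or skip.
function parseFindings(text) {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end < start) return [];
  const parsed = JSON.parse(text.slice(start, end + 1));
  const findings = Array.isArray(parsed.findings) ? parsed.findings : [];
  // Keep only findings with the required fields and a known severity.
  return findings.filter(
    (f) =>
      typeof f.file === "string" &&
      Number.isInteger(f.line) &&
      SEVERITIES.has(f.severity) &&
      typeof f.description === "string"
  );
}
```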

Handling Large PRs Without Breaking the Bank

Large PRs (1,000+ lines changed) can cost $2-5 to review at full length. I implemented a priority router: files in api/ and middleware/ get full Claude Sonnet review, other TypeScript files get Claude Haiku, and everything else is skipped. This reduced my per-review cost by 65%.
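The routing rule fits in a few lines. This sketch assumes paths as they appear in the diff; `routeFile`, `planReview`, and the tier names `sonnet` and `haiku` are placeholders to map onto the actual model IDs you call.

```javascript
// Route one changed file to a review tier.
function routeFile(path) {
  // api/ and middleware/ files (at any depth) get the full Sonnet review.
  if (/(^|\/)(api|middleware)\//.test(path)) return "sonnet";
  // Other TypeScript files (.ts / .tsx) go to the cheaper Haiku tier.
  if (/\.tsx?$/.test(path)) return "haiku";
  // Everything else is not reviewed.
  return "skip";
}

// Group all changed paths by tier so each model gets one batch.
function planReview(paths) {
  const plan = { sonnet: [], haiku: [], skip: [] };
  for (const p of paths) plan[routeFile(p)].push(p);
  return plan;
}
```

Grouping by tier makes it easy to send one request per model rather than one per file.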

# .github/workflows/ai-code-review.yml
name: AI Code Review
on:
  pull_request:
    types: [opened, synchronize]

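jobs:
  review:
    runs-on: ubuntu-latest
    # Read the repo and allow GITHUB_TOKEN to post PR comments
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4

      - name: Fetch PR diff
        id: diff
        uses: actions/github-script@v7
        with:
          # Return the raw diff text instead of a JSON-encoded string
          result-encoding: string
          script: |
            const diff = await github.rest.pulls.get({
              owner: context.repo.owner,
              repo: context.repo.repo,
              pull_number: context.payload.pull_request.number,
              mediaType: { format: 'diff' }
            })
            return diff.data

      - name: Run AI Review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          # Pass untrusted PR content through env vars rather than inline
          # ${{ }} interpolation, so the diff can't break the shell command
          PR_DIFF: ${{ steps.diff.outputs.result }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: |
          node scripts/ai-review.js \
            --diff "$PR_DIFF" \
            --pr "$PR_NUMBER"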
# Full script: https://gist.github.com/yourusername/ai-review
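The workflow calls `scripts/ai-review.js` with `--diff` and `--pr`. A hedged sketch of how that script might read and validate those flags using Node's built-in `util.parseArgs` (Node 18+); `readCliArgs` is a hypothetical name, and the full script is in the gist linked above.

```javascript
const { parseArgs } = require("node:util");

// Read and validate the two flags the workflow passes.
// parseArgs runs in strict mode by default, so unknown flags throw.
function readCliArgs(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      diff: { type: "string" },
      pr: { type: "string" },
    },
  });
  if (!values.diff) throw new Error("--diff is required and must be non-empty");
  const prNumber = Number.parseInt(values.pr ?? "", 10);
  if (!Number.isInteger(prNumber) || prNumber <= 0) {
    throw new Error(`--pr must be a positive integer, got ${values.pr}`);
  }
  return { diff: values.diff, prNumber };
}

// In the script itself: const { diff, prNumber } = readCliArgs(process.argv.slice(2));
```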

Security-Focused Reviews

For security review, I run a separate specialized review pass triggered only on PRs that touch authentication, authorization, API endpoints, or database queries. This pass uses Claude Opus as the model and a prompt focused exclusively on OWASP Top 10 vulnerabilities. Security findings are also sent to a Slack channel for visibility.
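Deciding whether a PR needs the security pass can be as simple as matching changed paths. The regular expressions here are assumptions about a typical project layout, not a definitive classifier, and `securityAreasTouched` is a hypothetical helper.

```javascript
// Hypothetical path heuristics for each sensitive area.
const SECURITY_TRIGGERS = {
  authentication: /auth|login|session|oauth|jwt/i,
  authorization: /permission|rbac|polic(y|ies)/i,
  apiEndpoints: /(^|\/)(api|routes)\//i,
  databaseQueries: /(^|\/)(db|migrations)\/|\.sql$|quer(y|ies)/i,
};

// Return the sensitive areas a PR touches; an empty list means
// the Opus security pass can be skipped for this PR.
function securityAreasTouched(paths) {
  const hits = new Set();
  for (const path of paths) {
    for (const [area, re] of Object.entries(SECURITY_TRIGGERS)) {
      if (re.test(path)) hits.add(area);
    }
  }
  return [...hits];
}
```

If the sensitive code lives in predictable directories, a second workflow with a `paths:` filter on its `pull_request` trigger gives the same gating without custom code.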

Without tuning, AI code review generates 30-40% false positives: findings that are technically accurate but not actionable in your codebase. These create alert fatigue, and developers stop reading the AI review because it's full of noise. Invest time in a suppression list that documents the patterns that are intentional in your codebase and adds them to the prompt as examples to ignore. My suppression list reduced false positives from 35% to under 10%.
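A suppression list works well as structured data rendered into the prompt. The entry format, the two sample entries, and `renderSuppressions` are illustrative assumptions; the point is that each entry records the pattern, why it is intentional, and a concrete example the model can match against.

```javascript
// Illustrative suppression entries: each records an intentional pattern,
// why it is acceptable here, and an example the model can match against.
const SUPPRESSIONS = [
  {
    pattern: "console.log in scripts/",
    reason: "CLI scripts print progress to stdout on purpose",
    example: 'console.log("Migrated", count, "rows")',
  },
  {
    pattern: "`as any` in *.test.ts files",
    reason: "test doubles are deliberately loosely typed",
    example: "const fakeDb = {} as any;",
  },
];

// Render the entries as an "ignore these" section to append to the prompt.
function renderSuppressions(entries) {
  if (entries.length === 0) return "";
  const lines = entries.map(
    (e) => `- ${e.pattern}: ${e.reason}. Example: ${e.example}`
  );
  return ["Intentional patterns in this codebase; do NOT report these:", ...lines].join("\n");
}
```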

Integrating with Your Review Culture

AI code review works best when the team has agreed on its role: AI review is a first pass that catches objective issues, not a replacement for human judgment on architecture and business logic.

Cost Analysis and ROI

My actual costs: ~$10-15/month for API calls on ~50 PRs/month. In six months, AI review caught 14 issues I would have missed in self-review: 2 security vulnerabilities, 5 missing error handling cases, 4 TypeScript type safety issues, and 3 performance problems. Conservative estimate: the AI review pays for itself with one caught bug per quarter.
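The break-even claim is easy to check with the figures above: six months at roughly $10-15/month is $60-90 for 14 caught issues, and a single quarter costs $30-45. `costPerCaughtIssue` is just a named helper for the arithmetic.

```javascript
// Worked numbers from this post: ~$10-15/month over six months,
// with 14 issues caught that self-review would have missed.
function costPerCaughtIssue(months, monthlyCost, issuesCaught) {
  return (months * monthlyCost) / issuesCaught;
}

const low = costPerCaughtIssue(6, 10, 14);  // $60 total / 14 issues
const high = costPerCaughtIssue(6, 15, 14); // $90 total / 14 issues
// One quarter of spend: what a single caught bug per quarter must outweigh.
const quarterLow = 3 * 10;
const quarterHigh = 3 * 15;

console.log(`Per caught issue: $${low.toFixed(2)}-$${high.toFixed(2)}`);
// prints "Per caught issue: $4.29-$6.43"
console.log(`Per quarter: $${quarterLow}-$${quarterHigh}`);
// prints "Per quarter: $30-$45"
```

So one caught bug per quarter only has to save more than about $45 of engineering time to cover the API bill.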
