I added AI code review to my GitHub Actions workflow six months ago, and it has changed how I work. It doesn't replace human review; it catches the categories of problems I typically miss in self-review: edge cases, security antipatterns, and performance issues.
In 2025, the AI code review market has several mature options: CodeRabbit, Sourcegraph Cody, GitHub Copilot Code Review, Amazon CodeGuru, and custom setups that call LLM APIs directly. I use a custom setup with Claude Sonnet, which costs about $8-15/month at my PR volume.
Based on six months of production use, AI code review consistently catches: security vulnerabilities (missing input validation, SQL injection risks, hardcoded secrets), error handling gaps, performance antipatterns (N+1 queries, missing database indexes), and TypeScript type safety issues. It consistently misses: business logic correctness, architectural concerns, and anything requiring domain context.
The workflow: a GitHub Action triggers on pull_request events. The action fetches the PR diff using the GitHub API, sends the diff to the LLM with a code review prompt, and posts the response as a PR comment using octokit. The whole workflow runs in under 90 seconds for typical PRs and costs $0.20-0.50 per review.
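The "fetch the diff, then filter it" step can be sketched as a pair of pure functions. This is a minimal illustration, not the author's script: the function names and the skip patterns are assumptions, and the header parsing assumes file paths without spaces.

```javascript
// Hypothetical skip rules for lock files and generated output (assumed, not the author's list).
const SKIP_PATTERNS = [
  /(^|\/)package-lock\.json$/,
  /(^|\/)yarn\.lock$/,
  /(^|\/)pnpm-lock\.yaml$/,
  /\.min\.js$/,
  /(^|\/)dist\//,
];

// Split a unified diff into one entry per file.
// Each file section starts with a "diff --git a/<path> b/<path>" header line.
function splitDiffByFile(diffText) {
  return diffText
    .split(/^(?=diff --git )/m)
    .filter((section) => section.startsWith('diff --git '))
    .map((patch) => {
      const header = patch.split('\n', 1)[0];
      const match = header.match(/^diff --git a\/(.+) b\/(.+)$/);
      // Use the "b/" side so renamed files are reported under their new path.
      return { path: match ? match[2] : null, patch };
    });
}

// Keep only files worth sending to the model.
function filterReviewable(files) {
  return files.filter(
    (file) => file.path && !SKIP_PATTERNS.some((pattern) => pattern.test(file.path))
  );
}
```

Filtering before the LLM call keeps lock-file churn out of the prompt, which is usually where most of the wasted tokens in a dependency-bump PR come from.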
AI Code Review: GitHub Actions Workflow

          PR Opened / Updated
                   │
                   ▼
┌───────────────────────────────────────┐
│  GitHub Action Triggered              │
│  On: pull_request (opened, sync)      │
└──────────────────┬────────────────────┘
                   │
                   ▼
┌───────────────────────────────────────┐
│  Fetch PR Diff (GitHub API)           │
│  Filter: skip generated files,        │
│          skip lock files              │
└──────────────────┬────────────────────┘
                   │
         ┌─────────┴─────────┐
         ▼                   ▼
 api/ middleware/   components/ utils/
  (Claude Sonnet)     (Claude Haiku)
         │                   │
         └─────────┬─────────┘
                   ▼
┌───────────────────────────────────────┐
│  LLM Review with structured prompt    │
│  Returns JSON: [{                     │
│    file, line, severity,              │
│    category, description, suggestion  │
│  }]                                   │
└──────────────────┬────────────────────┘
                   │
                   ▼
┌───────────────────────────────────────┐
│  Post PR Comment (octokit)            │
│  CRITICAL → Block merge label         │
│  MAJOR    → Requires acknowledgment   │
│  MINOR    → Informational only        │
└───────────────────────────────────────┘

The most important prompt engineering decision for AI code review is being specific about what you want the model to focus on. I use a tiered severity system in my prompt: critical (security vulnerabilities, which always block the PR), major (logic errors, which require acknowledgment), and minor (style suggestions, which are informational only).
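One way to turn the three severity tiers into a merge decision is sketched below. The function name, the result shape, and the `ai-review-blocked` label are assumptions for illustration; the source only states that critical findings block the PR, major findings require acknowledgment, and minor findings are informational.

```javascript
// Hypothetical label name; use whatever label your branch protection checks for.
const BLOCK_LABEL = 'ai-review-blocked';

// Summarize a list of findings into the actions the workflow should take.
function decideAction(findings) {
  const counts = { critical: 0, major: 0, minor: 0 };
  for (const finding of findings) {
    const severity = String(finding.severity).toLowerCase();
    // Count only the three known tiers; anything else is ignored.
    if (Object.prototype.hasOwnProperty.call(counts, severity)) {
      counts[severity] += 1;
    }
  }
  return {
    counts,
    blockMerge: counts.critical > 0, // critical: always block the PR
    requireAck: counts.major > 0,    // major: a human must acknowledge
    labels: counts.critical > 0 ? [BLOCK_LABEL] : [],
  };
}
```

Counting per tier, rather than tracking only the worst severity, lets the comment report both "blocked" and "needs acknowledgment" when a PR contains critical and major findings together.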
After many iterations, my review prompt has converged on four parts: a role definition, a structured output format (JSON with a findings array), explicit guidance on what to ignore, and a reminder to cite the specific code that triggered each finding. Structured output is critical: unstructured review text is hard to parse and display in GitHub comments.
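The four prompt parts and the structured-output handling might look roughly like this. The prompt wording is a placeholder, not the author's actual prompt, and `buildReviewPrompt` and `parseFindings` are hypothetical names. The parser accepts either a bare array or an object with a `findings` array, and drops entries that fail basic validation.

```javascript
const ALLOWED_SEVERITIES = new Set(['critical', 'major', 'minor']);

// Assemble the prompt: role, output format, what to ignore, citation reminder.
function buildReviewPrompt(diff) {
  return [
    'You are a senior engineer reviewing a pull request.',
    'Respond with JSON only, shaped as {"findings": [{"file", "line", "severity", "category", "description", "suggestion"}]}.',
    'severity must be one of: critical, major, minor.',
    'Ignore formatting, import order, and generated files.',
    'In each description, quote the exact code that triggered the finding.',
    '',
    'Diff:',
    diff,
  ].join('\n');
}

// Extract and validate findings from the raw model response.
// Returns an empty array when the response contains no usable JSON.
function parseFindings(responseText) {
  // Tolerate prose or code fences around the JSON by slicing to its outer brackets.
  const start = responseText.search(/[\[{]/);
  const end = Math.max(responseText.lastIndexOf(']'), responseText.lastIndexOf('}'));
  if (start === -1 || end < start) return [];

  let parsed;
  try {
    parsed = JSON.parse(responseText.slice(start, end + 1));
  } catch {
    return [];
  }

  const list = Array.isArray(parsed) ? parsed : parsed && parsed.findings;
  if (!Array.isArray(list)) return [];

  return list
    .filter(
      (f) =>
        f &&
        typeof f.file === 'string' &&
        Number.isInteger(f.line) &&
        typeof f.description === 'string' &&
        ALLOWED_SEVERITIES.has(String(f.severity).toLowerCase())
    )
    .map((f) => ({ ...f, severity: String(f.severity).toLowerCase() }));
}
```

Validating each finding before posting means a malformed response degrades to "no findings" instead of a broken PR comment.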
Large PRs (1,000+ lines changed) can cost $2-5 to review at full length. I implemented a priority router: files in api/ and middleware/ get full Claude Sonnet review, other TypeScript files get Claude Haiku, and everything else is skipped. This reduced my per-review cost by 65%.
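A priority router of this kind can be a few lines of path matching. In this sketch the tier names `sonnet`, `haiku`, and `skip` are placeholder labels rather than real model identifiers, and `routeFile` and `groupByTier` are hypothetical names; map the tiers to actual model IDs wherever the API call is made.

```javascript
// Decide which review tier a changed file belongs to.
function routeFile(path) {
  // Anything under an api/ or middleware/ directory gets the full review.
  if (/(^|\/)(api|middleware)\//.test(path)) return 'sonnet';
  // Other TypeScript files get the cheaper model.
  if (/\.tsx?$/.test(path)) return 'haiku';
  // Everything else is not reviewed.
  return 'skip';
}

// Group changed paths by tier so each tier can be sent in its own request.
function groupByTier(paths) {
  const groups = { sonnet: [], haiku: [], skip: [] };
  for (const path of paths) {
    groups[routeFile(path)].push(path);
  }
  return groups;
}
```

Checking the directory rule first matters: a TypeScript file under `api/` should land in the full-review tier, not the cheaper one.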
# .github/workflows/ai-code-review.yml
name: AI Code Review
on:
  pull_request:
    types: [opened, synchronize]
# The token needs write access to pull requests to post the review comment.
permissions:
  contents: read
  pull-requests: write
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Fetch PR diff
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs')
            const diff = await github.rest.pulls.get({
              owner: context.repo.owner,
              repo: context.repo.repo,
              pull_number: context.payload.pull_request.number,
              mediaType: { format: 'diff' }
            })
            // Save the diff to a file. Interpolating it into the shell
            // command below would break on quotes and allow injection.
            fs.writeFileSync(`${process.env.GITHUB_WORKSPACE}/pr.diff`, diff.data)
      - name: Run AI Review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
        # Assumes the script accepts a file path for --diff.
        run: |
          node scripts/ai-review.js --diff pr.diff --pr "$PR_NUMBER"
# Full script: https://gist.github.com/yourusername/ai-review

For security review, I run a separate specialized pass, triggered only on PRs that touch authentication, authorization, API endpoints, or database queries. This pass uses Claude Opus and a prompt focused exclusively on OWASP Top 10 vulnerabilities. Security findings are also sent to a Slack channel for visibility.
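Deciding whether a PR needs the extra security pass can be done from the list of changed paths alone. The patterns below are assumptions about a typical project layout, not the author's actual trigger rules, and `needsSecurityPass` is a hypothetical name.

```javascript
// Hypothetical path patterns for security-sensitive areas (adjust to your repo layout).
const SECURITY_PATH_PATTERNS = [
  /(^|\/)auth/i,                     // authentication and authorization code
  /(^|\/)(api|routes)\//,            // API endpoints
  /(^|\/)(db|models|migrations)\//,  // database access
  /\.sql$/,                          // raw SQL files
];

// True when at least one changed file falls in a security-sensitive area.
function needsSecurityPass(changedPaths) {
  return changedPaths.some((path) =>
    SECURITY_PATH_PATTERNS.some((pattern) => pattern.test(path))
  );
}
```

Path matching is deliberately coarse. The `auth` pattern also matches a file named `author.ts`, for example, which is an acceptable trade-off when the cost of a miss is higher than the cost of one extra review.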
Without tuning, AI code review generates 30-40% false positives: findings that are technically accurate but not actionable in your codebase. These create alert fatigue, and developers stop reading the AI review because it's full of noise. Invest time in a suppression list; mine reduced false positives from 35% to under 10%.
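A suppression list can be as simple as a set of rules matched against each finding before the comment is posted. The rule format and the example rules here are assumptions for illustration: each rule is an object whose keys are finding fields and whose values are either an exact string or a regular expression, and a finding is suppressed when every key of some rule matches.

```javascript
// Hypothetical example rules; a real list grows out of findings the team keeps dismissing.
const SUPPRESSIONS = [
  { category: 'style' },                               // drop every style finding
  { file: /^tests\//, category: 'error-handling' },    // tests may skip error handling
  { description: /console\.log/ },                     // logging is allowed in this codebase
];

// True when the finding matches all fields of at least one rule.
function isSuppressed(finding, rules) {
  return rules.some((rule) => {
    const entries = Object.entries(rule);
    // An empty rule would match everything, so treat it as matching nothing.
    if (entries.length === 0) return false;
    return entries.every(([key, expected]) => {
      const actual = String(finding[key] ?? '');
      return expected instanceof RegExp ? expected.test(actual) : actual === expected;
    });
  });
}

// Remove suppressed findings before they reach the PR comment.
function applySuppressions(findings, rules) {
  return findings.filter((finding) => !isSuppressed(finding, rules));
}
```

Keeping the rules as data in the repository means a suppression is itself reviewed in a PR, which makes it harder to silence a whole category by accident.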
AI code review works best when the team has agreed on its role: AI review is a first pass that catches objective issues, not a replacement for human judgment on architecture and business logic.
My actual costs: ~$10-15/month for API calls on ~50 PRs/month. In six months, AI review caught 14 issues I would have missed in self-review: 2 security vulnerabilities, 5 missing error handling cases, 4 TypeScript type safety issues, and 3 performance problems. Conservative estimate: the AI review pays for itself with one caught bug per quarter.