Anthropic launches new tool to review AI-generated code
Anthropic on Monday rolled out a new AI tool, Code Review, designed to identify bugs before they enter a codebase.
Peer feedback has long been essential in coding, helping developers catch errors, maintain consistency across a codebase, and improve overall software quality. At the same time, the rise of “vibe coding” (AI tools that generate code from plain-language instructions) has accelerated development but also brought new bugs, security risks, and code that is hard to understand.
“Code review has become a bottleneck, and we hear the same from customers every week,” Anthropic said in a blog post. “They tell us developers are stretched thin, and many PRs [pull requests] get skims rather than deep reads.”
Pull requests are used by developers to submit code changes for review before the updates are merged into the main software.
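For the unfamiliar, a pull request can be opened programmatically as well as through GitHub's web interface. Below is a minimal sketch using GitHub's REST API; the repository, branch names, commit details, and token are placeholders.

```python
import requests

# Open a pull request via GitHub's REST API.
# OWNER, REPO, the branch names, and the token are placeholders.
OWNER, REPO = "example-org", "example-repo"

response = requests.post(
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls",
    headers={
        "Authorization": "Bearer <YOUR_GITHUB_TOKEN>",  # personal access token
        "Accept": "application/vnd.github+json",
    },
    json={
        "title": "Fix off-by-one error in pagination",
        "head": "fix/pagination",  # branch containing the proposed changes
        "base": "main",            # branch the changes should merge into
        "body": "Corrects the page-boundary calculation in the list endpoint.",
    },
    timeout=30,
)
response.raise_for_status()
print("Opened PR #", response.json()["number"])
```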
Code Review is Anthropic’s solution to the problem. The company notes that it is a more thorough, albeit more expensive, option than the open-source Claude Code GitHub Action, which also reviews code and remains available.
How Code Review works
“When a PR opens, Claude dispatches a team of agents to hunt for bugs,” the company said in an X post.
The agents then look for bugs in parallel, filter out false positives, and rank bugs by severity, Anthropic said in the blog post. The result lands on the PR as a single high-signal overview comment (a summary highlighting the most important findings), plus in-line comments (comments attached directly to the specific lines of code where bugs were found) for specific bugs.
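Anthropic has not published the internals, but the flow it describes maps onto a familiar pattern: gather candidate findings in parallel, drop likely false positives, and rank what survives by severity. The sketch below is purely illustrative; the names, the confidence-threshold filter, and the toy agents are assumptions, not Anthropic's code.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Finding:
    file: str
    line: int
    message: str
    severity: int      # higher = more serious
    confidence: float  # the agent's own estimate, 0.0-1.0

def review_pr(diff, agents, min_confidence=0.8):
    """Run each agent over the diff in parallel, filter, then rank by severity."""
    with ThreadPoolExecutor() as pool:
        batches = pool.map(lambda agent: agent(diff), agents)
    findings = [f for batch in batches for f in batch]
    # Stand-in for false-positive filtering: drop low-confidence candidates.
    findings = [f for f in findings if f.confidence >= min_confidence]
    return sorted(findings, key=lambda f: f.severity, reverse=True)

# Toy agents: each returns whatever it flagged in the diff.
agent_a = lambda diff: [Finding("app.py", 42, "possible off-by-one", 3, 0.9)]
agent_b = lambda diff: [Finding("app.py", 42, "unlikely style nit", 1, 0.3)]

for f in review_pr("...diff text...", [agent_a, agent_b]):
    print(f"{f.file}:{f.line}  {f.message}")
```

In this picture, the overview comment would summarise the top of the ranked list, and each finding's file-and-line pair would anchor an in-line comment.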
“Reviews scale with the PR. Large or complex changes get more agents and a deeper read; trivial ones get a lightweight pass. Based on our testing, the average review takes around 20 minutes,” Anthropic said in the blog post.
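Anthropic does not say how that scaling is computed. One plausible reading is a simple size-based heuristic along these lines; the thresholds and agent counts below are invented for illustration.

```python
def agents_for(lines_changed: int) -> int:
    """Hypothetical heuristic: assign more reviewer agents to larger diffs."""
    if lines_changed < 50:    # trivial change: lightweight pass
        return 1
    if lines_changed < 1000:  # typical PR: standard review
        return 3
    return 6                  # large or complex change: deeper read
```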
The system focuses on logic errors rather than style issues, giving developers actionable insights, Anthropic head of product Cat Wu told TechCrunch.
“This is really important because a lot of developers have seen AI automated feedback before, and they get annoyed when it’s not immediately actionable,” Wu said. “We decided we’re going to focus purely on logic errors. This way we’re catching the highest priority things to fix.”
The AI also explains its reasoning step by step, showing what it believes the issue is, why it could be a problem, and how it might be fixed, according to TechCrunch. Issues are colour-coded by severity: red for the most serious, yellow for potential concerns worth checking, and purple for preexisting or historical bugs.
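Put together, each finding carries three parts (what, why, and how to fix) plus a severity tier. Here is a toy rendering of one such comment, not Anthropic's actual format.

```python
# Illustrative only: one review comment in the described shape.
finding = {
    "issue": "Loop bound can exceed the buffer length",
    "why": "The comparison uses <= against a zero-indexed array",
    "suggested_fix": "Use < in the loop condition",
    "severity_colour": "red",  # red = most serious; yellow = worth checking;
                               # purple = preexisting or historical bug
}
```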
Results from testing
Anthropic said that it has been using Code Review internally for several months.
On large PRs (pull requests with more than 1,000 changed lines), 84% surfaced problems, averaging 7.5 issues each. On small PRs (under 50 lines), only 31% did, averaging 0.5 issues. Engineers largely agree with the results: fewer than 1% of findings are wrong, Anthropic said.
Cost and control
Code Review optimises for depth, which makes it more expensive than lighter-weight alternatives such as the Claude Code GitHub Action. Reviews are billed based on token usage, typically $15–25 per PR depending on size and complexity.
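Token-based billing makes the per-review cost straightforward to approximate. Here is a back-of-envelope sketch; the token counts and per-million-token rates are illustrative assumptions, not Anthropic's quoted pricing.

```python
def review_cost(input_tokens: int, output_tokens: int,
                usd_per_m_input: float = 15.0,          # assumed rate, for illustration
                usd_per_m_output: float = 75.0) -> float:  # assumed rate
    """Estimate a review's cost from its token usage."""
    return (input_tokens / 1e6) * usd_per_m_input \
         + (output_tokens / 1e6) * usd_per_m_output

# e.g. a review that reads ~900k tokens of diff and context and writes ~80k:
print(f"${review_cost(900_000, 80_000):.2f}")  # $19.50, inside the $15-25 band
```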
Admins have multiple tools to manage costs and usage (a hypothetical configuration sketch follows the list):
- Monthly organisation caps: Set a total spend for all reviews in a month
- Repository-level control: Run reviews only on chosen repositories
- Analytics dashboard: Track which PRs were reviewed, acceptance rates, and total review costs
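Anthropic has not published a configuration schema for these controls; purely as an illustration, they could be expressed along these lines. Every field name below is hypothetical, and the real settings live in Claude Code's admin interface.

```python
# Hypothetical admin settings: all keys invented for illustration.
CODE_REVIEW_SETTINGS = {
    "monthly_spend_cap_usd": 2000,        # org-wide ceiling for all reviews
    "enabled_repositories": [             # run reviews only on these repos
        "example-org/payments-service",
        "example-org/web-frontend",
    ],
    "analytics": {                        # what the dashboard tracks
        "reviewed_prs": True,
        "acceptance_rate": True,
        "total_review_cost": True,
    },
}
```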
Availability
Code Review is available now as a research preview in beta for Team and Enterprise plans.
- For admins: Enable Code Review in Claude Code settings, install the GitHub App, and select the repositories you want to monitor.
- For developers: Once enabled, reviews run automatically on new PRs without additional setup.