Anthropic has released Claude Sonnet 5, its latest Sonnet-class model. Although it is not a frontier-model breakthrough, Sonnet 5 is a meaningful upgrade over previous models, with stronger coding capabilities, better agentic performance, and more efficient token usage.
Anthropic’s announcement emphasized agentic performance, specifically the model’s ability to carry out multi-step work with less direct human guidance. Anthropic says Sonnet 5 can make plans, use tools such as browsers and terminals, and operate autonomously at a level that until recently required larger, more expensive models.
Sonnet 5 Is More Economical With Tokens
According to Anthropic, Sonnet 5 improves on Sonnet 4.6, delivering higher quality along with lower-priced options. Opus 4.8 still beats Sonnet 5 on accuracy, but Anthropic says the effort level can be adjusted to find the best balance between cost and performance. Sonnet 5 also has an introductory price of $2/MTok (per million tokens) for input and $10/MTok for output through August 31.
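To make the introductory pricing concrete, here is a minimal Python sketch that estimates the cost of a workload at $2/MTok input and $10/MTok output. The function name `estimate_cost` and the token counts in the example are hypothetical, chosen only for illustration; the sketch does not call any Anthropic API and does not model the effort setting or any other pricing factor.

```python
# Rough cost estimate at Sonnet 5's introductory pricing as reported in this article:
# $2 per million input tokens and $10 per million output tokens (through August 31).
# estimate_cost is a hypothetical helper for illustration, not an Anthropic API.

INPUT_PRICE_PER_MTOK = 2.00    # USD per 1,000,000 input tokens (introductory rate)
OUTPUT_PRICE_PER_MTOK = 10.00  # USD per 1,000,000 output tokens (introductory rate)


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload at the introductory rates."""
    input_cost = input_tokens * INPUT_PRICE_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_MTOK / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical session: 500,000 input tokens and 50,000 output tokens.
    # Input: 0.5 MTok * $2 = $1.00; output: 0.05 MTok * $10 = $0.50.
    cost = estimate_cost(500_000, 50_000)
    print(f"${cost:.2f}")  # prints "$1.50"
```

Because output tokens cost five times as much as input tokens at these rates, output volume weighs heavily on the bill: in the example, 50,000 output tokens cost half as much as ten times as many input tokens.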
Sonnet 5 Performance Benchmarks
Sonnet 5 beats Sonnet 4.6, GPT-5.5, and Gemini 3.5 Flash on a number of benchmarks, though GPT-5.5 still leads on Terminal-Bench 2.1.
BrowseComp tests how well an AI agent can locate hard-to-find information on the web.
BrowseComp scores:
- Claude Sonnet 5: 84.7 (single agent)
- Claude Sonnet 4.6: 76.2
- GPT-5.5: 84.4
Terminal-Bench 2.1 tests an AI model’s ability to complete coding tasks in a terminal or command-line interface (CLI).
Terminal-Bench 2.1 scores:
- Claude Sonnet 5: 80.4
- Claude Sonnet 4.6: 67.0
- GPT-5.5: 83.4 (Codex CLI)
- Gemini 3.5 Flash: 76.2
SWE-bench Pro is a software engineering benchmark on which Sonnet 5 outperformed the other comparable models listed.
SWE-bench Pro scores:
- Claude Sonnet 5: 63.2
- Claude Sonnet 4.6: 58.1
- GPT-5.5: 58.6
- Gemini 3.5 Flash: 55.1
FrontierCode is an agentic coding benchmark of 150 tasks, on which Sonnet 5 significantly outperformed GPT-5.5.
The Claude Sonnet 5 System Card explains:
“Each task gives the agent a checked-out repository and a single issue description; the agent then works autonomously in a containerized environment to produce a final patch, with no human intervention and no timeout information.
Patches are graded against blocking functional criteria (primarily held-out unit tests) plus weighted rubric criteria, including model-graded checks for required test coverage and prohibited implementation patterns. Tasks were authored by maintainers of the underlying repositories and individually reviewed by Cognition researchers, with a random subset manually solved to verify fairness.”
FrontierCode scores:
- Claude Sonnet 5: 38.8
- Claude Sonnet 4.6: 15.1
- GPT-5.5: 25.5
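To see at a glance where Sonnet 5 leads and where it trails, the short Python sketch below computes its point margin over every other model listed for each benchmark above. The scores are copied from this article; the setting notes "(single agent)" and "(Codex CLI)" are dropped, models not listed for a benchmark are simply omitted, and the names `SCORES` and `margins` are hypothetical, for illustration only.

```python
# Benchmark scores as listed in this article; a model absent from a benchmark's
# list is omitted rather than assumed. Setting notes such as "(single agent)"
# and "(Codex CLI)" are not represented here.
SCORES = {
    "BrowseComp": {"Claude Sonnet 5": 84.7, "Claude Sonnet 4.6": 76.2, "GPT-5.5": 84.4},
    "Terminal-Bench 2.1": {"Claude Sonnet 5": 80.4, "Claude Sonnet 4.6": 67.0,
                           "GPT-5.5": 83.4, "Gemini 3.5 Flash": 76.2},
    "SWE-bench Pro": {"Claude Sonnet 5": 63.2, "Claude Sonnet 4.6": 58.1,
                      "GPT-5.5": 58.6, "Gemini 3.5 Flash": 55.1},
    "FrontierCode": {"Claude Sonnet 5": 38.8, "Claude Sonnet 4.6": 15.1, "GPT-5.5": 25.5},
}


def margins(scores, model="Claude Sonnet 5"):
    """Return {benchmark: {other_model: points}} for `model` vs. each other model.

    Positive values mean `model` scored higher; negative values mean it trailed.
    Differences are rounded to one decimal to match the published precision.
    """
    result = {}
    for bench, row in scores.items():
        base = row[model]
        result[bench] = {other: round(base - score, 1)
                         for other, score in row.items() if other != model}
    return result


if __name__ == "__main__":
    for bench, row in margins(SCORES).items():
        for other, delta in row.items():
            print(f"{bench:20} vs {other:18} {delta:+.1f}")
```

Run as a script, it prints one line per comparison; the only negative margin is Terminal-Bench 2.1 against GPT-5.5 (-3.0 points), while the largest lead over GPT-5.5 is on FrontierCode (+13.3 points).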
Sonnet 5 Is “Near-Opus Intelligence”
Anthropic does not claim that Sonnet 5 is a frontier-model breakthrough, although it does call it the company’s most capable Sonnet-class model. The system card explains that it is less capable than Anthropic’s Opus and Mythos models. Yet Anthropic does claim that it is “near-Opus intelligence at Sonnet pricing…
Disclaimer
We strive to uphold the highest ethical standards in all of our reporting and coverage. We at blogs.grocliq.com want to be transparent with our readers about any potential conflicts of interest that may arise in our work. Some of the investors we feature may have connections to other businesses, including competitors or companies we write about. However, we want to assure our readers that this will not affect the integrity or impartiality of our reporting. We are committed to delivering accurate, unbiased news and information to our audience, and we will continue to uphold our ethics and principles in all of our work. Thank you for your trust and support.