When Anthropic’s engineers sat down to test their newest model, they weren’t just running benchmarks; they were using it to do their actual jobs. The result, Claude Opus 4.6, is what the company calls its “smartest model yet,” and the early numbers suggest it might be the most capable coding and reasoning AI released to date.

“Claude Opus 4.6 is the strongest model Anthropic has shipped. It takes complicated requests and actually follows through, breaking them into concrete steps, executing, and producing polished work even when the task is ambitious,” the Notion engineering team said.

The Numbers Behind the Upgrade

Anthropic is making bold claims about Opus 4.6’s performance, and the published evaluation data supports them. On Terminal-Bench 2.0, a test of agentic coding capabilities, Opus 4.6 achieved the highest score of any frontier model. On Humanity’s Last Exam, a multidisciplinary reasoning test designed to push AI systems to their limits, it also leads all competitors.

But perhaps the most telling metric is GDPval-AA, which measures performance on economically valuable knowledge work across finance, legal, and other professional domains. There, Opus 4.6 outperforms OpenAI’s GPT-5.2 by approximately 144 Elo points and its own predecessor, Claude Opus 4.5, by 190 points. For context, under the standard Elo formula a 144-point gap implies the higher-rated model would be preferred in roughly 70% of head-to-head comparisons, and a 190-point gap in roughly 75%.

The coding improvements are substantial. Opus 4.6 plans more carefully, sustains agentic tasks for longer periods, operates more reliably in larger codebases, and demonstrates better code review and debugging skills, including the ability to catch its own mistakes. For developers working with complex systems, these aren’t incremental improvements; they change what’s possible.

One Million Tokens of Context

In a first for Anthropic’s Opus-class models, Opus 4.6 features a 1 million token context window in beta. This isn’t just a specification on a datasheet; it changes how the model can be used. Entire codebases, lengthy legal documents, or comprehensive research papers can now fit within a single conversation.

The implications extend beyond simple document processing. With this expanded context, Claude can maintain coherence across extended workflows, reference earlier parts of a conversation without losing track, and work with materials too large for previous generations to process.

“Claude Opus 4.6 represents a meaningful leap in long-context performance. We saw it handle much larger bodies of information with a level of consistency that strengthens how we design and deploy complex research workflows,” said the research team at an enterprise partner.

Agent Teams and Autonomous Work

Perhaps the most significant product update is the introduction of agent teams in Claude Code. Users can now assemble multiple AI agents to work on tasks together, with Claude coordinating the effort. Combined with new “compaction” capabilities that allow the model to summarize its own context, this enables longer-running tasks without hitting token limits.

The model also introduces “adaptive thinking,” where Claude picks up on contextual clues about how much extended reasoning to apply, and new “effort” controls that give developers granular control over the intelligence-speed-cost tradeoff. For teams building production AI applications, this level of control is essential.

Office integration has also received attention. Claude in Excel has been substantially upgraded, and Claude in PowerPoint is now available in research preview. These aren’t gimmicks; they represent Anthropic’s bet that AI’s future lies in augmenting everyday work, not just powering specialized applications.

What Early Users Are Saying

The early access program has generated enthusiastic feedback from development teams across the industry. At Windsurf, engineers noted that Opus 4.6 “thinks longer, which pays off when deeper reasoning is needed.” A cybersecurity firm reported that across 40 investigations, Opus 4.6 produced the best results 38 times when ranked blindly against Claude 4.5 models.

Perhaps most impressively, one enterprise team reported that Opus 4.6 autonomously closed 13 issues and assigned 12 issues to the right team members in a single day, managing a roughly 50-person organization across 6 repositories. It handled both product and organizational decisions while synthesizing context across multiple domains, and it knew when to escalate to a human.

The safety profile remains a priority. According to Anthropic’s system card, Opus 4.6 shows an overall safety profile “as good as, or better than, any other frontier model in the industry,” with low rates of misaligned behavior across safety evaluations. For a company that has built its reputation on being more cautious than its rivals, this isn’t surprising, but it’s worth noting as capabilities increase.

The Road Ahead

Claude Opus 4.6 is available now on claude.ai, the Anthropic API, and all major cloud platforms. Pricing remains unchanged at $5 per million input tokens and $25 per million output tokens, so users get more capability at the same per-token price.

The release comes at a pivotal moment in the AI race. With OpenAI’s GPT-5.2, Google’s Gemini 3 Deep Think, and now Claude Opus 4.6 all launching within weeks of each other, the frontier is moving fast. What distinguishes Anthropic’s approach is its emphasis on careful reasoning, extended context, and giving users control over how intelligence is applied.

For developers and knowledge workers, the question isn’t whether these models will change how they work. It’s how quickly they can adapt to take advantage of capabilities that would have seemed impossible just months ago.

This article was reported by the ArtificialDaily editorial team. For more information, visit Anthropic.