Daily Brief: Why we no longer evaluate SWE-bench Verified
February 23, 2026, Michelle
SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro....

Daily Brief: Introducing OpenAI for India
February 23, 2026, Arthur
OpenAI for India expands AI access across the country: building local infrastructure, powering enterprises, and advancing workforce skills.

Daily Brief: OpenAI debated calling police about suspected Canadian shooter’s chats
February 23, 2026, Mohsin
Jesse Van Rootselaar's descriptions of gun violence were flagged by tools that monitor ChatGPT for misuse.

Daily Brief: Exposing biases, moods, personalities, and abstract concepts hidden in
February 23, 2026, Michelle
A new method developed at MIT could root out vulnerabilities and improve LLM safety and performance....

Daily Brief: GGML and llama.cpp join HF to ensure the long-term progress of Local A
February 23, 2026, Mohsin
...

Daily Brief: Google’s Cloud AI lead on the three frontiers of model capability
February 23, 2026, Arthur
AI models are pushing against three frontiers at once: raw intelligence, response time, and a third quality you might call "extensibility."...

Daily Brief: Anthropic accuses Chinese AI labs of mining Claude as US debates AI ch
February 23, 2026, Arthur
Anthropic accuses DeepSeek, Moonshot, and MiniMax of using 24,000 fake accounts to distill Claude’s AI capabilities, as U.S. officials debate export controls aimed at slowing China...

Daily Brief: Why we no longer evaluate SWE-bench Verified
February 23, 2026, Mohsin
SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro....

Daily Brief: Exposing biases, moods, personalities, and abstract concepts hidden in
February 23, 2026, Mohsin
A new method developed at MIT could root out vulnerabilities and improve LLM safety and performance....