Daily Brief Exposing biases, moods, personalities, and abstract concepts hidden in February 24, 2026 Michelle A new method developed at MIT could root out vulnerabilities and improve LLM safety and performance....
Products OpenAI announces Frontier Alliance Partners February 24, 2026 Michelle OpenAI announces Frontier Alliance Partners to help enterprises move from AI pilots to production with secure, scalable agent deployments....
Daily Brief Why we no longer evaluate SWE-bench Verified February 24, 2026 Michelle SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro....
Funding Nous Research’s NousCoder-14B is an open-source coding model landing r February 24, 2026 Michelle Nous Research, the open-source artificial intelligence startup backed by crypto venture firm Paradigm, released a new competitive programming model on Monday that it says matches o...
Funding Guide Labs debuts a new kind of interpretable LLM February 24, 2026 Michelle The company open sourced an 8-billion-parameter LLM, Steerling-8B, trained with a new architecture designed to make its actions easily interpretable.
Daily Brief Anthropic accuses Chinese AI labs of mining Claude as US debates AI ch February 24, 2026 Michelle Anthropic accuses DeepSeek, Moonshot, and MiniMax of using 24,000 fake accounts to distill Claude’s AI capabilities, as U.S. officials debate export controls aimed at slowing China...
Research Study: AI chatbots provide less-accurate information to vulnerable use February 24, 2026 Michelle Research from the MIT Center for Constructive Communication finds leading AI models perform worse for users with lower English proficiency, less formal education, and non-US origin...
Daily Brief With AI, investor loyalty is (almost) dead: At least a dozen OpenAI VC February 24, 2026 Michelle While some dual investors are understandable, others were more shocking, and signal the disregard of a longstanding ethical conflict-of-interest rule....
Daily Brief Microsoft’s new gaming CEO vows not to flood the ecosystem with ‘endle February 24, 2026 Michelle Is Microsoft's gaming division doubling down on AI?