When Julius Adebayo began his PhD at MIT, he set out to solve a problem that has plagued artificial intelligence researchers for years: understanding why deep learning models make the decisions they do. That work, which started with a widely cited 2018 paper showing that existing interpretability methods were unreliable, has culminated in something remarkable. This week, his San Francisco startup Guide Labs open sourced Steerling-8B, a language model designed from the ground up to be transparent about its reasoning.

“The kind of interpretability people do is neuroscience on a model, and we flip that. What we do is actually engineer the model from the ground up so that you don’t need to do neuroscience.” — Julius Adebayo, CEO of Guide Labs

The Architecture of Transparency

Steerling-8B isn’t just another large language model. Its architecture includes what Guide Labs calls a “concept layer”: a built-in mechanism that buckets data into traceable categories. Every token the model produces can be traced back to its origins in the training data, whether that’s a simple factual reference or something as complex as the model’s understanding of humor or gender.

This approach requires more upfront data annotation than traditional training methods. But by using other AI models to assist with the labeling process, Guide Labs has managed to build its largest proof of concept yet: a model that demonstrates interpretability doesn’t have to come at the cost of capability.
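To make the idea concrete, here is a minimal sketch of a concept-bottleneck-style layer, the general technique that a “concept layer” description evokes. It is an illustration under assumptions, not Guide Labs’ published architecture: the PyTorch framing and every name in it (ConceptLayer, concept_names, suppress) are hypothetical.

```python
import torch
import torch.nn as nn

class ConceptLayer(nn.Module):
    """Hypothetical concept bottleneck: hidden states are projected onto a
    fixed set of human-labeled concepts, and downstream layers only see what
    is reconstructed from those named activations. Every output is therefore
    attributable to, and steerable through, the labeled concepts."""

    def __init__(self, hidden_dim: int, concept_names: list[str]):
        super().__init__()
        self.concept_names = concept_names
        self.to_concepts = nn.Linear(hidden_dim, len(concept_names))
        self.from_concepts = nn.Linear(len(concept_names), hidden_dim)

    def forward(self, h: torch.Tensor, suppress=None):
        c = self.to_concepts(h)  # one activation per named concept
        if suppress:
            # Steering: mask out unwanted concepts before reconstruction.
            mask = torch.ones(len(self.concept_names))
            for name in suppress:
                mask[self.concept_names.index(name)] = 0.0
            c = c * mask
        return self.from_concepts(c), c

# Inspect which concepts fired for a token, or suppress one at inference.
layer = ConceptLayer(hidden_dim=512, concept_names=["finance", "humor", "violence"])
h = torch.randn(1, 16, 512)                    # hidden states for 16 tokens
h_out, acts = layer(h, suppress=["violence"])  # steered states + activations
print(dict(zip(layer.concept_names, acts[0, -1].tolist())))
```

Because every prediction flows through the labeled bottleneck, attribution and control become simple indexed lookups rather than the post-hoc “neuroscience” Adebayo describes.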
Traceable reasoning represents a fundamental shift in how we think about AI systems. Rather than treating models as black boxes to be probed and tested after the fact, Guide Labs has built transparency into the foundation. The implications extend far beyond academic curiosity.

From Science to Engineering

Guide Labs claims Steerling-8B achieves approximately 90% of the capability of today’s widely used models while using less training data. That efficiency gain comes directly from the novel architecture, which doesn’t waste parameters on opaque representations that can’t be understood or controlled.

“This model demonstrates that training interpretable models is no longer a sort of science; it’s now an engineering problem. We figured out the science and we can scale them.” — Julius Adebayo

The team also tracks what it calls “discovered concepts”: emergent behaviors the model develops on its own, such as an understanding of quantum computing. This suggests the architecture preserves the generalization capabilities that make large language models so powerful while adding a layer of accountability that has been missing from the field.

Real-World Applications

The use cases for truly interpretable AI extend across industries. For consumer-facing applications, model builders could block copyrighted material or better control outputs around sensitive subjects like violence or drug abuse. The ability to understand exactly what influences a model’s decisions opens doors that have been closed since the deep learning revolution began.

Regulated industries may prove to be the earliest adopters. Financial institutions evaluating loan applications need models that weigh financial records without introducing bias based on race or other protected characteristics. Healthcare providers need AI systems that can explain their diagnostic reasoning. Legal and compliance teams have been waiting for technology that can satisfy regulatory scrutiny.

Scientific research represents another frontier. Protein folding has been a major success for deep learning, but scientists need insight into why a model arrived at a particular conclusion. An interpretable model doesn’t just provide answers; it provides the reasoning behind them.

The Road Ahead

Guide Labs emerged from Y Combinator and raised a $9 million seed round from Initialized Capital in November 2024. The company’s next steps include building larger models and offering API and agentic access to users who need transparent AI systems.

The broader question is whether the industry will follow. Current training methods have produced remarkably capable systems, but at the cost of opacity. As these models are entrusted with increasingly consequential decisions, the demand for interpretability will only grow.

Adebayo frames the stakes clearly: as we build superintelligent systems, we don’t want something making decisions on our behalf that remains mysterious to us. Steerling-8B represents a bet that transparency and capability can coexist, and that the future of AI belongs to systems we can understand.

This article was reported by the ArtificialDaily editorial team. For more information, visit TechCrunch.