When developers and creators hit a rate limit at the worst possible moment, midway through a coding session or in the flow of generating a video, they face a frustrating choice: stop entirely or wait. OpenAI has spent the past year wrestling with this problem as usage of Codex and Sora consistently exceeded expectations, leaving users bumping against hard stops just as they found real value.

The solution the company built represents a fundamental shift in how AI access is managed. Rather than forcing users to choose between rigid limits and unpredictable billing, OpenAI engineered a hybrid system that counts usage in real time and transitions seamlessly between rate limits and purchased credits.

"Rate limits can help smooth demand and ensure fair access; however, when users are getting value, hitting a hard stop can be frustrating." — OpenAI Engineering Team

The Access Waterfall: A New Mental Model

One of the key conceptual shifts OpenAI made was modeling access as a decision waterfall rather than a binary gate. Instead of asking "is this allowed?", the system asks "how much is allowed, and from where?" When counting usage, the system moves through a sequence: rate limits are enforced first, then free tiers, then credits, then enterprise entitlements.

From a user's perspective, they never "switch systems"; they just keep using Codex and Sora. That's why credits feel invisible: they are simply another layer in the waterfall.

This approach required building a distributed usage and balance system designed specifically for synchronous access decisions. Every request passes through a single evaluation path that decides in real time how much usage is allowed, verifies sufficient credits when needed, and returns one definitive outcome, while debits are settled asynchronously.

Why Build In-House?
OpenAI evaluated third-party usage billing and metering platforms, but they didn't meet two critical requirements: immediate knowledge of credit availability and full transparency into every decision.

Real-time accuracy matters because best-effort or delayed counting shows up as surprise blocks, inconsistent balances, and incorrect charges. For interactive products like Codex and Sora, those failures become visible and frustrating. Transparency into every outcome (why a request was allowed or blocked, how much usage it consumed, which limits or balances were applied) needed to be tightly integrated into the decision waterfall rather than solved in isolation.

"When people are creating or coding, they shouldn't have to wonder whether a request will go through, if they'll be overcharged, or whether their balance is accurate." — OpenAI Engineering Team

Provable Correctness Over Speed

The system maintains three separate datasets that tie together: product usage events (what the user actually did), monetization events (what to charge), and balance updates (how much to adjust the credit balance). These datasets are not casual by-products; they drive the system, with each dataset triggering the next. Separating what occurred, any associated charges, and what was debited allows every layer to be audited, replayed, and reconciled independently.

This is an intentional trade-off: OpenAI prioritizes provable correctness at the cost of slightly delayed credit balance updates. When that brief delay causes a request to overshoot a user's credit balance, the system automatically refunds the difference, choosing correctness and user trust over strict enforcement.

Architecture in Service of Momentum

The guiding principle behind the approach is protecting user momentum. Every architectural decision maps back to a user-facing outcome: real-time balances prevent unnecessary interruptions, atomic consumption prevents double-charging, and unified access logic ensures predictable behavior.
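To make the access waterfall concrete, here is a minimal sketch of the idea as described above. Everything in it (class names, the particular sources, the numbers) is an illustrative assumption, not OpenAI's actual implementation: each source in the sequence grants as much of the requested usage as it can before the next is consulted.

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    """One layer of the waterfall (hypothetical names: rate limit, free tier, ...)."""
    name: str
    remaining: float  # units of usage this source can still grant

    def grant(self, requested: float) -> float:
        granted = min(self.remaining, requested)
        self.remaining -= granted
        return granted

@dataclass
class AccessDecision:
    allowed: float                                  # total usage granted
    breakdown: dict = field(default_factory=dict)   # source name -> amount granted

def evaluate(requested: float, waterfall: list[Source]) -> AccessDecision:
    """Ask "how much is allowed, and from where?" rather than "is this allowed?"."""
    decision = AccessDecision(allowed=0.0)
    for source in waterfall:  # fixed order: rate limits, then free tier, then credits, ...
        if decision.allowed >= requested:
            break
        granted = source.grant(requested - decision.allowed)
        if granted > 0:
            decision.allowed += granted
            decision.breakdown[source.name] = granted
    return decision

waterfall = [
    Source("rate_limit", 5.0),
    Source("free_tier", 2.0),
    Source("credits", 100.0),
    Source("enterprise", 0.0),
]
decision = evaluate(10.0, waterfall)
print(decision.allowed)    # 10.0
print(decision.breakdown)  # {'rate_limit': 5.0, 'free_tier': 2.0, 'credits': 3.0}
```

The quantitative framing is what makes the transparency requirement tractable: the caller learns not just that the request was allowed, but how much came from which layer.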
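The three linked datasets and the refund-on-overshoot behavior might be sketched like this. Again, this is a hypothetical illustration; the event types, the price, and the reconciliation logic are assumptions made for the example, not OpenAI's code:

```python
from dataclasses import dataclass

@dataclass
class UsageEvent:          # what the user actually did
    user: str
    units: float

@dataclass
class MonetizationEvent:   # what to charge for that usage
    user: str
    amount: float

@dataclass
class BalanceUpdate:       # how much to adjust the credit balance
    user: str
    delta: float

PRICE_PER_UNIT = 0.5       # illustrative price, not a real rate

def monetize(usage: UsageEvent) -> MonetizationEvent:
    """Each usage event triggers a monetization event."""
    return MonetizationEvent(usage.user, usage.units * PRICE_PER_UNIT)

def settle(event: MonetizationEvent) -> BalanceUpdate:
    """Each monetization event triggers a balance update (a debit)."""
    return BalanceUpdate(event.user, -event.amount)

def reconcile(balance: float, updates: list[BalanceUpdate]) -> tuple[float, float]:
    """Apply queued debits; if delayed settlement overshot the balance,
    refund the overshoot instead of leaving the user in the red."""
    for update in updates:
        balance += update.delta
    refund = -balance if balance < 0 else 0.0
    return balance + refund, refund

# Two requests were allowed synchronously; their debits settle afterwards.
usage_log = [UsageEvent("ada", 8.0), UsageEvent("ada", 6.0)]
updates = [settle(monetize(u)) for u in usage_log]  # debits of 4.0 and 3.0
balance, refund = reconcile(5.0, updates)           # 7.0 owed vs 5.0 held
print(balance, refund)  # 0.0 2.0
```

Because each dataset is derived from the previous one, any layer can be replayed and audited against its source, which is the property the article credits with making correctness provable.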
The result is that people can work longer, explore more deeply, and take projects further without facing hard stops or premature plan changes. When users are engaged, the system helps them continue rather than getting in the way.

This infrastructure change signals a broader evolution in how AI companies think about access. As these tools become essential to workflows, the model of hard cutoffs becomes increasingly untenable. The question for the industry is whether other providers will follow OpenAI's lead in building seamless, trust-centered access systems, or stick with the old model of artificial scarcity.

This article was reported by the ArtificialDaily editorial team. For more information, visit OpenAI.