When MIT economist Esther Duflo won the Nobel Prize in 2019, she became the youngest person ever—and only the second woman—to receive the award in economics. Her work, alongside husband Abhijit Banerjee and colleague Michael Kremer, transformed how the world thinks about fighting poverty. Rather than relying on intuition or ideology, they championed randomized controlled trials (RCTs) to test what actually works.

Now, that same evidence-based approach is being applied to one of the most hyped technologies of our time: artificial intelligence.

“We’ve seen tremendous enthusiasm for AI in the social sector, but very little rigorous evidence about what actually works,” said Duflo at the initiative’s launch. “Before we scale these technologies across millions of people, we need to know if they’re moving the needle—or just moving money around.”

A $25 Million Bet on Evidence

In February 2024, J-PAL announced the launch of Project AI Evidence—a $25 million, multi-year initiative to test whether AI tools actually deliver on their promises for social good. The program will connect governments, tech companies, and nonprofits with world-class economists at MIT and across J-PAL’s global network to evaluate and improve AI solutions in healthcare, education, agriculture, and public service delivery.

The timing is critical. Governments worldwide are racing to deploy AI systems for everything from healthcare diagnosis to welfare eligibility. Tech companies are pouring billions into AI for development. Yet most of these interventions have never been rigorously tested.

Consider the track record: A 2023 World Bank review found that fewer than 5% of AI projects in developing countries had any formal impact evaluation. Many show promising results in pilot phases but fail when scaled. Others work in controlled settings but falter in messy real-world environments.

“The pattern is familiar,” said J-PAL’s Executive Director, Rachel Glennerster. “We’ve seen it with microfinance, with mobile money, with various education technologies. The initial hype outpaces the evidence. Our goal is to get ahead of that curve with AI—to build the evidence base while these technologies are still being shaped.”

Three Tracks to Better AI

Project AI Evidence will operate across three interconnected tracks:

Research Partnerships will fund and coordinate randomized evaluations of AI interventions across J-PAL’s global network of 900+ affiliated researchers. Priority areas include healthcare diagnostics, adaptive learning platforms, crop disease detection, and fraud detection in social programs.

The Policy Lab will work directly with governments to design and implement AI evaluations. The goal is not just to study what works, but to help policymakers make better decisions about AI procurement and deployment.

“Governments are being sold AI systems every day,” said J-PAL’s policy director. “They need independent, rigorous evidence to separate the wheat from the chaff. We want to be that trusted partner.”

The Innovation Fund—a $5 million pool—will support the development of new AI tools specifically designed for rigorous evaluation.

What Early Research Shows

While Project AI Evidence is just getting started, J-PAL’s existing research offers hints at what rigorous evaluation might reveal about AI’s potential—and limitations.

In education, a J-PAL-affiliated study in India found that an AI-powered personalized learning app improved math scores for students—but only when combined with human teacher support. The AI alone had minimal impact.

In healthcare, research in Kenya showed that AI chatbots could effectively deliver mental health counseling at scale—but struggled to handle complex cases that required human judgment.

In agriculture, a study in Rwanda found that AI-generated weather forecasts were accurate but underutilized because farmers trusted traditional knowledge more than algorithmic predictions.

“These findings don’t mean AI doesn’t work,” emphasized Duflo. “They mean AI works in specific contexts, with specific designs, for specific populations. Our job is to figure out which contexts, which designs, and which populations.”

The Stakes for Global Development

Project AI Evidence arrives at a pivotal moment for AI governance. Regulators in the EU, US, and elsewhere are scrambling to craft rules for AI deployment. The UN has called for global AI standards. Tech companies are making voluntary commitments on AI safety.

J-PAL’s approach offers something different: hard evidence rather than principles, tested interventions rather than promises.

“There’s a risk that AI becomes another technology where the Global South is treated as a testing ground for tools designed elsewhere,” warned Banerjee. “We want to flip that script—to generate evidence from the Global South that shapes how AI is built and deployed everywhere.”

The initiative’s first evaluations are expected to launch in late 2024, with initial results beginning to emerge in 2025. The goal is not just to evaluate individual AI tools, but to build a broader evidence base that can guide the field.

“In ten years, we want to look back and say that AI in the social sector was shaped by rigorous evidence, not just by marketing budgets and press releases,” said Glennerster. “That’s ambitious, but that’s J-PAL’s DNA. We don’t do hype. We do evidence.”

J-PAL has already transformed how the world fights poverty. The question now is whether the same evidence-based approach can help AI live up to its promise, or reveal where it falls short.


This article was reported by the ArtificialDaily editorial team. For more information about J-PAL’s research, visit www.j-pal.org.
