The smallest version that genuinely helps
Strip the scope until you can write it in one sentence, keeping the user at the centre.
Build the smallest version of your solution that proves it works. Treat every finding, including the things that fail, as useful evidence about what to do next.
Agree on metrics and collect baseline data before the pilot starts. The metrics should help you answer: How will we know if this worked? Identify how you will measure it and determine impact.
J-PAL's AI Evidence Playbook (https://www.povertyactionlab.org/ai-evidence-playbook) provides guidance on building evidence for AI programmes. A key takeaway: an AI solution is not worth building unless you can show that it effected change. For example, you may have implemented a flashy new AI model, but did it lead to improved policy outcomes? Ideally, this is determined through randomised evaluations.
| Metric | What it measures | How you will measure it | Baseline (before) | Target (at 6 months) |
|---|---|---|---|---|
What result would make you confident to expand?
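A row of the metrics table can also be kept as a small record, so that progress against the baseline is computed the same way each time. The sketch below is only an illustration: the `Metric` class, the metric name, and all numbers are hypothetical and not part of the guidance above.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    """One row of the metrics table (all values here are illustrative)."""
    name: str
    baseline: float  # value measured before the pilot starts
    target: float    # value you aim to reach at 6 months

    def progress(self, observed: float) -> float:
        """Share of the baseline-to-target gap closed so far (1.0 = target met).

        Works whether the metric should go up or down, because the
        direction is implied by the sign of (target - baseline).
        """
        gap = self.target - self.baseline
        if gap == 0:
            raise ValueError("target must differ from baseline")
        return (observed - self.baseline) / gap

    def target_met(self, observed: float) -> bool:
        return self.progress(observed) >= 1.0


# Hypothetical example: review time should fall from 10 days to 6 days.
review_time = Metric(name="days to first review", baseline=10.0, target=6.0)
print(review_time.progress(7.0))    # 0.75 of the gap closed
print(review_time.target_met(7.0))  # False
```

Recording the baseline before the pilot starts is what makes the `progress` figure meaningful; without it, there is nothing to compare the observed value against.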
Be specific about who the end users are and what they need the system to do. The more concrete this is, the better your MVP will be.
What does their working day look like in relation to this problem? Where does time go? What decisions do they make?
Time, errors, stress. What does the current situation take from them, and what would change if it were fixed?
E.g. digital literacy, internet access, trust in the system, or the language the tool works in.
Having mapped the user journey, are there any changes you would make to your MVP? It is much cheaper to change direction here than after you have built something.
You don’t necessarily need one person for each role, but the following responsibilities should be covered.
Accountable for the project succeeding and for defining the product output. This person keeps the pilot on track and makes the final call on scope.
Responsible for building the tool. Needs enough access to data and systems to do the work. Their technical knowledge also helps in estimating project timelines.
Knows the problem and the context in depth. Validates that the tool is solving the right thing and that outputs make sense in practice.
Tests outputs and gives feedback during the pilot. Should be a real end user, not a manager. Their reactions are valuable evidence about what to build next.
Clears institutional and political blockers. Does not need to be involved day-to-day, but must be reachable when the team hits a bottleneck.
Strategies to consider when missing in-house expertise:
IGC and its Data and AI Factory have experience supporting government AI projects in comparable contexts.
Another ministry may have already built something similar, in a context closer to yours than a vendor's.
If you hire an external vendor, insist on knowledge transfer from the beginning. You should be involved and understand the tool being built.
AI pilots in government can be an attractive frontier area for funding. Frame it around the problem and how the technology will have an impact.
Set realistic dates.
In the tech world, agile methodologies have become the standard. Going through each step sequentially and waiting for each to be complete before moving on, known as the waterfall methodology, creates bottlenecks and slows delivery. The faster approach is to iterate quickly through the full cycle: build a small piece, test it with real users, learn from what you find, and adjust. Then repeat the cycle. This is why we suggest a minimal version (the MVP) from the beginning, so you can get something out quickly to test and react to. You can find many resources on agile principles across the web.
Finalise what the tool needs to do and how success will be measured. These need to be specific items listed out in a document which the developers can follow.
Who leads: Internal owner, with an optional analyst.
Develop the first version of the tool.
Who leads: Technical lead.
Check it works before showing it to end users.
Who leads: Internal owner and technical lead.
Deploy to a limited group of real users.
Who leads: Internal owner, technical lead and user representative.
Assess results against go / no-go criteria.
Who leads: Internal owner.
Build: a model that flags incomplete building permit applications before they reach a reviewer. Test: run it alongside staff on two weeks of real submissions. Learn: the model catches missing documents well, but generates too many false positives on non-standard form layouts used by a major local firm.
Build: retrain on the firm's layout and add a confidence threshold so borderline cases go to a human rather than auto-reject. Test: open to all incoming applications for one district. Learn: reviewers trust the flags, but ask for a reason code so they can explain decisions to applicants.
Build: add plain-language reason codes to each flag and a feedback loop for reviewers to mark errors. Test: roll out across all districts. Learn: average time-to-first-review drops by 40%; reviewer feedback is already improving model accuracy week on week.
A waterfall version of the same project would have spent months defining requirements, and only discovered the layout problem and the need for reason codes after full deployment.
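The confidence threshold from the second cycle of the example can be sketched as a simple routing rule. This is a minimal illustration under stated assumptions: the function name, the threshold values, and the three outcome labels are hypothetical, and the score is assumed to be the model's estimated probability that an application is incomplete.

```python
def route_application(incomplete_score: float,
                      flag_threshold: float = 0.9,
                      clear_threshold: float = 0.1) -> str:
    """Decide what happens to one application, given the model's score.

    incomplete_score: assumed to be the model's estimated probability
    (between 0 and 1) that the application is incomplete.
    The default thresholds are illustrative, not recommended values.
    """
    if not 0.0 <= incomplete_score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if incomplete_score >= flag_threshold:
        return "flag_incomplete"   # model is confident something is missing
    if incomplete_score <= clear_threshold:
        return "pass_to_reviewer"  # model is confident the application is complete
    return "human_check"           # borderline: a person decides, no auto-reject


for score in (0.95, 0.50, 0.05):
    print(score, route_application(score))
```

Widening the gap between the two thresholds sends more cases to a human and fewer to automatic handling, which is one lever for trading reviewer workload against false positives during a pilot.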
Your minimum viable product is the simplest version that delivers the core value. Start here, then add features based on what you learn.
A bicycle solves the mobility problem immediately, even if a car is the eventual goal. Your MVP should work and deliver value as the first deployed version, which can then be iterated on.
Limit the pilot to one team, one office, or one use case. Narrow scope gives you cleaner evidence and fewer variables to explain.
Set a fixed end date before you start to ensure accountability.
Know the numbers before you start. This helps avoid surprises in the build phase, which can cause the pilot to run out of time or money before you can gather any evidence or deploy something.
Staff time plus any external costs, such as development, data preparation, and licences.
This includes compute, hosting, support, and the people maintaining it. Ongoing costs are often underestimated. If the number is fuzzy, get a quote.
What it takes to keep the tool working beyond the pilot: updates, monitoring, retraining, and the staff responsible for it.
Identify the source. If it is donor funded, note any conditions attached. Funding uncertainty can derail a project before any technical issue does.
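The cost items above add up in a straightforward way, and writing the arithmetic down makes a funding gap visible early. The sketch below is illustrative only: the function names and every figure are hypothetical placeholders, to be replaced with your own estimates or quotes.

```python
def pilot_cost(staff_days: float, day_rate: float, external_costs: float) -> float:
    """Pilot cost: staff time plus external costs (development, data preparation, licences)."""
    return staff_days * day_rate + external_costs


def total_cost(pilot: float, annual_running_cost: float, years: int) -> float:
    """Pilot cost plus running costs (compute, hosting, support, maintenance) over a number of years."""
    return pilot + annual_running_cost * years


# Hypothetical figures, in any single currency.
pilot = pilot_cost(staff_days=60, day_rate=150.0, external_costs=5000.0)
three_year_total = total_cost(pilot, annual_running_cost=4000.0, years=3)
funding_secured = 20000.0

print(pilot)                               # 14000.0
print(three_year_total)                    # 26000.0
print(three_year_total - funding_secured)  # 6000.0 still to be funded
```

In this made-up example the pilot itself is fully funded, but running the tool for three years is not, which is the kind of gap that is easier to raise with a funder before the build starts.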
Agree these criteria before the pilot starts so you know in advance what would lead you to scale, what would lead you to fix, and what would lead you to stop. An honest stop is valuable as it saves money and effort that would otherwise be spent on a project that is not going to work.
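One way to make the scale, fix, and stop outcomes explicit before the pilot starts is to write them down as a small decision rule. The sketch below is an assumption about how the criteria in this section could be encoded; the function and parameter names are hypothetical, and each input is a judgement your team would make from the pilot evidence.

```python
def pilot_decision(metric_met: bool,
                   staff_want_to_continue: bool,
                   costs_sustainable: bool,
                   gap_is_fixable: bool) -> str:
    """Return "scale", "fix", or "stop" from criteria agreed in advance.

    gap_is_fixable: the result shows potential and you can identify
    which changes would close the gap.
    """
    if metric_met and staff_want_to_continue and costs_sustainable:
        return "scale"
    if gap_is_fixable:
        return "fix"
    return "stop"


# Hypothetical outcome: the metric was missed, but the cause is understood.
print(pilot_decision(metric_met=False,
                     staff_want_to_continue=True,
                     costs_sustainable=True,
                     gap_is_fixable=True))  # fix
```

Because the rule is fixed before any results are in, the decision at the end of the pilot depends on the evidence rather than on how much has already been invested.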
You meet the success metric, staff want to keep using it, and costs are sustainable.
Action:
There is a signal of potential, but the result is not yet good enough. You can identify what changes would close the gap.
Action:
The result does not justify continuing, staff resistance is too high, or data quality blocks progress.
Action:
Go through this checklist before your pilot starts. If any item is not checked, address it first.
When you have evidence, take it to Step 5.