Step 4: Pilot

Build the smallest version of your solution that proves it works. Treat every finding, including the things that fail, as useful evidence about what to do next.

4.1 Define what success looks like before you build

Agree on metrics and collect baseline data before the pilot starts. The metrics should answer one question: how will we know if this worked? For each metric, decide how you will measure it and how you will determine the pilot's impact.

Prove it worked

J-PAL's AI Evidence Playbook provides guidance on building evidence with AI programmes at https://www.povertyactionlab.org/ai-evidence-playbook. A key takeaway: an AI solution is not worth building unless you can show that it effected change. For example, you may have implemented a flashy new AI model, but did it lead to improved policy outcomes? Ideally, this is determined through a randomised evaluation.
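As a rough illustration of what a randomised evaluation involves, the sketch below randomly assigns pilot units to a treatment group (which gets the tool) and a control group (which does not), then compares average outcomes. The unit names, the function names, and the simple difference in means are assumptions made for illustration; a real evaluation design should follow guidance such as the playbook's.

```python
import random
from statistics import mean


def assign_groups(unit_ids, seed=0):
    """Randomly split units into treatment and control groups of near-equal size."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced and audited
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "treatment": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }


def difference_in_means(outcomes, groups):
    """Average outcome in the treatment group minus the average in the control group."""
    treated = mean(outcomes[u] for u in groups["treatment"])
    control = mean(outcomes[u] for u in groups["control"])
    return treated - control


# Hypothetical example: eight offices, half of which receive the tool.
offices = [f"office_{i}" for i in range(1, 9)]
groups = assign_groups(offices, seed=42)
print(len(groups["treatment"]), len(groups["control"]))
```

Assigning by chance rather than by choice means that differences in outcomes between the two groups can be attributed to the tool instead of to how the offices were picked.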

Metric | What it measures | How you will measure it | Baseline (before) | Target (at 6 months)

What result would make you confident to expand?
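One way to keep the metric, baseline, and target together is a small record per metric. The sketch below is a minimal illustration; the `Metric` class, its field names, and the example numbers are hypothetical and not part of the playbook.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    how_measured: str
    baseline: float               # value before the pilot starts
    target: float                 # value you want to reach at 6 months
    lower_is_better: bool = False

    def percent_change(self, observed: float) -> float:
        """Change from baseline, as a percentage of the baseline."""
        return (observed - self.baseline) * 100 / self.baseline

    def target_met(self, observed: float) -> bool:
        """True if the observed value is at least as good as the target."""
        if self.lower_is_better:
            return observed <= self.target
        return observed >= self.target


# Hypothetical metric and numbers, for illustration only.
review_time = Metric(
    name="Days to first review",
    how_measured="Case management system timestamps",
    baseline=10.0,
    target=7.0,
    lower_is_better=True,
)
print(review_time.percent_change(6.0))  # -40.0
print(review_time.target_met(6.0))      # True
```

Writing the baseline and target down before the pilot starts makes it harder to move the goalposts once results come in.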

4.2 Understand your user

Be specific about who the end users are and what they need the system to do. The more concrete this is, the better your MVP will be.

Persona exercise

What does their working day look like in relation to this problem? Where does time go? What decisions do they make?

Time, errors, stress. What does the current situation take from them, and what would change if it were fixed?

Consider barriers to adoption, e.g. digital literacy, internet access, trust in the system, or the language the tool works in.

Before you move on

Having mapped the user journey, are there any changes you would make to your MVP? It is much cheaper to change direction here than after you have built something.

4.3 Who is building this?

You don’t necessarily need one person for each role, but the following responsibilities should be covered.

Internal owner

Accountable for the project succeeding and for defining the product output. This person keeps the pilot on track and makes the final call on scope.

Technical lead

Responsible for building the tool. Needs enough access to data and systems to do the work. Their technical knowledge also helps in estimating project timelines.

Subject matter expert

Knows the problem and the context in depth. Validates that the tool is solving the right thing and that outputs make sense in practice.

User representative

Tests outputs and gives feedback during the pilot. Should be a real end user, not a manager. Their reactions provide valuable evidence for the direction to build in.

Senior sponsor

Clears institutional and political blockers. Does not need to be involved day-to-day, but must be reachable when the team hits a bottleneck.

If you need external support

Strategies to consider when missing in-house expertise:

Leverage an IGC Evidence Lab

IGC and its Data and AI Factory have experience supporting government AI projects in comparable contexts.

Look across government for cross-ministry support

Another ministry may have already built something similar, in a context closer to yours than a vendor's.

Procure services from an outside firm

If you do, insist on knowledge transfer from the beginning. You should be involved and understand the tool being built.

Look for donor support

AI pilots in government can be an attractive frontier area for funding. Frame it around the problem and how the technology will have an impact.

4.4 Development timeline

Set realistic dates.

A note on agile

In the tech world, agile methodologies have become the standard. Going through each step sequentially and waiting for each to be complete before moving on, known as the waterfall methodology, creates bottlenecks and slows delivery. The faster approach is to iterate quickly through the full cycle: build a small piece, test it with real users, learn from what you find, and adjust. Then repeat the cycle. This is why we suggest a minimal version (the MVP) from the beginning, so you can get something out quickly to test and react to. You can find many resources on agile principles across the web.

Iterate: 1 Requirements → 2 Build → 3 Internal testing → 4 Deploy → 5 Evaluation

1. Requirements

Finalise what the tool needs to do and how success will be measured. List these as specific items in a document that the developers can follow.

Who leads: Internal owner, with an optional analyst.

2. Build

Develop the first version of the tool.

Who leads: Technical lead.

3. Internal testing

Check it works before showing it to end users.

Who leads: Internal owner and technical lead.

4. Deploy

Deploy to a limited group of real users.

Who leads: Internal owner, technical lead and user representative.

5. Evaluation

Assess results against go / no-go criteria.

Who leads: Internal owner.
In practice

An AI permit-screening tool, three iterations in eight weeks

  1. Cycle 1 (Weeks 1–3)

    Build: a model that flags incomplete building permit applications before they reach a reviewer. Test: run it alongside staff on two weeks of real submissions. Learn: the model catches missing documents well, but generates too many false positives on non-standard form layouts used by a major local firm.

  2. Cycle 2 (Weeks 4–6)

    Build: retrain on the firm's layout and add a confidence threshold so borderline cases go to a human rather than being auto-rejected. Test: open to all incoming applications for one district. Learn: reviewers trust the flags, but ask for a reason code so they can explain decisions to applicants.

  3. Cycle 3 (Weeks 7–8)

    Build: add plain-language reason codes to each flag and a feedback loop for reviewers to mark errors. Test: roll out across all districts. Learn: average time-to-first-review drops by 40%; reviewer feedback is already improving model accuracy week on week.

A waterfall version of the same project would have spent months defining requirements, and only discovered the layout problem and the need for reason codes after full deployment.
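The confidence threshold introduced in Cycle 2 could be sketched roughly as follows. This is an illustrative assumption about how such routing might work, not a description of any real system: the function name, the two cut-off values, and the outcome labels are all hypothetical.

```python
def route_application(incomplete_score: float,
                      flag_above: float = 0.9,
                      clear_below: float = 0.1) -> str:
    """Route an application based on the model's score that it is incomplete.

    incomplete_score: hypothetical model output between 0 and 1.
    flag_above / clear_below: illustrative cut-offs; real values would be
    tuned against reviewer feedback during the pilot.
    """
    if not 0.0 <= incomplete_score <= 1.0:
        raise ValueError("incomplete_score must be between 0 and 1")
    if incomplete_score >= flag_above:
        return "flag_as_incomplete"   # model is confident something is missing
    if incomplete_score <= clear_below:
        return "send_to_review"       # model is confident the application is complete
    return "human_check"              # borderline: a person decides, never auto-reject


print(route_application(0.97))  # flag_as_incomplete
print(route_application(0.50))  # human_check
print(route_application(0.03))  # send_to_review
```

Keeping the cut-offs as parameters means they can be adjusted between cycles without rebuilding the tool.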

4.5 Define your MVP

Your minimum viable product is the simplest version that delivers the core value. Start here, then add features based on what you learn.

The bicycle principle

A bicycle solves the mobility problem immediately, even if a car is the eventual goal. Your MVP should work and deliver value as the first deployed version, which can then be iterated on.

What to build

The smallest version that genuinely helps

Strip the scope until you can write it in one sentence, keeping the user at the centre.

Pilot scope

Which users, geography, or function

Limit the pilot to one team, one office, or one use case. Narrow scope gives you cleaner evidence and fewer variables to explain.

Time limit

Pilot end date and decision point

Set a fixed end date before you start to ensure accountability.

4.6 Budget and resources

Know the numbers before you start. This helps avoid surprises in the build phase, which can cause the pilot to run out of time or money before you have deployed anything or gathered any evidence.

01 Cost of building

Staff time plus any external costs, such as development, data preparation, and licences.

02 Cost of running for 6 months

This includes compute, hosting, support, and the people maintaining it. Ongoing costs are often underestimated. If the number is fuzzy, get a quote.

03 Cost of maintaining long-term

What it takes to keep the tool working beyond the pilot: updates, monitoring, retraining, and the staff responsible for it.

Identify the funding source. If it is donor funded, note any conditions attached. Funding uncertainty can derail a project before any technical issue arises.
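The arithmetic behind the pilot budget is simple, but writing it down makes gaps visible. The sketch below adds the build cost to six months of running costs and compares the total with committed funding. The function name and all figures are hypothetical placeholders, not estimates.

```python
def pilot_budget(build_cost: float,
                 monthly_running_cost: float,
                 committed_funding: float,
                 months: int = 6) -> dict:
    """Total pilot cost and any shortfall against committed funding."""
    running_cost = monthly_running_cost * months
    total = build_cost + running_cost
    return {
        "build": build_cost,
        "running": running_cost,
        "total": total,
        "shortfall": max(0.0, total - committed_funding),  # 0 if fully funded
    }


# Hypothetical figures in a single currency unit.
budget = pilot_budget(build_cost=20_000, monthly_running_cost=1_500,
                      committed_funding=25_000)
print(budget["total"])      # 29000
print(budget["shortfall"])  # 4000
```

Long-term maintenance is left out here on purpose: it falls outside the pilot window, and should be estimated separately before any decision to scale.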

4.7 Go / no-go criteria

Agree these criteria before the pilot starts so you know in advance what would lead you to scale, what would lead you to fix, and what would lead you to stop. An honest stop is valuable as it saves money and effort that would otherwise be spent on a project that is not going to work.

Go, scale up

You meet the success metric, staff want to keep using it, and costs are sustainable.

Action:
  • Plan rollout to a new office or use case.
  • Lock in operations and maintenance budget.
  • Document the decision rules and technical decisions.

Fix and extend the pilot

There are signs of potential, but the result is not yet good enough. You can identify what changes would close the gap.

Action:
  • Define one specific change to test.
  • Set a 4-week limit.
  • Re-run go / no-go after that.

Stop, and that is OK

The result does not justify continuing, staff resistance is too high, or data quality blocks progress.

Action:
  • Document what you learned.
  • Inform leadership honestly.
  • Apply lessons to the next pilot.
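The three outcomes above can be written down as an explicit rule before the pilot starts, so the decision is not renegotiated afterwards. The sketch below is one possible encoding; the function name and the four yes/no inputs are simplifying assumptions, and a real decision will involve judgement the code cannot capture.

```python
from enum import Enum


class Decision(Enum):
    GO = "Go, scale up"
    FIX = "Fix and extend the pilot"
    STOP = "Stop"


def go_no_go(metric_met: bool,
             staff_want_to_continue: bool,
             costs_sustainable: bool,
             gap_is_fixable: bool) -> Decision:
    """Apply the pre-agreed criteria in order: go, then fix, then stop."""
    if metric_met and staff_want_to_continue and costs_sustainable:
        return Decision.GO
    if gap_is_fixable:
        # There are signs of potential and a specific change to test.
        return Decision.FIX
    return Decision.STOP


print(go_no_go(True, True, True, gap_is_fixable=False).value)    # Go, scale up
print(go_no_go(False, True, True, gap_is_fixable=True).value)    # Fix and extend the pilot
print(go_no_go(False, False, True, gap_is_fixable=False).value)  # Stop
```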

4.8 Pilot checklist

Go through this checklist before your pilot starts. If any item is not checked, address it first.

Next: Step 5

Run your pilot and measure against your baseline.

When you have evidence, take it to Step 5.
