Research Methods in Impact Evaluation

Overview of the three main methodological traditions in impact evaluation (quantitative, qualitative, and mixed methods), explaining what each approach does, what questions it answers, and how they work together to generate rigorous and policy-relevant evidence.

An IPA survey in Burkina Faso in 2022 (© IPA)

Impact evaluation draws on three broad methodological traditions: quantitative, qualitative, and mixed methods research. This page explains what each approach does, what kinds of questions it answers, and how they work together to generate evidence that is both rigorous and useful for policy.

Tip: Key Takeaways
  • Quantitative methods measure how much a program changes an outcome; qualitative methods explain why and how; mixed methods combine both to produce more complete evidence.
  • No single method answers all research questions. Choosing the right approach depends on the question, the stage of the research cycle, and the resources available.
  • The strongest impact evaluations treat qualitative and quantitative work as equally integral components, planned together from the outset rather than added sequentially.

Why method choice matters

Every research question implies a method. When a team asks whether a conditional cash transfer (CCT) program increased school enrollment, the question calls for a quantitative design capable of estimating a causal effect. When a team asks why enrollment increased for girls but not for boys in the same program, the question calls for qualitative inquiry into the mechanisms behind that pattern. When a team wants both answers, it needs both methods, coordinated so that findings from each inform the other.

Choosing the wrong method for a question wastes resources and can produce misleading results. A survey with closed-ended response categories cannot reveal the locally meaningful reasons behind a behavioral pattern. A set of focus group discussions cannot establish whether a program caused an outcome. Understanding what each approach is designed to do, and what it cannot do, is a foundational competency for anyone working in research and evaluation.

Quantitative research

Quantitative research collects and analyzes numerical data to measure outcomes, estimate relationships, and test hypotheses. The defining feature of a rigorous quantitative impact evaluation is a credible counterfactual: an estimate of what would have happened to participants had the program not existed. Without a counterfactual, observed changes in outcomes cannot be attributed to the program rather than to other factors such as economic trends, seasonal variation, or the characteristics of the people who selected into the program.

A randomized controlled trial (RCT) randomly assigns units (individuals, households, schools, or communities) to a treatment group that receives the program or to a control group that does not. Random assignment ensures that the two groups are comparable on average, both on observable characteristics such as income and education and on unobservable characteristics such as motivation and social connections. The difference in average outcomes between the two groups at the end of the study is therefore an unbiased estimate of the program’s average causal effect.
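To make the role of random assignment concrete, here is a minimal Python simulation under invented assumptions: an unobserved trait (labeled `motivation`) raises the outcome, and a hypothetical program adds a fixed effect of 5 points. When more motivated people self-select into the program, a naive treated-versus-untreated comparison overstates the effect; when a coin flip decides who is treated, the difference in means lands close to the true effect. All names and numbers are illustrative and not drawn from any study.

```python
import random
import statistics

# Illustrative assumptions only; none of these numbers come from a real study.
TRUE_EFFECT = 5.0  # assumed effect of the hypothetical program on the outcome
N = 20_000         # number of simulated people


def simulate(assign, seed=1):
    """Return mean outcome of the treated minus mean outcome of the untreated.

    `assign(motivation, rng)` returns True if a person receives the program.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(N):
        motivation = rng.gauss(0, 1)  # unobserved by the evaluator
        # Outcome without the program depends on motivation plus noise.
        outcome_without = 50 + 10 * motivation + rng.gauss(0, 5)
        if assign(motivation, rng):
            treated.append(outcome_without + TRUE_EFFECT)
        else:
            untreated.append(outcome_without)
    return statistics.mean(treated) - statistics.mean(untreated)


def self_selected(motivation, rng):
    # More motivated people are far more likely to enroll.
    return rng.random() < (0.8 if motivation > 0 else 0.2)


def randomized(motivation, rng):
    # A fair coin flip, unrelated to motivation.
    return rng.random() < 0.5


naive = simulate(self_selected)
rct = simulate(randomized)
print(f"Naive comparison:        {naive:.1f}")  # far above 5 (selection bias)
print(f"RCT difference in means: {rct:.1f}")    # close to 5
```

Under these assumptions the self-selected comparison comes out at nearly three times the true effect, because it mixes the program's effect with the motivation gap between the groups; randomization removes that gap on average.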

RCTs are considered the strongest design for causal inference when properly implemented. They are most appropriate when a program is being evaluated for the first time, when there is genuine uncertainty about whether the program works, and when the scale of operations allows for randomization. IPA has conducted hundreds of randomized evaluations since its founding in 2002.

Best for: Estimating whether and how much a program caused an outcome
Typical outputs: Average treatment effect, cost-effectiveness estimates, subgroup effects
Limitations: Require a sufficient sample size; not always feasible at full scale or when randomization is ethically or politically constrained
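One way to see why sample size matters is the standard minimum detectable effect (MDE) calculation for a simple two-arm trial with a continuous outcome. The sketch below assumes a two-sided test with a share `share_treated` of units assigned to treatment and, for cluster-randomized designs, inflates the result by the design effect sqrt(1 + (m - 1)ρ), where m is cluster size and ρ is the intracluster correlation. The function name and the defaults (5% significance, 80% power) are conventional choices used for illustration, not IPA requirements.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n, sd, share_treated=0.5, alpha=0.05, power=0.8,
                              cluster_size=1, icc=0.0):
    """Smallest true effect a two-arm trial of n units can reliably detect.

    n: total sample size; sd: outcome standard deviation.
    cluster_size and icc matter only when whole clusters are randomized.
    """
    z = NormalDist().inv_cdf
    # Critical value for a two-sided test plus the quantile for the desired power.
    multiplier = z(1 - alpha / 2) + z(power)
    # Standard error of the difference in means under individual randomization.
    se = sqrt(sd ** 2 / (share_treated * (1 - share_treated) * n))
    # Clustering reduces the effective sample size; the design effect corrects for it.
    design_effect = sqrt(1 + (cluster_size - 1) * icc)
    return multiplier * se * design_effect


# 1,000 individuals, outcome measured in standard-deviation units.
print(round(minimum_detectable_effect(1000, sd=1.0), 2))  # 0.18
# Same sample randomized as 50 clusters of 20, with an assumed ICC of 0.05.
print(round(minimum_detectable_effect(1000, sd=1.0, cluster_size=20, icc=0.05), 2))  # 0.25
```

Because the standard error shrinks with the square root of n, halving the MDE requires roughly four times the sample.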
Note: Common criticisms of RCTs

RCTs face recurring critiques: that they are too expensive, that withholding treatment from a control group is unethical, and that findings from one context cannot travel to another. These concerns are worth understanding and, in many cases, answering. See the Addressing Common RCT Criticisms page for a fuller discussion.

When randomization is not feasible, quasi-experimental methods use statistical techniques to construct a credible comparison group from non-randomly assigned data. Common approaches include difference-in-differences, regression discontinuity, matching, and instrumental variables. Each relies on specific assumptions that must hold for the estimates to be valid.

  • Difference-in-differences compares changes over time between participants and nonparticipants, assuming the two groups would have followed parallel trends absent the program.
  • Regression discontinuity compares units just above and just below an eligibility cutoff, requiring that units cannot manipulate which side of the cutoff they fall on.
  • Matching pairs participants with similar nonparticipants, requiring that all relevant differences are captured in observable data.
  • Instrumental variables uses a variable that predicts program participation but affects outcomes only through that participation.
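Two of these estimators reduce to simple arithmetic in their most basic form. The Python sketch below computes a 2x2 difference-in-differences from group means and a Wald estimate for instrumental variables with a binary instrument (the instrument's effect on outcomes divided by its effect on participation). The function names and toy numbers are hypothetical. Real analyses would use regression with covariates and appropriate standard errors, and neither calculation can verify its own identifying assumption.

```python
from statistics import mean


def diff_in_diff(treat_before, treat_after, comp_before, comp_after):
    """2x2 difference-in-differences from raw outcome lists.

    Change among participants minus change among nonparticipants;
    credible only if both groups would have followed parallel trends.
    """
    treated_change = mean(treat_after) - mean(treat_before)
    comparison_change = mean(comp_after) - mean(comp_before)
    return treated_change - comparison_change


def wald_iv(outcome, participated, instrument):
    """Wald estimator for a binary (0/1) instrument.

    Assumes the instrument shifts participation (nonzero first stage)
    and affects outcomes only through participation.
    """
    def mean_where(values, z):
        return mean(v for v, zi in zip(values, instrument) if zi == z)

    reduced_form = mean_where(outcome, 1) - mean_where(outcome, 0)
    first_stage = mean_where(participated, 1) - mean_where(participated, 0)
    if first_stage == 0:
        raise ValueError("instrument does not predict participation")
    return reduced_form / first_stage


# Participants gain 8 points, nonparticipants gain 4: estimated effect is 4.
print(diff_in_diff([10, 12], [18, 20], [9, 11], [13, 15]))  # 4

# Instrument raises participation by 0.5 and outcomes by 2: estimate is 4.0.
outcome = [12, 11, 13, 8, 7, 9, 8, 12]
participated = [1, 1, 1, 0, 0, 0, 0, 1]
instrument = [1, 1, 1, 1, 0, 0, 0, 0]
print(wald_iv(outcome, participated, instrument))  # 4.0
```

Under an additional monotonicity assumption, the Wald ratio is usually interpreted as the average effect for compliers (the units whose participation the instrument actually changes) rather than for the whole population.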
Best for: Estimating causal effects when randomization is not feasible
Typical outputs: Causal effect estimates with explicit assumptions; sensitivity analyses
Limitations: Validity depends on assumptions that cannot always be fully verified
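Regression discontinuity can likewise be illustrated with a deliberately crude estimator: compare mean outcomes for units that fall just on either side of the eligibility cutoff, within a chosen bandwidth. The sketch assumes units at or above the cutoff are eligible; the function name, bandwidth, and data are hypothetical. Applied work typically fits local linear regressions on each side and chooses the bandwidth systematically, because a simple difference in means is biased when outcomes trend with the eligibility score.

```python
from statistics import mean


def rd_local_means(scores, outcomes, cutoff, bandwidth):
    """Mean outcome just at/above the cutoff minus mean outcome just below it.

    Assumes eligibility means score >= cutoff; only units within
    `bandwidth` of the cutoff enter the comparison.
    """
    above = [y for x, y in zip(scores, outcomes) if cutoff <= x < cutoff + bandwidth]
    below = [y for x, y in zip(scores, outcomes) if cutoff - bandwidth <= x < cutoff]
    return mean(above) - mean(below)


scores = [40, 45, 48, 49, 50, 51, 52, 55, 60]
outcomes = [10, 11, 12, 14, 17, 18, 19, 19, 20]
# Only scores 48-49 (below) and 50-52 (above) fall within 3 points of 50.
print(rd_local_means(scores, outcomes, cutoff=50, bandwidth=3))  # 5
```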

Qualitative research

Qualitative research is a set of flexible research techniques that seeks to understand and interpret social phenomena through experiences, meanings, and subjective perspectives. Rather than measuring how much something changed, qualitative research investigates how and why things happen the way they do. It produces knowledge from a relatively small number of individual or collective units of analysis and is not designed to produce statistically representative measurement.

A common misconception is that qualitative methods are simply interviews and focus groups. Qualitative research is better understood as an approach to acquiring knowledge that follows systematic, reflexive, and rigorous procedures, not a collection of anecdotes or a report annex.

The most common qualitative methods in IPA research are summarized below. As a general rule, focus group discussions (FGDs) are better suited to collective issues, while in-depth interviews (IDIs) are better suited to individual and sensitive topics.

Common qualitative methods and their primary uses
Method | Best for
Focus group discussions (FGDs) | Shared norms, community perceptions, collective decision-making
In-depth interviews (IDIs) | Individual experiences, sensitive topics, sustained probing
Observations | Behavioral outcomes, program implementation fidelity
Ethnographies | Deep contextual understanding over time

Mixed methods research

Mixed methods research combines quantitative and qualitative approaches within a single study or research program. The combination is not simply additive: each approach informs and strengthens the other in ways that produce evidence more complete than either method alone could generate.

Mixed methods are used in several ways: to accompany RCTs and understand the mechanisms driving change, to conduct process evaluations, to refine instruments, to better understand context, and to develop projects through needs identification.

Note: A common misconception

Qualitative and quantitative research are not opposites on a rigor hierarchy. Decision-makers respond to both rigorous numbers and the contextual richness of detailed descriptions. At the same time, qualitative findings cannot substitute for causal evidence when the question is whether a program produced a measurable effect.

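IPA primarily uses two integration strategies, sequential and simultaneous, distinguished by the chronology of data collection. Sequential designs take two forms, depending on whether qualitative or quantitative data come first.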

Sequential exploratory design: Qualitative data collection and analysis come first and lead into quantitative data collection and analysis. This design is used to inform variables or hypotheses, characterize the sample and the logistics for reaching it, and design or pilot quantitative instruments.

Example: Before designing a household survey on a nutrition program, a research team conducts focus group discussions to understand how community members define adequate feeding practices and what barriers households face. The qualitative findings shape the survey instrument, ensuring that quantitative measures reflect locally valid concepts.

Sequential explanatory design: Quantitative data collection and analysis come first and lead into qualitative data collection and analysis. This design is used to understand contextual factors that explain quantitative results, identify mechanisms producing change, validate hypotheses, and investigate implementation problems with treatments, trainings, or surveys.

Example: An RCT finds that a cash transfer program improved nutritional outcomes for children under two but not for older children. Follow-up in-depth interviews with caregivers explore how families allocated resources across children of different ages, helping explain the age-differentiated pattern in the quantitative results.
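The quantitative step that motivates this kind of follow-up is a subgroup comparison. As a minimal sketch, assuming a hypothetical list of records that each hold an age group, a 0/1 treatment indicator, and an outcome index, the difference between treatment and control means within each age group shows where an effect appears. The records and names below are invented for illustration.

```python
from statistics import mean

# Hypothetical records: (age_group, treated, outcome_index). Invented values.
records = [
    ("under_2", 1, 6), ("under_2", 1, 8), ("under_2", 0, 3), ("under_2", 0, 5),
    ("2_to_5", 1, 5), ("2_to_5", 1, 7), ("2_to_5", 0, 5), ("2_to_5", 0, 7),
]


def subgroup_effects(rows):
    """Treatment-minus-control difference in mean outcomes for each group."""
    effects = {}
    for group in sorted({g for g, _, _ in rows}):
        treated = [y for g, t, y in rows if g == group and t == 1]
        control = [y for g, t, y in rows if g == group and t == 0]
        effects[group] = mean(treated) - mean(control)
    return effects


print(subgroup_effects(records))  # {'2_to_5': 0, 'under_2': 3}
```

In a real analysis, the gap between subgroup estimates would typically be tested formally, for example with a treatment-by-subgroup interaction in a regression, since differences between small subgroups can arise by chance. The numbers reveal the pattern; the follow-up interviews help explain it.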

Simultaneous design: Qualitative and quantitative data collection happen in parallel, and the findings are compared during interpretation. This design is appropriate for complex topics, longitudinal research, or studies involving diverse populations. It is also used for triangulation and cross-validation of findings.

Example: A process evaluation of a teacher training program collects classroom observation data and student test scores during the same period. The observation data document whether teachers apply new pedagogical techniques, while the test scores measure learning outcomes. Analyzed together, the two data streams reveal both whether the program works and which implementation components drive results.
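At its simplest, analyzing the two streams together means linking them at a shared unit and comparing one against the other. The sketch below assumes hypothetical classroom IDs, a yes/no observation of whether the new technique was in use, and a test-score gain per classroom. Because teachers who adopt a technique may differ in other ways, a comparison like this is descriptive: it points to components worth investigating rather than establishing that they cause the gains.

```python
from statistics import mean

# Hypothetical data keyed by classroom ID (invented for illustration).
technique_observed = {"c1": True, "c2": True, "c3": False, "c4": False, "c5": True}
score_gain = {"c1": 12.0, "c2": 9.0, "c3": 4.0, "c4": 6.0}  # c5 has no test data


def gains_by_observed_practice(observed, gains):
    """Mean score gain where observers did and did not see the technique.

    Only classrooms present in both data streams are compared.
    """
    linked = [(observed[c], gains[c]) for c in observed.keys() & gains.keys()]
    with_technique = [g for seen, g in linked if seen]
    without_technique = [g for seen, g in linked if not seen]
    return mean(with_technique), mean(without_technique)


print(gains_by_observed_practice(technique_observed, score_gain))  # (10.5, 5.0)
```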

How qualitative methods contribute across the research cycle

The stage of the research cycle shapes which qualitative contribution is most valuable. The table below shows how qualitative methods contribute at each phase of a program’s scaling pathway.

Qualitative contributions across the program scaling pathway
Phase | Primary qualitative contribution
Ideate | Understanding population needs; designing context-grounded theories of change; sensitizing technical teams to program implications for daily life
Refine | Strengthening learning agendas; mapping implementation challenges; developing contextualized measurement systems
Test | Refining measurement instruments through cultural validation; identifying unobservable variables; understanding mechanisms of program effect
Adapt | Translating evidence from other settings to the local context
Scale | Understanding motivations and concerns of decision-makers regarding program scaling

Choosing the right approach

The table below matches common research questions to methodological approaches. In practice, most evaluations benefit from combining methods.

Matching research questions to methodological approaches
Research question | Suggested approach
Did this program cause a change in outcomes? | Quantitative (RCT or quasi-experimental)
How large was the effect, and for whom? | Quantitative
Why did the program produce this effect (or not)? | Qualitative or mixed (sequential explanatory)
How is the program being implemented in practice? | Qualitative (observation) or monitoring data
What outcomes matter most to participants? | Qualitative (exploratory)
How do participants understand and experience the program? | Qualitative (FGDs or IDIs)
Effect size and mechanisms together | Mixed methods
Instrument design before a survey | Mixed (sequential exploratory)
Validation across data sources | Mixed (simultaneous)

The Goldilocks principle: Right-fit evidence

The goal is not to maximize methodological rigor in the abstract, but to collect data that will actually inform decisions, at a cost proportionate to the value of the information (Gugerty and Karlan 2018). An RCT is not always the right tool, and qualitative research is not a fallback for when randomization is impossible. The strongest evidence systems combine approaches deliberately, with each method chosen because it fits its specific question.

For a fuller treatment of this principle and how it applies to monitoring and evaluation system design, see the Monitoring, Evaluation and Learning page.

References

Gugerty, Mary Kay, and Dean Karlan. 2018. The Goldilocks Challenge: Right-Fit Evidence for the Social Sector. New York: Oxford University Press.

Karlan, Dean, Robert Osei, Isaac Osei-Akoto, and Christopher Udry. 2014. “Agricultural Decisions after Relaxing Credit and Risk Constraints.” Quarterly Journal of Economics 129 (2): 597–652. https://doi.org/10.1093/qje/qju002.

Kitzinger, Jenny. 1995. “Qualitative Research: Introducing Focus Groups.” BMJ 311 (7000): 299–302. https://doi.org/10.1136/bmj.311.7000.299.

Additional Resources

Abdul Latif Jameel Poverty Action Lab (J-PAL). “Introduction to Randomized Evaluations.” J-PAL Research Resources. https://www.povertyactionlab.org/resource/introduction-randomized-evaluations.
