Measurement and Survey Design

This guide explores the fundamental principles of measurement and survey design in development research, explaining why accurate measurement matters and how different design choices affect data quality and research outcomes.

Tip: Key Takeaways
  • Measurement quality directly impacts the validity and reliability of research findings and policy recommendations.
  • Theory of Change serves as the foundational roadmap that guides measurement strategy and indicator selection.
  • Understanding measurement error and its sources helps researchers make informed design choices that improve data quality.

Why Measurement Matters

Measurement sits at the heart of empirical research. Every policy recommendation, every claim about program effectiveness, and every insight about poverty reduction depends fundamentally on how well we measure the phenomena we study. Yet as Howard Wainer noted, “Gathering data, like making love, is one of those activities that almost everyone thinks can be done without instructions. The results are usually disastrous.”

The Stakes of Quality Measurement

The stakes of measurement quality are particularly high in development research. When we measure school attendance, health outcomes, or economic well-being, we’re not just collecting data—we’re creating the evidence base that will inform decisions affecting millions of lives. A poorly designed survey question about household income could lead to misallocated resources. An invalid measure of learning outcomes could result in ineffective education policies.

Consider the seemingly simple question of measuring school attendance in Jensen’s (2010) study on returns to education. Should we rely on:

  • Enrollment records
  • Daily attendance sheets
  • Self-reported data

Each choice carries different implications for validity, cost, and feasibility. Administrative records might be more objective but could miss informal schooling. Self-reported data might capture the respondent’s perspective but could be subject to social desirability bias. These aren’t just technical trade-offs—they’re choices that shape what we can learn and how confidently we can make recommendations.

Why Do You Need Data?

Measurement serves several crucial functions in development research:

  • Getting to know your sample: Understanding population characteristics and testing whether randomization was successful
  • Reducing variance: Controlling for baseline variables to improve statistical precision (see the sketch after this list)
  • Testing alternative hypotheses: Enabling analysis of heterogeneous effects and spillovers
  • Solving the black box problem: Measuring at each step of your Theory of Change to understand mechanisms
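
The variance-reduction point is easy to see in code. Below is a minimal sketch on simulated data, with hypothetical variable names, assuming numpy and statsmodels are available:

```python
# A minimal sketch, on simulated data: controlling for a baseline covariate
# leaves the treatment-effect estimate unbiased but shrinks its standard error.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1_000
baseline = rng.normal(50, 10, n)              # e.g., a baseline test score
treat = rng.integers(0, 2, n).astype(float)   # random assignment (0/1)
endline = baseline + 2.0 * treat + rng.normal(0, 5, n)

# Without the baseline control: unbiased but noisy
naive = sm.OLS(endline, sm.add_constant(treat)).fit()

# With the baseline control: same estimand, tighter standard error
adjusted = sm.OLS(endline, sm.add_constant(np.column_stack([treat, baseline]))).fit()

print(f"naive SE:    {naive.bse[1]:.3f}")
print(f"adjusted SE: {adjusted.bse[1]:.3f}")
```

Both regressions estimate the same treatment effect, but the baseline-adjusted one typically reports a noticeably smaller standard error, which is the precision gain described above.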

Understanding Core Measurement Concepts

Before exploring design principles, let’s examine the fundamental building blocks of measurement using Jensen’s (2010) study on returns to education.1 These four key concepts form the foundation for all measurement decisions:

  • Construct: The abstract concept you want to measure (e.g., “school attendance”)
  • Measure: The specific way you operationalize the construct (e.g., “percentage of school days attended”)
  • Instrument: The data collection tool (e.g., teacher attendance sheets)
  • Data: The actual values you collect (e.g., “85% attendance rate”)

Theory of Change as Your Foundation

Before you measure anything, you should clearly define each concept that is part of your Theory of Change and specify where it fits in the causal chain. This conceptualization process involves:

  • Define boundaries: Clearly state which dimensions are included in each concept and which are not. For example, if measuring “women’s empowerment,” specify whether you include decision-making power, freedom of movement, control over resources, or all three.
  • Avoid functional definitions: Concepts should not be defined by their relationships to other concepts; those relationships should be testable. Don’t define empowerment as “something that leads to better health outcomes.”
  • Situate concepts in the literature: Know how your understanding of each concept fits into ongoing discussions and current literature. Build on established theoretical frameworks rather than inventing from scratch.
  • Separate the concept from its measure: Your intelligence is not your IQ score. The concept exists independently of how you measure it.

A Theory of Change (ToC) guides measurement decisions by mapping the causal pathway from intervention to outcomes. It helps researchers:

  1. Identify key measurement points
  2. Understand causal mechanisms
  3. Surface critical assumptions
  4. Prioritize indicators

Consider a conditional cash transfer program:

  • Immediate outcomes: School enrollment
  • Mechanisms: Household decisions
  • Context: School quality
  • Ultimate impact: Learning outcomes

For more details, see the Theory of Change page.

Understanding Measurement Quality

The quality of any measurement system depends on two fundamental properties that work together but represent distinct concepts: validity and reliability. Understanding the relationship between these concepts is crucial for designing effective measurement strategies.2

Quality measurement requires attention to both accuracy and precision:

Threats to Accuracy

  • Construct validity: How well does your indicator map to the actual concept? (e.g., Does an IQ test truly measure intelligence?)
  • Measurement-concept alignment: Are you actually measuring what you think you’re measuring?
  • Recall bias: Respondents’ difficulty remembering past events accurately
  • Social desirability bias: Tendency to give socially acceptable answers
  • Anchoring bias: Being influenced by previously presented information
  • Acquiescence bias: Tendency to agree with statements regardless of content
  • Framing effects: How question wording influences responses

Threats to Precision

  • Length and fatigue: Long surveys reduce response quality
  • Ambiguous wording: Unclear definitions of terms like “household” or “income”
  • Inappropriate recall periods: Asking about annual spending based on yesterday’s purchases
  • Response format problems: Poorly designed answer options
  • Question wording: Inconsistent or confusing language
  • Surveyor quality: Inadequate training or inconsistent administration
  • Data entry mistakes: Errors in transferring responses to datasets
  • Survey fatigue: Reduced attention as surveys progress

The Recall Period Trade-off

One critical precision decision involves choosing appropriate time frames for questions. Researchers must balance recall accuracy against response variance:

Longer periods (e.g., 12 months):

  • Pros: Capture seasonal variation, reduce impact of atypical days
  • Cons: Greater recall bias, telescoping effects, estimation challenges

Shorter periods (e.g., 7 days):

  • Pros: Better recall accuracy, more precise estimates
  • Cons: May not represent typical behavior, higher variance across respondents

Example: Measuring remittance sending behavior:

  • “How much money have you sent in remittances in the past 12 months?” - captures annual patterns but suffers from recall bias
  • “How much money have you sent in remittances in the past 3 months?” - balances recall accuracy with pattern capture
  • “How much money have you sent in remittances in the past 7 days?” - most accurate recall but may miss typical behavior

Best practice: Choose recall periods based on the frequency of the behavior and the importance of accuracy versus representativeness for your research question.
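
To make the trade-off concrete, here is a stylized simulation (every parameter is an assumption chosen for illustration, not an estimate from any study) comparing a scaled-up 7-day recall with a 12-month recall subject to forgetting:

```python
# A stylized simulation of the recall-period trade-off.
import numpy as np

rng = np.random.default_rng(1)
n_resp, weeks = 500, 52

# True weekly remittances: occasional transfers with mild seasonality
p_send = 0.10 + 0.05 * np.sin(np.linspace(0, 2 * np.pi, weeks))
amounts = rng.gamma(2.0, 50.0, (n_resp, weeks)) * (rng.random((n_resp, weeks)) < p_send)
true_annual = amounts.sum(axis=1)

# 7-day recall scaled to a year: well remembered, but a noisy annual estimate
est_short = amounts[:, -1] * 52

# 12-month recall: assume respondents forget ~30% of transfers
recalled = amounts * (rng.random((n_resp, weeks)) > 0.30)
est_long = recalled.sum(axis=1)

for name, est in [("7-day x 52", est_short), ("12-month", est_long)]:
    print(f"{name:>10}: bias {est.mean() - true_annual.mean():7.1f}, sd {est.std():7.1f}")
```

In runs like this, the short window is roughly unbiased but far noisier across respondents, while the long window is steadier but biased downward by forgetting, mirroring the pros and cons listed above.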

Validity: Measuring What We Intend to Measure

Validity refers to how accurately a measurement captures the concept we want to study. A valid measure truly represents the construct or phenomenon of interest, rather than something else.

Note: Key Types of Validity
  1. Construct Validity: Does the measure reflect the theoretical concept?
    • Example: Using test scores to measure learning
  2. Content Validity: Does it cover all relevant aspects?
    • Example: A math test covering all required topics
  3. Criterion Validity: Does it correlate with established measures?
    • Example: New poverty measure matching World Bank standards

Let’s examine this through a practical example:

Measuring Financial Well-being:

  • Poor Validity: Using only monthly income
  • Better Validity: Combining income, savings, and debt measures

Tip: Common Validity Threats
  • Using proxy measures that don’t fully capture the concept
  • Cultural differences in how concepts are understood
  • Missing important dimensions of complex constructs
  • Response bias from sensitive questions

The key is ensuring measurements accurately reflect what researchers intend to study, not just what’s easy to measure.
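
One way to move from “combine income, savings, and debt” to an actual number is an equal-weight z-score index. A minimal sketch with made-up values and hypothetical column names (pandas assumed):

```python
# A minimal sketch of combining several indicators into one standardized
# index of "financial well-being" (hypothetical data and column names).
import pandas as pd

df = pd.DataFrame({
    "monthly_income": [120, 300, 80, 450],
    "total_savings":  [10, 200, 0, 500],
    "total_debt":     [50, 20, 150, 0],
})

# Flip the sign on debt so that higher always means better off
df["neg_debt"] = -df["total_debt"]

components = ["monthly_income", "total_savings", "neg_debt"]
z = (df[components] - df[components].mean()) / df[components].std()
df["fin_wellbeing_index"] = z.mean(axis=1)  # equal-weight z-score index
print(df[["fin_wellbeing_index"]])
```

Note the sign flip on debt: every component must point in the same direction before averaging, or the index partially cancels itself out.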

Reliability: Consistency in Measurement

Reliability refers to how consistently a measure produces similar results under unchanged conditions. While validity ensures we measure the right thing, reliability ensures we measure it consistently.

Note: Key Types of Reliability
  1. Test-Retest Reliability: Same measure, different times
    • Example: Asking about income in successive weeks
  2. Inter-rater Reliability: Different enumerators, same subject
    • Example: Multiple teachers grading the same test
  3. Internal Consistency: Different items measuring same construct
    • Example: Multiple questions about food security

Let’s examine this through practical examples:

Measuring Household Income:

  • Poor Reliability: “What’s your total income?”
  • Better Reliability: Breaking down income sources by category and timeframe

Tip: Common Reliability Threats
  • Recall error from long reference periods
  • Inconsistent question interpretation
  • Environmental factors affecting responses
  • Enumerator differences in question delivery

Remember: A measure can be reliable without being valid (consistently wrong), but cannot be valid without being reliable.
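
Internal consistency is commonly summarized with Cronbach’s alpha. A minimal implementation, applied here to simulated responses on four hypothetical food-security items:

```python
# A minimal sketch of checking internal consistency (Cronbach's alpha)
# across several items meant to measure the same construct.
import numpy as np

def cronbach_alpha(items):
    """items: respondents x items matrix of numeric responses."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Simulated responses to four food-security items on a 0-3 scale,
# driven by one latent trait so the items should move together
rng = np.random.default_rng(2)
latent = rng.normal(size=200)
items = np.clip(np.round(1.5 + latent[:, None] + rng.normal(0, 0.7, (200, 4))), 0, 3)
print(f"alpha = {cronbach_alpha(items):.2f}")  # high alpha: items move together
```

Alpha around 0.7 or above is conventionally read as acceptable consistency, though that threshold is a rule of thumb, not a test.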

Types of Measurement

Development researchers can choose from various measurement approaches:

Note: Measurement Options
  • Surveys: Household/individual questionnaires, including anthropometric measures
  • Administrative data: Government records, school enrollment, health facility data
  • Logs and diaries: Self-reported tracking over time
  • Qualitative methods: Focus groups, Rapid/Participatory Rural Appraisal (R/PRA)
  • Behavioral measures: Games and choice problems
  • Observation: Classroom snapshot surveys, direct observation protocols
  • Objective assessments: Health and education tests
  • Digital data: Cell phone applications and social network data

The Psychology Behind Responses

Understanding how respondents process survey questions is crucial for designing effective instruments. Rather than simply retrieving pre-formed answers, respondents engage in a complex cognitive process that can introduce measurement error.3

The Four Stages of Response

Each survey response involves a complex cognitive sequence:

  1. Comprehension
    • Understanding the question’s intent
    • Interpreting key terms and concepts
    • Determining what information is being requested
  2. Retrieval
    • Accessing relevant memories
    • Searching for specific information or experiences
    • Recalling details from the defined time period
  3. Judgment and Estimation
    • Evaluating the completeness and accuracy of retrieved information
    • Making estimations when exact recall is impossible
    • Organizing thoughts into a coherent answer
  4. Response
    • Selecting an appropriate answer from available options
    • Formatting the response according to question structure
    • Potentially editing answers based on social considerations

Consider the complexity behind this apparently simple question: “How many times did you consume rice this month?” The respondent must:

  • Comprehension: Define what counts as “consuming rice” (meals vs snacks vs ingredients?)
  • Retrieval: Recall numerous eating occasions over 30 days
  • Judgment: Estimate frequency when exact counting is impossible
  • Response: Select the most appropriate category from provided options

Each stage presents distinct opportunities for measurement error:

  • Comprehension errors: Ambiguous wording leads to different interpretations across respondents
  • Retrieval limitations: Respondents may:
    • Struggle to recall specific information over long periods
    • Retrieve information selectively (more memorable events)
    • Experience systematic recall patterns (recency effects)
  • Judgment biases: Respondents may apply different estimation strategies or criteria
  • Response editing: Respondents may modify answers based on:
    • Social desirability concerns
    • Perceived consequences of their responses
    • Available response options (anchoring effects)

Understanding this cognitive process helps explain why seemingly straightforward questions can produce unreliable data. For example, when we ask “How much did your household spend on food last month,” we’re asking respondents to:

  1. Define what counts as “household”
  2. Recall numerous transactions
  3. Categorize various expenses
  4. Aggregate multiple amounts

The quality of response depends on how successfully respondents navigate each stage of this complex cognitive process.

Measurement Error

Measurement error—the difference between a respondent’s answer and the true value—can arise from multiple sources. Rather than viewing these as isolated problems, it’s helpful to understand them as systematic patterns that can be anticipated and addressed through careful design.

Ambiguous wording

Problematic: “Do you exercise regularly?” What counts as “exercise”? What does “regularly” mean?

Improved: “During the past week, how many days did you do at least 30 minutes of physical activity?”

Practice: Rewrite this ambiguous question: “How often do you eat healthy food?”

Double negatives

Problematic: “Do you disagree that the program was unhelpful?” requires respondents to work through multiple negations. Ask directly instead: “Was the program helpful?”

Double-barreled questions

Problematic: “How satisfied are you with the quality and price of education?” conflates two potentially different judgments.

Improved: Ask separately:

  • “How satisfied are you with the quality of education?”
  • “How satisfied are you with the price of education?”

Unwarranted assumptions

Problematic: “How much have you saved for your children’s education?” assumes the respondent has children, plans to have children, and believes in saving for education.

Improved: Ask in sequence:

  • “Do you have children or plan to have children?”
  • If yes: “Do you save money for education expenses?”
  • If yes: “Approximately how much have you saved?”

Jargon

Problematic: Specialized language can confuse respondents who aren’t familiar with it, leading to misinterpretation or non-response.

Response Option Design

The way answer choices are structured can systematically bias responses. Poor design of response options can lead to measurement error through issues like missing categories, overlapping ranges, or unbalanced scales. Well-designed response options should be mutually exclusive, collectively exhaustive, and presented in a clear, unbiased format that allows respondents to accurately report their true answers.

Core Principles for Response Options:

  1. Mutually exclusive: No overlap between categories
  2. Collectively exhaustive: Cover all possible answers (both properties are checked mechanically in the sketch after this list)
  3. Balanced: Equal number of positive and negative options for scales
  4. Appropriate precision: Match the level of detail respondents can reasonably provide
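
The first two principles can be verified before fielding. A minimal sketch using hypothetical integer age bins, expressed in months:

```python
# A minimal check that integer-valued response categories are mutually
# exclusive and collectively exhaustive (hypothetical age bins, in months).
def check_categories(bins):
    bins = sorted(bins)
    for (lo1, hi1), (lo2, hi2) in zip(bins, bins[1:]):
        if lo2 <= hi1:
            print(f"Overlap: ({lo1}-{hi1}) and ({lo2}-{hi2})")
        elif lo2 > hi1 + 1:
            print(f"Gap: nothing covers {hi1 + 1}-{lo2 - 1}")

check_categories([(0, 12), (12, 24), (24, 120)])  # problematic: flags overlaps at 12 and 24
check_categories([(0, 11), (12, 23), (24, 120)])  # fixed design: prints nothing
```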

Response options must cover all possible answers. Missing categories force respondents to choose inappropriate options.

Example Problem: “How many years have you worked full-time at IPA?” This excludes part-time workers.

Better Design:

  • Include “part-time” option
  • Add “Not applicable” for contractors
  • Include “Don’t know” option

Response options should be mutually exclusive with no overlap between categories.

Example Problem: Age categories:

  • 0-1 years
  • 1-2 years
  • 2+ years

Where does someone exactly 1-year-old fit?

Better Design:

  • 0 years
  • 1 year
  • 2+ years

Or:

  • 0-11 months
  • 12-23 months
  • 24+ months

The structure and range of response options can suggest what’s “normal” and bias answers.

Example Problem: Income categories:

  • $50,000-$75,000
  • $75,000-$100,000
  • $100,000-$200,000

This suggests these are typical income ranges and may bias responses.

Better Design:

  • Use local income distribution data to set appropriate ranges
  • Consider open-ended responses for continuous variables

The sequence of questions and overall survey context can influence how people respond.

Example Problem: Asking about satisfaction with specific services before overall satisfaction biases the overall rating.

Better Design:

  • Ask general questions before specific ones
  • Group related questions together
  • Consider randomizing question order when appropriate
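
When randomization is appropriate, seeding the shuffle with a respondent identifier varies item order across the sample while keeping each interview reproducible. A minimal sketch with hypothetical item names:

```python
# A minimal sketch: shuffle a battery of items per respondent, seeded by
# respondent ID so each interview's order can be reconstructed later.
import random

ITEMS = ["rate_clinic", "rate_school", "rate_roads"]

def ordered_items(respondent_id):
    order = ITEMS.copy()
    random.Random(respondent_id).shuffle(order)
    return order

print(ordered_items(101))  # same ID, same order, every time
print(ordered_items(102))  # different ID, (usually) different order
```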

Designing Quality Instruments

The SMURF Criteria

Every survey question should meet these essential standards:

  • Specific: Asks one question at a time, avoiding double-barreled questions
  • Measurable: Quantifies accurate and unbiased information with appropriate precision
  • Understandable: Easy to comprehend with clearly defined terms and concepts
  • Relevant: Measures a key or intermediate outcome connected to your Theory of Change
  • Framed: Has clear boundaries including time frame, context, and scope

Example of SMURF in practice:

Poor: “How satisfied are you with the quality and price of education?”

  • Violates Specific (double-barreled: quality AND price)
  • Not Framed (which education? what time period?)

Better: “How satisfied are you with the quality of education your child received this school year?”

  • Specific: focuses only on quality
  • Framed: specifies child’s education and time period

Question Quality Checklist

Before finalizing any question, go through this checklist to identify potential issues:

  • Does the question have an explicit rationale and a plan for how responses will be used?
  • Does it pass tests of validity, reliability, and responsiveness?
  • Can you connect this question to a specific analysis or research hypothesis?
  • Can respondents easily answer from memory?
  • Is the question simple, specific, and well-defined enough that all respondents will interpret it similarly?
  • Are you asking about something respondents actually know or care about?
  • Does the question contain words or phrases that could bias responses?
  • Does it appear to “give away” what you’d like the response to be?
  • Are you leading respondents toward particular answers?
  • Does the question focus on a single topic, or should it be broken into multiple questions?
  • Are all response options mutually exclusive?
  • Should you allow multiple answers or include an “other” option?
  • Are assumptions implied by the question actually warranted?

Survey Implementation Guidelines

Technical Considerations

  • Skip logic: Double-check all conditional questions and use consistent visual cues
  • Hard/soft checks: Limit validation checks primarily to computer-assisted interviews (see the sketch after this list)
  • Visual design: Use consistent formatting and clear instructions for surveyors
  • Supporting materials: Develop answer cards and visual aids as needed
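
The distinction between hard and soft checks is simple to express in code: a hard check rejects an impossible value outright, while a soft check only prompts for confirmation. A minimal sketch with a hypothetical field and thresholds:

```python
# A minimal sketch of hard vs. soft validation checks as a computer-assisted
# interview might apply them (hypothetical field name and cutoffs).
from typing import Optional

def validate_age_years(value: float) -> Optional[str]:
    if not 0 <= value <= 120:
        return "HARD: age must be between 0 and 120; answer rejected"
    if value > 95:
        return "SOFT: age above 95 is unusual; ask the enumerator to confirm"
    return None  # passes both checks

for answer in (34, 99, 150):
    print(answer, "->", validate_age_years(answer))
```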

Human Elements

  • Rapport building: Standardize the first minute of every interview to establish trust
  • Documentation: Leave space for comments and field notes
  • Closure: Always thank respondents and provide contact information
  • Flexibility: Allow time for respondents’ questions and concerns

From Theory to Practice: Design Exercise

Let’s apply measurement principles to a real scenario demonstrating how theory translates into practice.

Scenario: Evaluating a Microfinance Program

You’re evaluating a microfinance program that provides small loans to women entrepreneurs.

Note: Theory of Change

Microfinance → Business investment → Increased income → Women’s empowerment

Design Challenge

Create survey questions to measure:

  1. Business investment (intermediate outcome)
  2. Women’s empowerment (final outcome)

For each question, consider:

  • Is it valid for your construct?
  • Is it reliable?
  • What measurement errors might occur?
  • How can you minimize these errors?

Sample Solution

Tip: Business Investment Measurement

Poor Design: “Did the loan help your business?”

Better Design: “In the past 3 months, how much money did you invest in your business from the following sources:

  • Personal savings: $____
  • Microfinance loan: $____
  • Other loans: $____
  • Family/friends: $____”

Tip: Women’s Empowerment Measurement

Poor Design: “Do you feel empowered?”

Better Design: “Who in your household makes decisions about:

  • Daily food purchases: [Me alone/Spouse alone/Together/Other]
  • Children’s education: [Me alone/Spouse alone/Together/Other]
  • Large purchases: [Me alone/Spouse alone/Together/Other]”

This approach demonstrates how good measurement design moves from abstract constructs to concrete, measurable indicators while minimizing common sources of error.
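
To close the loop from module to indicator, responses like these are often collapsed into a count of domains in which the respondent has a say. A minimal sketch with hypothetical coding choices:

```python
# A minimal sketch of scoring the decision-making module above
# (hypothetical coding: sole or joint say counts as having input).
responses = {
    "daily_food":      "Together",
    "education":       "Me alone",
    "large_purchases": "Spouse alone",
}

HAS_SAY = {"Me alone", "Together"}
score = sum(answer in HAS_SAY for answer in responses.values())
print(f"Decision domains with respondent input: {score} of {len(responses)}")
```

Whether “Together” counts as having a say is itself a measurement decision worth pre-specifying.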

IPA’s Research Methods Initiative has been at the forefront of developing innovative measurement approaches.4 The Global Poverty Research Lab at Northwestern University, in partnership with IPA, has systematically studied how to improve measurement methods across multiple domains.5

Note: IPA’s Multi-Dimensional Measurement Approach
  1. Decision-making power
    • Household purchases
    • Children’s education
    • Healthcare decisions
  2. Freedom of movement
    • Market access
    • Health facility access
    • Family visits
  3. Economic resources
    • Earnings control
    • Credit access
    • Asset ownership

This comprehensive approach provides more valid measurement than single-item indicators and has influenced how organizations measure women’s empowerment globally.

Innovation and Context in Modern Measurement

Different research contexts and evolving technologies have transformed how we approach measurement in development research. Understanding this evolution helps researchers make informed choices about measurement strategies.

The Technological Revolution

Recent innovations have fundamentally enhanced our measurement capabilities:

  • Digital Data Collection: Mobile surveys, satellite data, and remote sensing provide new measurement opportunities
  • Standardized Modules: Validated question banks and measurement toolkits improve comparability across studies
  • Administrative Integration: Combining survey data with existing records reduces respondent burden while improving accuracy

For example, Ambler et al.’s (2021) study on remittances and education demonstrates how modern measurement approaches can improve data quality. Their work revealed that combining administrative records with carefully framed survey questions provided more accurate insights into how financial transfers affect educational outcomes.6

Balancing Multiple Stakeholder Needs

Successful measurement strategies must balance competing demands:

  • Researchers need reliable, valid data that supports rigorous analysis
  • Policymakers require clear, actionable insights that inform decisions
  • Administrative systems must maintain practical compatibility and feasibility
  • Study participants deserve respect for their time and privacy

This evolution reflects a broader understanding that measurement is not just a technical challenge but a social and political one that requires careful consideration of context, culture, and consequences.


Understanding Measurement as Social Practice

Measurement in development research extends beyond technical considerations to encompass social, cultural, and ethical dimensions. Effective measurement recognizes that:

  • Cultural context shapes interpretation: What constitutes “household income” or “food security” varies across cultures
  • Power dynamics influence responses: Respondents may alter answers based on perceived consequences
  • Measurement itself can be an intervention: The act of asking questions can change behavior or awareness

Piloting and Testing

Important: Essential Testing Steps
  1. Review before piloting: Check instruments without collecting data
  2. Mandatory piloting: No instrument should be used without testing
  3. Iterative process: Pilot, revise, pilot again
  4. External testing: Pilot outside your target population first
  5. Staff testing: Ensure your team can implement the instrument effectively

Common Mistakes to Avoid

Based on extensive research experience, researchers frequently make these errors:

  • Including unnecessary concepts: Only measure what directly relates to your Theory of Change and analysis plan
  • Overly long surveys: Most surveys are simply too long - ruthlessly prioritize
  • Functional concept definitions: Defining concepts by their expected relationships rather than their intrinsic properties
  • Ignoring cultural context: Failing to adapt questions to local understanding and norms
  • Skipping pilot testing: Moving directly to full implementation without adequate testing


Footnotes

  1. Jensen, R. (2010). The (perceived) returns to education and the demand for schooling. The Quarterly Journal of Economics, 125(2), 515-548.

  2. Glennerster, R., & Takavarasha, K. (2013). Running randomized evaluations: A practical guide. Princeton University Press.

  3. Tourangeau, R. (1984). Cognitive sciences and survey methods. In T. Jabine, M. Straf, J. Tanur, & R. Tourangeau (Eds.), Cognitive aspects of survey methodology: Building a bridge between disciplines (pp. 73-100). The National Academies Press.

  4. IPA Research Methods Initiative. (2023). Research Methods Initiative Overview. https://www.poverty-action.org/researchers/working-with-ipa/research-methods-initiative

  5. Global Poverty Research Lab at Northwestern University. (n.d.). Methodological Studies in Development Research. https://www.poverty-action.org/researchers/working-with-ipa/research-methods-initiative

  6. Ambler, K., Aycinena, D., & Yang, D. (2021). Channeling remittances to education: A field experiment among migrants from El Salvador. American Economic Journal: Applied Economics, 13(2), 207-235.
