Measurement and Survey Design

This guide explores the fundamental principles of measurement and survey design in development research, explaining why accurate measurement matters and how different design choices affect data quality and research outcomes.

Tip: Key Takeaways
  • Measurement quality directly impacts the validity and reliability of research findings and policy recommendations.
  • Theory of Change serves as the foundational roadmap that guides measurement strategy and indicator selection.
  • Understanding measurement error and its sources helps researchers make informed design choices that improve data quality.

Why Measurement Matters

Measurement sits at the heart of empirical research. Every policy recommendation, every claim about program effectiveness, and every insight about poverty reduction depends fundamentally on how well we measure the phenomena we study. Yet as Howard Wainer noted, “Gathering data, like making love, is one of those activities that almost everyone thinks can be done without instructions. The results are usually disastrous.”

The Stakes of Quality Measurement

The stakes of measurement quality are particularly high in development research. When we measure school attendance, health outcomes, or economic well-being, we’re not just collecting data—we’re creating the evidence base that will inform decisions affecting millions of lives. A poorly designed survey question about household income could lead to misallocated resources. An invalid measure of learning outcomes could result in ineffective education policies.

Consider the seemingly simple question of measuring school attendance in Jensen’s (2010) study on returns to education. Should we rely on:

  • Enrollment records
  • Daily attendance sheets
  • Self-reported data

Each choice carries different implications for validity, cost, and feasibility. Administrative records might be more objective but could miss informal schooling. Self-reported data might capture the respondent’s perspective but could be subject to social desirability bias. These aren’t just technical trade-offs—they’re choices that shape what we can learn and how confidently we can make recommendations.

Why Do You Need Data?

Measurement serves several crucial functions in development research:

  • Getting to know your sample: Understanding population characteristics and testing whether randomization was successful
  • Reducing variance: Controlling for baseline variables to improve statistical precision (see the sketch after this list)
  • Testing alternative hypotheses: Enabling analysis of heterogeneous effects and spillovers
  • Solving the black box problem: Measuring at each step of your Theory of Change to understand mechanisms
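
The variance-reduction point is easy to see in code. Below is a minimal sketch on simulated data, with hypothetical variable names, assuming numpy and statsmodels are available:

```python
# A minimal sketch, on simulated data: controlling for a baseline covariate
# leaves the treatment-effect estimate unbiased but shrinks its standard error.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1_000
baseline = rng.normal(50, 10, n)              # e.g., a baseline test score
treat = rng.integers(0, 2, n).astype(float)   # random assignment (0/1)
endline = baseline + 2.0 * treat + rng.normal(0, 5, n)

# Without the baseline control: unbiased but noisy
naive = sm.OLS(endline, sm.add_constant(treat)).fit()

# With the baseline control: same estimand, tighter standard error
adjusted = sm.OLS(endline, sm.add_constant(np.column_stack([treat, baseline]))).fit()

print(f"naive SE:    {naive.bse[1]:.3f}")
print(f"adjusted SE: {adjusted.bse[1]:.3f}")
```

Both regressions estimate the same treatment effect, but the baseline-adjusted one typically reports a noticeably smaller standard error, which is the precision gain described above.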

Understanding Core Measurement Concepts

Before exploring design principles, let’s examine the fundamental building blocks of measurement using Jensen’s (2010) study on returns to education.1 These four key concepts form the foundation for all measurement decisions:

  • Construct: The abstract concept you want to measure (e.g., “school attendance”)
  • Measure: The specific way you operationalize the construct (e.g., “percentage of school days attended”)
  • Instrument: The data collection tool (e.g., teacher attendance sheets)
  • Data: The actual values you collect (e.g., “85% attendance rate”)

Theory of Change as Your Foundation

Before you measure anything, you should clearly define each concept that is part of your Theory of Change and specify where it fits in the causal chain. This conceptualization process involves:

  • Define boundaries: Clearly state which dimensions are included in each concept and which are not. For example, if measuring “women’s empowerment,” specify whether you include decision-making power, freedom of movement, control over resources, or all three.
  • Avoid functional definitions: Concepts should not be defined by their relationships to other concepts; those relationships should be testable. Don’t define empowerment as “something that leads to better health outcomes.”
  • Situate concepts in the literature: Know how your understanding of each concept fits into ongoing discussions and current literature. Build on established theoretical frameworks rather than inventing from scratch.
  • Separate the concept from its measure: Your intelligence is not your IQ score. The concept exists independently of how you measure it.

A Theory of Change (ToC) guides measurement decisions by mapping the causal pathway from intervention to outcomes. It helps researchers:

  1. Identify key measurement points
  2. Understand causal mechanisms
  3. Surface critical assumptions
  4. Prioritize indicators

Consider a conditional cash transfer program:

  • Immediate outcomes: School enrollment
  • Mechanisms: Household decisions
  • Context: School quality
  • Ultimate impact: Learning outcomes

For more details, see the Theory of Change page.

Understanding Measurement Quality

The quality of any measurement system depends on two fundamental properties that work together but represent distinct concepts: validity and reliability. Understanding the relationship between these concepts is crucial for designing effective measurement strategies.2

Quality measurement requires attention to both accuracy and precision:

Threats to Accuracy

  • Construct validity: How well does your indicator map to the actual concept? (e.g., Does an IQ test truly measure intelligence?)
  • Measurement-concept alignment: Are you actually measuring what you think you’re measuring?
  • Recall bias: Respondents’ difficulty remembering past events accurately
  • Social desirability bias: Tendency to give socially acceptable answers
  • Anchoring bias: Being influenced by previously presented information
  • Acquiescence bias: Tendency to agree with statements regardless of content
  • Framing effects: How question wording influences responses

Threats to Precision

  • Length and fatigue: Long surveys reduce response quality
  • Ambiguous wording: Unclear definitions of terms like “household” or “income”
  • Inappropriate recall periods: Asking about annual spending based on yesterday’s purchases
  • Response format problems: Poorly designed answer options
  • Question wording: Inconsistent or confusing language
  • Surveyor quality: Inadequate training or inconsistent administration
  • Data entry mistakes: Errors in transferring responses to datasets
  • Survey fatigue: Reduced attention as surveys progress

The Recall Period Trade-off

One critical precision decision involves choosing appropriate time frames for questions. Researchers must balance recall accuracy against response variance:

Longer periods (e.g., 12 months):

  • Pros: Capture seasonal variation, reduce impact of atypical days
  • Cons: Greater recall bias, telescoping effects, estimation challenges

Shorter periods (e.g., 7 days):

  • Pros: Better recall accuracy, more precise estimates
  • Cons: May not represent typical behavior, higher variance across respondents

Example: Measuring remittance sending behavior:

  • “How much money have you sent in remittances in the past 12 months?” - captures annual patterns but suffers from recall bias
  • “How much money have you sent in remittances in the past 3 months?” - balances recall accuracy with pattern capture
  • “How much money have you sent in remittances in the past 7 days?” - most accurate recall but may miss typical behavior

Best practice: Choose recall periods based on the frequency of the behavior and the importance of accuracy versus representativeness for your research question.
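
To make the trade-off concrete, here is a stylized simulation (every parameter is an assumption chosen for illustration, not an estimate from any study) comparing a scaled-up 7-day recall with a 12-month recall subject to forgetting:

```python
# A stylized simulation of the recall-period trade-off.
import numpy as np

rng = np.random.default_rng(1)
n_resp, weeks = 500, 52

# True weekly remittances: occasional transfers with mild seasonality
p_send = 0.10 + 0.05 * np.sin(np.linspace(0, 2 * np.pi, weeks))
amounts = rng.gamma(2.0, 50.0, (n_resp, weeks)) * (rng.random((n_resp, weeks)) < p_send)
true_annual = amounts.sum(axis=1)

# 7-day recall scaled to a year: well remembered, but a noisy annual estimate
est_short = amounts[:, -1] * 52

# 12-month recall: assume respondents forget ~30% of transfers
recalled = amounts * (rng.random((n_resp, weeks)) > 0.30)
est_long = recalled.sum(axis=1)

for name, est in [("7-day x 52", est_short), ("12-month", est_long)]:
    print(f"{name:>10}: bias {est.mean() - true_annual.mean():7.1f}, sd {est.std():7.1f}")
```

In runs like this, the short window is roughly unbiased but far noisier across respondents, while the long window is steadier but biased downward by forgetting, mirroring the pros and cons listed above.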

Validity: Measuring What We Intend to Measure

Validity refers to how accurately a measurement captures the concept we want to study. A valid measure truly represents the construct or phenomenon of interest, rather than something else.

Note: Key Types of Validity
  1. Construct Validity: Does the measure reflect the theoretical concept?
    • Example: Using test scores to measure learning
  2. Content Validity: Does it cover all relevant aspects?
    • Example: A math test covering all required topics
  3. Criterion Validity: Does it correlate with established measures?
    • Example: New poverty measure matching World Bank standards

Let’s examine this through a practical example:

Measuring Financial Well-being:

  • Poor Validity: Using only monthly income
  • Better Validity: Combining income, savings, and debt measures

Tip: Common Validity Threats
  • Using proxy measures that don’t fully capture the concept
  • Cultural differences in how concepts are understood
  • Missing important dimensions of complex constructs
  • Response bias from sensitive questions

The key is ensuring measurements accurately reflect what researchers intend to study, not just what’s easy to measure.
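
One way to move from “combine income, savings, and debt” to an actual number is an equal-weight z-score index. A minimal sketch with made-up values and hypothetical column names (pandas assumed):

```python
# A minimal sketch of combining several indicators into one standardized
# index of "financial well-being" (hypothetical data and column names).
import pandas as pd

df = pd.DataFrame({
    "monthly_income": [120, 300, 80, 450],
    "total_savings":  [10, 200, 0, 500],
    "total_debt":     [50, 20, 150, 0],
})

# Flip the sign on debt so that higher always means better off
df["neg_debt"] = -df["total_debt"]

components = ["monthly_income", "total_savings", "neg_debt"]
z = (df[components] - df[components].mean()) / df[components].std()
df["fin_wellbeing_index"] = z.mean(axis=1)  # equal-weight z-score index
print(df[["fin_wellbeing_index"]])
```

Note the sign flip on debt: every component must point in the same direction before averaging, or the index partially cancels itself out.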

Reliability: Consistency in Measurement

Reliability refers to how consistently a measure produces similar results under unchanged conditions. While validity ensures we measure the right thing, reliability ensures we measure it consistently.

Note: Key Types of Reliability
  1. Test-Retest Reliability: Same measure, different times
    • Example: Asking about income in successive weeks
  2. Inter-rater Reliability: Different enumerators, same subject
    • Example: Multiple teachers grading the same test
  3. Internal Consistency: Different items measuring same construct
    • Example: Multiple questions about food security

Let’s examine this through practical examples:

Measuring Household Income:

  • Poor Reliability: “What’s your total income?”
  • Better Reliability: Breaking down income sources by category and timeframe

Tip: Common Reliability Threats
  • Recall error from long reference periods
  • Inconsistent question interpretation
  • Environmental factors affecting responses
  • Enumerator differences in question delivery

Remember: A measure can be reliable without being valid (consistently wrong), but cannot be valid without being reliable.
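
Internal consistency is commonly summarized with Cronbach’s alpha. A minimal implementation, applied here to simulated responses on four hypothetical food-security items:

```python
# A minimal sketch of checking internal consistency (Cronbach's alpha)
# across several items meant to measure the same construct.
import numpy as np

def cronbach_alpha(items):
    """items: respondents x items matrix of numeric responses."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Simulated responses to four food-security items on a 0-3 scale,
# driven by one latent trait so the items should move together
rng = np.random.default_rng(2)
latent = rng.normal(size=200)
items = np.clip(np.round(1.5 + latent[:, None] + rng.normal(0, 0.7, (200, 4))), 0, 3)
print(f"alpha = {cronbach_alpha(items):.2f}")  # high alpha: items move together
```

Alpha around 0.7 or above is conventionally read as acceptable consistency, though that threshold is a rule of thumb, not a test.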

Types of Measurement

Development researchers can choose from various measurement approaches:

Note: Measurement Options
  • Surveys: Household/individual questionnaires, including anthropometric measures
  • Administrative data: Government records, school enrollment, health facility data
  • Logs and diaries: Self-reported tracking over time
  • Qualitative methods: Focus groups, Rapid/Participatory Rural Appraisal (R/PRA)
  • Behavioral measures: Games and choice problems
  • Observation: Classroom snapshot surveys, direct observation protocols
  • Objective assessments: Health and education tests
  • Digital data: Cell phone applications and social network data

The Psychology Behind Responses

Understanding how respondents process survey questions is crucial for designing effective instruments. Rather than simply retrieving pre-formed answers, respondents engage in a complex cognitive process that can introduce measurement error.3

The Four Stages of Response

Each survey response involves a complex cognitive sequence:

  1. Comprehension
    • Understanding the question’s intent
    • Interpreting key terms and concepts
    • Determining what information is being requested
  2. Retrieval
    • Accessing relevant memories
    • Searching for specific information or experiences
    • Recalling details from the defined time period
  3. Judgment and Estimation
    • Evaluating the completeness and accuracy of retrieved information
    • Making estimations when exact recall is impossible
    • Organizing thoughts into a coherent answer
  4. Response
    • Selecting an appropriate answer from available options
    • Formatting the response according to question structure
    • Potentially editing answers based on social considerations

Consider the complexity behind this apparently simple question: “How many times did you consume rice this month?” The respondent must:

  • Comprehension: Define what counts as “consuming rice” (meals vs snacks vs ingredients?)
  • Retrieval: Recall numerous eating occasions over 30 days
  • Judgment: Estimate frequency when exact counting is impossible
  • Response: Select the most appropriate category from provided options

Each stage presents distinct opportunities for measurement error:

  • Comprehension errors: Ambiguous wording leads to different interpretations across respondents
  • Retrieval limitations: Respondents may:
    • Struggle to recall specific information over long periods
    • Retrieve information selectively (more memorable events)
    • Experience systematic recall patterns (recency effects)
  • Judgment biases: Respondents may apply different estimation strategies or criteria
  • Response editing: Respondents may modify answers based on:
    • Social desirability concerns
    • Perceived consequences of their responses
    • Available response options (anchoring effects)

Understanding this cognitive process helps explain why seemingly straightforward questions can produce unreliable data. For example, when we ask “How much did your household spend on food last month,” we’re asking respondents to:

  1. Define what counts as “household”
  2. Recall numerous transactions
  3. Categorize various expenses
  4. Aggregate multiple amounts

The quality of response depends on how successfully respondents navigate each stage of this complex cognitive process.

Measurement Error

Measurement error—the difference between a respondent’s answer and the true value—can arise from multiple sources. Rather than viewing these as isolated problems, it’s helpful to understand them as systematic patterns that can be anticipated and addressed through careful design.

Ambiguous wording

Problematic: “Do you exercise regularly?” What counts as “exercise”? What does “regularly” mean?

Improved: “During the past week, how many days did you do at least 30 minutes of physical activity?”

Practice: Rewrite this ambiguous question: “How often do you eat healthy food?”

Double negatives

Problematic: “Do you disagree that the program was unhelpful?” requires respondents to work through multiple negations. Ask directly instead: “Was the program helpful?”

Double-barreled questions

Problematic: “How satisfied are you with the quality and price of education?” conflates two potentially different judgments.

Improved: Ask separately:

  • “How satisfied are you with the quality of education?”
  • “How satisfied are you with the price of education?”

Unwarranted assumptions

Problematic: “How much have you saved for your children’s education?” assumes the respondent has children, plans to have children, and believes in saving for education.

Improved: Ask in sequence:

  • “Do you have children or plan to have children?”
  • If yes: “Do you save money for education expenses?”
  • If yes: “Approximately how much have you saved?”

Jargon

Problematic: Specialized language can confuse respondents who aren’t familiar with it, leading to misinterpretation or non-response.

Response Option Design

The way answer choices are structured can systematically bias responses. Poor design of response options can lead to measurement error through issues like missing categories, overlapping ranges, or unbalanced scales. Well-designed response options should be mutually exclusive, collectively exhaustive, and presented in a clear, unbiased format that allows respondents to accurately report their true answers.

Core Principles for Response Options:

  1. Mutually exclusive: No overlap between categories
  2. Collectively exhaustive: Cover all possible answers (both properties are checked mechanically in the sketch after this list)
  3. Balanced: Equal number of positive and negative options for scales
  4. Appropriate precision: Match the level of detail respondents can reasonably provide
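
The first two principles can be verified before fielding. A minimal sketch using hypothetical integer age bins, expressed in months:

```python
# A minimal check that integer-valued response categories are mutually
# exclusive and collectively exhaustive (hypothetical age bins, in months).
def check_categories(bins):
    bins = sorted(bins)
    for (lo1, hi1), (lo2, hi2) in zip(bins, bins[1:]):
        if lo2 <= hi1:
            print(f"Overlap: ({lo1}-{hi1}) and ({lo2}-{hi2})")
        elif lo2 > hi1 + 1:
            print(f"Gap: nothing covers {hi1 + 1}-{lo2 - 1}")

check_categories([(0, 12), (12, 24), (24, 120)])  # problematic: flags overlaps at 12 and 24
check_categories([(0, 11), (12, 23), (24, 120)])  # fixed design: prints nothing
```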

Response options must cover all possible answers. Missing categories force respondents to choose inappropriate options.

Example Problem: “How many years have you worked full-time at IPA?” This excludes part-time workers.

Better Design:

  • Include “part-time” option
  • Add “Not applicable” for contractors
  • Include “Don’t know” option

Response options should be mutually exclusive with no overlap between categories.

Example Problem: Age categories:

  • 0-1 years
  • 1-2 years
  • 2+ years

Where does someone exactly 1-year-old fit?

Better Design:

  • 0 years
  • 1 year
  • 2+ years

Or:

  • 0-11 months
  • 12-23 months
  • 24+ months

The structure and range of response options can suggest what’s “normal” and bias answers.

Example Problem: Income categories:

  • $50,000-$75,000
  • $75,000-$100,000
  • $100,000-$200,000

This suggests these are typical income ranges and may bias responses.

Better Design:

  • Use local income distribution data to set appropriate ranges
  • Consider open-ended responses for continuous variables

The sequence of questions and overall survey context can influence how people respond.

Example Problem: Asking about satisfaction with specific services before overall satisfaction biases the overall rating.

Better Design:

  • Ask general questions before specific ones
  • Group related questions together
  • Consider randomizing question order when appropriate
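
When randomization is appropriate, seeding the shuffle with a respondent identifier varies item order across the sample while keeping each interview reproducible. A minimal sketch with hypothetical item names:

```python
# A minimal sketch: shuffle a battery of items per respondent, seeded by
# respondent ID so each interview's order can be reconstructed later.
import random

ITEMS = ["rate_clinic", "rate_school", "rate_roads"]

def ordered_items(respondent_id):
    order = ITEMS.copy()
    random.Random(respondent_id).shuffle(order)
    return order

print(ordered_items(101))  # same ID, same order, every time
print(ordered_items(102))  # different ID, (usually) different order
```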

Designing Quality Instruments

The SMURF Criteria

Every survey question should meet these essential standards:

  • Specific: Asks one question at a time, avoiding double-barreled questions
  • Measurable: Quantifies accurate and unbiased information with appropriate precision
  • Understandable: Easy to comprehend with clearly defined terms and concepts
  • Relevant: Measures a key or intermediate outcome connected to your Theory of Change
  • Framed: Has clear boundaries including time frame, context, and scope

Example of SMURF in practice:

Poor: “How satisfied are you with the quality and price of education?”

  • Violates Specific (double-barreled: quality AND price)
  • Not Framed (which education? what time period?)

Better: “How satisfied are you with the quality of education your child received this school year?”

  • Specific: focuses only on quality
  • Framed: specifies child’s education and time period

Question Quality Checklist

Before finalizing any question, go through this checklist to identify potential issues:

  • Does the question have an explicit rationale and a plan for how responses will be used?
  • Does it pass tests of validity, reliability, and responsiveness?
  • Can you connect this question to a specific analysis or research hypothesis?
  • Can respondents easily answer from memory?
  • Is the question simple, specific, and well-defined enough that all respondents will interpret it similarly?
  • Are you asking about something respondents actually know or care about?
  • Does the question contain words or phrases that could bias responses?
  • Does it appear to “give away” what you’d like the response to be?
  • Are you leading respondents toward particular answers?
  • Does the question focus on a single topic, or should it be broken into multiple questions?
  • Are all response options mutually exclusive?
  • Should you allow multiple answers or include an “other” option?
  • Are assumptions implied by the question actually warranted?

Survey Implementation Guidelines

Technical Considerations

  • Skip logic: Double-check all conditional questions and use consistent visual cues
  • Hard/soft checks: Limit validation checks primarily to computer-assisted interviews (see the sketch after this list)
  • Visual design: Use consistent formatting and clear instructions for surveyors
  • Supporting materials: Develop answer cards and visual aids as needed
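
The distinction between hard and soft checks is simple to express in code: a hard check rejects an impossible value outright, while a soft check only prompts for confirmation. A minimal sketch with a hypothetical field and thresholds:

```python
# A minimal sketch of hard vs. soft validation checks as a computer-assisted
# interview might apply them (hypothetical field name and cutoffs).
from typing import Optional

def validate_age_years(value: float) -> Optional[str]:
    if not 0 <= value <= 120:
        return "HARD: age must be between 0 and 120; answer rejected"
    if value > 95:
        return "SOFT: age above 95 is unusual; ask the enumerator to confirm"
    return None  # passes both checks

for answer in (34, 99, 150):
    print(answer, "->", validate_age_years(answer))
```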

Human Elements

  • Rapport building: Standardize the first minute of every interview to establish trust
  • Documentation: Leave space for comments and field notes
  • Closure: Always thank respondents and provide contact information
  • Flexibility: Allow time for respondents’ questions and concerns

From Theory to Practice: Design Exercise

Let’s apply measurement principles to a real scenario demonstrating how theory translates into practice.

Scenario: Evaluating a Microfinance Program

You’re evaluating a microfinance program that provides small loans to women entrepreneurs.

Note: Theory of Change

Microfinance → Business investment → Increased income → Women’s empowerment

Design Challenge

Create survey questions to measure:

  1. Business investment (intermediate outcome)
  2. Women’s empowerment (final outcome)

For each question, consider:

  • Is it valid for your construct?
  • Is it reliable?
  • What measurement errors might occur?
  • How can you minimize these errors?

Sample Solution

Tip: Business Investment Measurement

Poor Design: “Did the loan help your business?”

Better Design: “In the past 3 months, how much money did you invest in your business from the following sources:

  • Personal savings: $____
  • Microfinance loan: $____
  • Other loans: $____
  • Family/friends: $____”

Tip: Women’s Empowerment Measurement

Poor Design: “Do you feel empowered?”

Better Design: “Who in your household makes decisions about:

  • Daily food purchases: [Me alone/Spouse alone/Together/Other]
  • Children’s education: [Me alone/Spouse alone/Together/Other]
  • Large purchases: [Me alone/Spouse alone/Together/Other]”

This approach demonstrates how good measurement design moves from abstract constructs to concrete, measurable indicators while minimizing common sources of error.
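
To close the loop from module to indicator, responses like these are often collapsed into a count of domains in which the respondent has a say. A minimal sketch with hypothetical coding choices:

```python
# A minimal sketch of scoring the decision-making module above
# (hypothetical coding: sole or joint say counts as having input).
responses = {
    "daily_food":      "Together",
    "education":       "Me alone",
    "large_purchases": "Spouse alone",
}

HAS_SAY = {"Me alone", "Together"}
score = sum(answer in HAS_SAY for answer in responses.values())
print(f"Decision domains with respondent input: {score} of {len(responses)}")
```

Whether “Together” counts as having a say is itself a measurement decision worth pre-specifying.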

IPA’s Research Methods Initiative has been at the forefront of developing innovative measurement approaches.4 The Global Poverty Research Lab at Northwestern University, in partnership with IPA, has systematically studied how to improve measurement methods across multiple domains.5

Note: IPA’s Multi-Dimensional Measurement Approach
  1. Decision-making power
    • Household purchases
    • Children’s education
    • Healthcare decisions
  2. Freedom of movement
    • Market access
    • Health facility access
    • Family visits
  3. Economic resources
    • Earnings control
    • Credit access
    • Asset ownership

This comprehensive approach provides more valid measurement than single-item indicators and has influenced how organizations measure women’s empowerment globally.

Innovation and Context in Modern Measurement

Different research contexts and evolving technologies have transformed how we approach measurement in development research. Understanding this evolution helps researchers make informed choices about measurement strategies.

The Technological Revolution

Recent innovations have fundamentally enhanced our measurement capabilities:

  • Digital Data Collection: Mobile surveys, satellite data, and remote sensing provide new measurement opportunities
  • Standardized Modules: Validated question banks and measurement toolkits improve comparability across studies
  • Administrative Integration: Combining survey data with existing records reduces respondent burden while improving accuracy

For example, Ambler et al.’s (2021) study on remittances and education demonstrates how modern measurement approaches can improve data quality. Their work revealed that combining administrative records with carefully framed survey questions provided more accurate insights into how financial transfers affect educational outcomes.6

Balancing Multiple Stakeholder Needs

Successful measurement strategies must balance competing demands:

  • Researchers need reliable, valid data that supports rigorous analysis
  • Policymakers require clear, actionable insights that inform decisions
  • Administrative systems must maintain practical compatibility and feasibility
  • Study participants deserve respect for their time and privacy

This evolution reflects a broader understanding that measurement is not just a technical challenge but a social and political one that requires careful consideration of context, culture, and consequences.


Understanding Measurement as Social Practice

Measurement in development research extends beyond technical considerations to encompass social, cultural, and ethical dimensions. Effective measurement recognizes that:

  • Cultural context shapes interpretation: What constitutes “household income” or “food security” varies across cultures
  • Power dynamics influence responses: Respondents may alter answers based on perceived consequences
  • Measurement itself can be an intervention: The act of asking questions can change behavior or awareness

Piloting and Testing

Important: Essential Testing Steps
  1. Review before piloting: Check instruments without collecting data
  2. Mandatory piloting: No instrument should be used without testing
  3. Iterative process: Pilot, revise, pilot again
  4. External testing: Pilot outside your target population first
  5. Staff testing: Ensure your team can implement the instrument effectively

Common Mistakes to Avoid

Based on extensive research experience, researchers frequently make these errors:

  • Including unnecessary concepts: Only measure what directly relates to your Theory of Change and analysis plan
  • Overly long surveys: Most surveys are simply too long - ruthlessly prioritize
  • Functional concept definitions: Defining concepts by their expected relationships rather than their intrinsic properties
  • Ignoring cultural context: Failing to adapt questions to local understanding and norms
  • Skipping pilot testing: Moving directly to full implementation without adequate testing


Footnotes

  1. Jensen, R. (2010). The (perceived) returns to education and the demand for schooling. The Quarterly Journal of Economics, 125(2), 515-548.

  2. Glennerster, R., & Takavarasha, K. (2013). Running randomized evaluations: A practical guide. Princeton University Press.

  3. Tourangeau, R. (1984). Cognitive sciences and survey methods. In T. Jabine, M. Straf, J. Tanur, & R. Tourangeau (Eds.), Cognitive aspects of survey methodology: Building a bridge between disciplines (pp. 73-100). The National Academies Press.

  4. IPA Research Methods Initiative. (2023). Research Methods Initiative Overview. https://www.poverty-action.org/researchers/working-with-ipa/research-methods-initiative

  5. Global Poverty Research Lab at Northwestern University. (n.d.). Methodological Studies in Development Research. https://www.poverty-action.org/researchers/working-with-ipa/research-methods-initiative

  6. Ambler, K., Aycinena, D., & Yang, D. (2021). Channeling remittances to education: A field experiment among migrants from El Salvador. American Economic Journal: Applied Economics, 13(2), 207-235.
