Administrative Data
A guide to obtaining, using, and managing administrative data for research, covering access processes, ethical considerations, and common challenges.
- Administrative data refers to information collected, used, and stored primarily for operational purposes rather than research.
- While IPA projects usually rely on primary data, administrative data can also be a valuable input resource for research and impact evaluation.
What is Administrative Data?
Administrative data is typically collected by government agencies and organizations for registration, transaction, and record-keeping purposes. Examples of administrative data can include credit card transactions, electronic medical records, insurance claims, educational records, arrest records, and mortality records.
Real-World Example: IPA’s Experience with Administrative Data
IPA’s research on using administrative data for Monitoring and Evaluation (2016) highlights several key advantages:
- Cost-Effectiveness: Administrative data reduces or eliminates the need for additional monitoring activities or surveys.
- Timely Response: Regular updates in management information systems enable faster analysis of key indicators.
- Large Sample Size: Coverage of entire beneficiary populations provides robust sample sizes.
- Improved Accuracy: Reduces social desirability bias and recall issues common in self-reported data.
This research emphasizes that data accuracy and reliability should take precedence over cost savings. Organizations must balance data quality, actionability, and resource allocation when incorporating administrative data into their research design.
Standard Processes for Accessing Administrative Data
Accessing administrative data involves four main steps:
Implementing partners and government agencies are valuable sources of administrative data. Common sources include:
- Health Data: Regional/national health departments, hospitals, health insurance records
- Financial Data: Banks, credit unions, credit reporting agencies
- Education Data: Schools, ministries of education, standardized testing agencies
When requesting administrative data, researchers should:
- Define the time frame, format, and structure needed: “Primary school records from January 2022 to December 2023 in CSV format”
- List specific variables of interest: Student ID, school level, Teacher ID, attendance, test scores
- Specify whether you need identified or de-identified data: Request de-identified data when possible to reduce ethical complexity
- Avoid broad requests: Instead of “all student data,” specify exact variables, time frames needed, and frequency of updates
A well-planned data flow strategy ensures smooth integration of administrative data:
- Gather Identifying Information: Determine what identifiers are available in the study sample such as national ID numbers, phone numbers, email addresses
- Link Datasets: Use pre-existing identifiers to match study data with administrative data. For example, match student IDs with test records using national ID numbers
- Choose a Matching Strategy:
- Exact matching: When identifiers are identical such as national ID numbers
- Probabilistic matching: When using combinations of name, date of birth, and location information
- Determine Who Performs the Link: Clarify whether the data provider, researcher, or a third party will conduct the linkage and deindentification to maintain data security and privacy
Ethical Considerations for Using Administrative Data in RCTs
Using administrative data in research requires adherence to ethical and legal standards. Key considerations include:
Institutional Review Board (IRB) Approval
Most research using administrative data qualifies as human subjects research and requires IRB approval. This includes ensuring:
- Proper handling of personally identifiable information (PII)
- Justification for using identified vs. de-identified data
- Security measures for protecting sensitive information
Informed Consent
In some cases, researchers may need to obtain informed consent from participants before accessing administrative data. This depends on:
- The type of data you access
- Whether it is possible to de-identify the data
- Legal requirements set by the data provider
Data Security and Compliance
Researchers must implement robust security measures, including:
- Encryption for storing and transferring sensitive data
- Access controls to limit data exposure
- Compliance with legal regulations such as the General Data Protection Regulation (GDPR) or Health Insurance Portability and Accountability Act (HIPAA), if applicable
For further clarification on IRB-related issues, any project with concerns should email humansubjects@poverty-action.org.
Challenges When Using Administrative Data
While administrative data offers many advantages, researchers often face challenges such as:
Differential Coverage
Treatment and control groups may appear differently in administrative records, leading to bias. Examples include:
- Identifiers Obtained After Enrollment: In a financial literacy program evaluation, treatment group participants may be more willing to provide bank account numbers for linking with administrative data, creating selection bias.
- Program-Generated Data: If the intervention encourages healthcare visits, treatment group participants will appear more frequently in health administrative records, inflating apparent impact.
Reporting Bias
Some administrative data relies on self-reporting, which can introduce inaccuracies. Examples include:
- Incentives for Misreporting: Agencies or individuals may have reasons to over- or under-report data.
- Human Error in Data Entry: Manual data entry can introduce inconsistencies.
Cost of Administrative Data
Administrative datasets vary in cost, depending on:
- The number of records requested
- File-years needed
- Data preparation time required by the provider
Conclusion
Administrative data is a powerful tool for randomized evaluations, offering cost-effective, accurate, and comprehensive insights. However, researchers must navigate ethical, legal, and logistical challenges to ensure data quality and validity. By following standardized processes, addressing ethical concerns, and mitigating common challenges, researchers can leverage administrative data for impactful research.