Step-by-step instructions for implementing stratified randomization in Stata, including reproducible seed setting, balance checking, and troubleshooting common issues.
Learning Objectives
Randomization is a fundamental technique in impact evaluation that ensures treatment and control groups are comparable, allowing for unbiased estimates of program effects. This guide will help you:
Understand the principles of randomization and its importance in impact evaluation
Implement stratified randomization in Stata to maintain balance across key characteristics
Set up reproducible randomization with proper seed management
Verify treatment assignment and check for balance across groups
Problem: Assign Participants to Treatment Groups
You need to randomly assign participants to treatment and control groups while maintaining balance across key characteristics like grade level, school, or other strata.
Prerequisites:
Stata installed with randtreat package
Dataset with participant identifiers and stratification variables
Basic familiarity with Stata commands
What you’ll accomplish:
Set up reproducible randomization with proper seed management
Implement stratified randomization to achieve balance
Verify treatment assignment worked correctly
Step 1: Set Up Your Environment
Set Stata version and seed for reproducibility:
Use a different seed for each project
Choose seeds through random methods—such as from random.org or dice rolls
Set only one seed per do-file
Always set the Stata version at the beginning of the do-file, since the random-number generator and other algorithms can change between versions
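A minimal sketch of the opening line of a do-file that follows these practices. The version number shown is an assumption; replace it with the release you actually run. The seed is set once, in Step 3, so it is not repeated here.

```stata
* Pin the Stata version so random-number and sorting behavior stay fixed.
* 17.0 is an assumed value; use the version you are running.
version 17.0

* The seed is set exactly once per do-file (see Step 3).
```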
Step 2: Prepare Your Data
Example scenario: School supply program evaluation
500 students across 10 schools
Budget to treat 250 students
Need to ensure balance across grades
Create your dataset with participant identifiers and strata:
%%stata
clear
set obs 500
gen student_id = _n
gen school_id = ceil(_n/50) // 10 schools, 50 students each
gen grade = mod(_n-1,5) + 1 // Grades one to five, evenly distributed
. clear
. set obs 500
Number of observations (_N) was 0, now 500.
. gen student_id = _n
. gen school_id = ceil(_n/50) // 10 schools, 50 students each
. gen grade = mod(_n-1,5) + 1 // Grades one to five, evenly distributed
.
Step 3: Set Random Seed
Set a random seed for replicability:
%%stata
set seed 12345
Step 4: Verify Your Strata
Check that your stratification variable covers all participants:
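One way to run this check, sketched under the assumption that the dataset from Step 2 is in memory with the variables student_id and grade:

```stata
* Every student should appear exactly once.
isid student_id

* No student should be missing the stratification variable.
assert !missing(grade)

* Inspect stratum sizes; the missing option would reveal any unassigned rows.
tabulate grade, missing
```

If either isid or assert fails, Stata stops with an error, which keeps a faulty dataset from reaching the randomization step.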
Step 5: Randomize Within Strata
Assign half of the students in each grade to treatment, so that 250 students are treated in total. Generate a treatment variable equal to 1 for treated and 0 for control:
%%stata
* Install randtreat if not already installed
cap ssc install randtreat
* Use randtreat for stratified randomization by grade
randtreat, strata(grade) unequal(1/2 1/2) generate(treatment)
.
. * Install randtreat if not already installed
. cap ssc install randtreat
.
. * Use randtreat for stratified randomization by grade
. randtreat, strata(grade) unequal(1/2 1/2) generate(treatment)
assignment produces 0 misfits
.
Step 6: Verify Balance
Tabulate treatment by grade and school to verify balance:
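A minimal sketch of these checks, assuming the treatment variable created by randtreat in the previous step is in memory:

```stata
* Treated and control counts within each stratum (grade).
tabulate grade treatment

* Schools were not used as strata, so counts here may be uneven by chance.
tabulate school_id treatment

* Overall number of treated students.
count if treatment == 1
```

Because grade was the stratification variable, the first table should show an even split within every grade. The second table is only informative: school was not a stratum, so some imbalance across schools is expected.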
Common Mistakes
Not setting a random seed: Always use set seed in Stata to ensure your results are replicable.
Failing to check balance: After randomization, verify that treatment and control groups have balance on key variables.
Confusing unit identifiers: Double-check IDs—such as village, school, or participant names—to avoid misassignment.
Contamination: Monitor to prevent control group members from receiving the treatment.
Poor documentation: Keep detailed records of your randomization procedure for transparency and reproducibility.
Sorting Considerations
Ensuring Reproducible Sorting
The sort command can produce non-reproducible results if the sorting variables don’t uniquely identify observations. Always include a unique ID as the last sorting variable:
* Check if ID is unique
isid unique_id
* Sort by unique ID before generating random numbers
sort unique_id
gen rand = runiform()
* When sorting by strata and random number, include unique ID last
sort region rand unique_id
This prevents Stata from breaking ties inconsistently, ensuring your randomization is reproducible.
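Applied to the example dataset from Step 2, the same pattern could look like the sketch below. It assigns treatment by hand rather than with randtreat, purely to illustrate stable sorting. The variable names rand and treatment_manual are hypothetical, and the seed is assumed to have been set already in Step 3.

```stata
* Confirm the ID is unique, then fix the row order before drawing random numbers.
isid student_id
sort student_id
gen double rand = runiform()

* Sort within grade by the random draw; student_id breaks any ties deterministically.
sort grade rand student_id

* Treat the first half of each grade (hypothetical variable name).
by grade: gen byte treatment_manual = (_n <= _N/2)
```

Sorting by student_id before calling runiform() matters as much as the final sort: it guarantees that each student receives the same random draw every time the do-file runs.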