Methods of Impact Evaluation: A Review

Abhisek Mishra
Research Scholar (Ph.D.), Department of Business Administration, Sambalpur University
Email: rikan.mishra@gmail.com

Tushar Kanti Das
Reader, Department of Business Administration, Sambalpur University
Email: tkd@live.in

Abstract: Government and non-government organisations initiate various development programmes for different strata of society. The extent of the benefits of those programmes is known only after evaluation. Impact evaluation provides the causal effect of a programme on an outcome. Measuring the impact of a developmental programme is a difficult task. This paper provides a systematic review of the quantitative methods available for impact evaluation and suggests the use of a combination of methods to enhance the robustness of the estimated impact.

Key words: Impact evaluation, methods, randomisation, regression discontinuity, propensity score matching.

1. Introduction

Economic development policy consists of interventions that aim at the economic and social well-being of people. Developmental programmes are initiated by governments along with non-government organisations (NGOs) to bring about a change in an outcome (for example, a rise in income, or improvements in well-being, livelihood or learning) for different strata of society at both national and international levels. The extent of the benefits of those programmes is known only after evaluation. Evaluation is the process of systematic and objective assessment of a programme using a set of governed standards. In simple words, evaluation answers questions relating to the design, implementation and benefits of a programme. Evaluation addresses three types of questions: (a) the descriptive question, which explains what is taking place (the detailed process and conditions); (b) the normative question, which compares what has taken place with what should be taking place; and (c) the cause-and-effect question, which examines the differences in outcomes after an intervention (Imas & Rist, 2009). Impact evaluation answers the third question. Impact evaluation is structured around a particular kind of question, namely: what is the impact (causal effect) of the programme on the outcome (Gertler, Martinez, Premand, Rawlings, & Vermeersch, 2007)? The evaluation of a policy is based on two approaches: the structural approach and the treatment effect approach.
The structural approach is applicable where there is universal participation, whereas the treatment effect approach is applicable where there are two groups: (a) the treatment group, which takes part in the programme, and (b) the comparison group, which does not participate in the programme (Heckman & Vytlacil, 2005). In evaluating any intervention, data (quantitative or qualitative) are required. Quantitative data are suitable for measuring levels of, and changes in, impacts, and inferences are drawn with their help. They are, however, less effective in understanding the process, i.e., the mechanism by which an intervention activates the series of events that is reflected in its impact. Qualitative methods, on the other hand, are more effective in understanding the process of impact. Most econometric analyses (using quantitative data) fail to examine the detailed process of project implementation (which calls for qualitative data). As a result, it becomes difficult to know the reason behind the failure of a project, that is, whether the failure lies in the design or in the implementation. In other words, in such cases the research questions are shaped by the data instead of the data by the questions. To avoid this, qualitative data (tape recordings of village meetings, free-ranging open-ended interviews, focus groups) are to be used (Bamberger, Rao, & Woolcock, 2010). Because both kinds of data are important, concepts such as multi-method and mixed-method research have come into existence. Creswell and Clark (2007) give a clear distinction between multi-method research (i.e. qualitative or quantitative methods) and mixed-method research (i.e. the integration of quantitative and qualitative methods). Greene, Caracelli, and Graham (1989) also define mixed methods as methods that include at least one quantitative method (to collect numbers) and one qualitative method (to collect words). When a combination of quantitative and qualitative approaches is used as the research methodology in a single study, it is referred to as a mixed-method study (Tashakkori & Teddlie, 1998). Beyond methods and approaches, the combination of quantitative and qualitative techniques, concepts or languages in a single study is also called mixed-method research (Johnson & Onwuegbuzie, 2004). Both quantitative and qualitative data have their relative importance. Researchers should therefore not give primacy to one over the other, as different methods are used to tackle different problems. A combination of techniques provides greater insight than using either one in isolation (White, 2002). In that context, Bamberger, Rao, and Woolcock (2010) take the view that mixed methods (using both quantitative and qualitative data) significantly strengthen the validity and operational utility of evaluation designs. Rao and Woolcock (2003) argue that a judicious mix of quantitative and qualitative methods should be used to obtain a comprehensive evaluation of an intervention. They add that parallel integration of quantitative and qualitative methods is suitable for large government projects such as national-level poverty assessments. Therefore, a combination of quantitative and qualitative methods should be used in the evaluation of an intervention. A description of how and why a programme or policy is supposed to deliver the desired results is known as a theory of change, whereas the sequence of inputs, activities and outputs intended to improve the outcome constitutes a results chain.
To establish causality between a programme and an outcome, impact evaluation methods are used. The causal effect, or impact, of a programme p on an outcome of interest Y is given by:

α = (Y | p = 1) − (Y | p = 0),

where α represents the impact of the programme on the outcome, (Y | p = 1) is the outcome with the programme and (Y | p = 0) is the outcome without the programme. The impact estimated for all units to whom the programme has been offered, regardless of whether they enrol, is called the intention-to-treat (ITT) effect. The impact estimated for the units to whom the programme has been offered and who have actually enrolled in it is called the treatment-on-the-treated (TOT) effect. The ITT and TOT coincide when all units to whom the programme has been offered actually decide to enrol (Gertler et al., 2007).

2. Methods of Impact Evaluation

Evaluation methods in empirical economics fall into five broad categories; each provides an alternative approach for constructing the counterfactual and minimising selection bias. The choice among alternative evaluation methods depends on several criteria, such as (a) the nature of the programme (i.e. whether the programme or policy is local or national, small-scale or global), (b) the nature of the questions to be answered, and (c) the nature of the data available (Blundell & Dias, 2000). Heckman, Smith, and Clements (1997) and Heckman, Ichimura, and Todd (1998a, b) show that data quality is also a crucial ingredient in determining the appropriate estimation strategy. A review of the various methods of impact evaluation follows.

2.1. Randomisation method

Randomisation is defined as the incorporation of a fully controlled bit of randomness in the process of data generation. The advantage of the physical act of randomisation was noted by Fisher (1935) while discussing Galton's analysis of a Darwin experiment with 15 pairs of self-fertilised and cross-fertilised seeds. This idea of Fisher was immediately generalised by Pitman (1937) and then pushed to its natural boundary by Kempthorne (1952) and many others (Basu, 1980). Randomised assignment is often used in both large- and small-scale impact evaluation work, because it ensures that every eligible person or unit has the same chance of receiving the programme. When the number of observations is large, treatment and comparison groups created through randomised assignment will have similar characteristics. An evaluation is internally valid when it uses a valid comparison group. In the context of forming comparison and treatment groups, Fisher (1935) advocated the use of combinatoric analysis (which is rarely used in practice) to calculate the exact probabilities of each possible outcome (Deaton, 2009). When the impact estimated in the evaluation sample can be generalised to the total population, the evaluation is externally valid. Randomised assignment is used when there is excess demand for a programme and when a programme needs to be phased in until it covers the entire population (Gertler et al., 2007). A minimal sketch of the ITT and TOT estimands under randomised assignment is given below.
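As an illustration of these estimands, the following minimal sketch (in Python, with invented numbers rather than data from any study reviewed here) simulates a randomised offer with partial take-up and computes the ITT as the difference in mean outcomes between offered and non-offered units; assuming one-sided non-compliance, the TOT is then recovered as the ITT divided by the enrolment rate among those offered (the Bloom estimator).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical simulated population: half the units are randomly offered the programme.
offered = rng.integers(0, 2, size=n)            # randomised assignment (the "offer")
complier = rng.random(n) < 0.7                  # 70% would enrol if offered
enrolled = offered * complier                   # one-sided non-compliance: no enrolment without an offer

# Potential outcomes: enrolment raises the outcome by a true effect of 2.0 units.
baseline = rng.normal(10.0, 3.0, size=n)
outcome = baseline + 2.0 * enrolled

# Intention-to-treat: compare everyone offered with everyone not offered.
itt = outcome[offered == 1].mean() - outcome[offered == 0].mean()

# Treatment-on-the-treated (Bloom estimator): scale the ITT by the enrolment rate among the offered.
compliance_rate = enrolled[offered == 1].mean()
tot = itt / compliance_rate

print(f"ITT = {itt:.2f}")   # close to 0.7 * 2.0 = 1.4
print(f"TOT = {tot:.2f}")   # close to the true effect of 2.0
```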
The use of the randomisation method is found in estimating the impacts of various programmes initiated in different contexts, as follows. Vermeersch (2002), using a randomised design, examines the effect of providing breakfast on school participation and finds a participation rate 30% higher in twenty-five treatment schools than in twenty-five comparison schools. The effect of providing textbooks to schools in Kenya on test scores is studied by Glewwe, Kremer, and Moulin (2002). Through a randomised design, the authors find no effect of the intervention on the test scores of the bottom 60% of students; on the contrary, they find an increase in the test scores of the students who performed well in the pre-test exam. Banerjee, Jacob, and Kremer (2002) evaluate a programme by monitoring the attendance of both teachers and children. The authors find a decrease in the number of school closing days after the introduction of the programme (44% in one-teacher and 39% in two-teacher schools); they also find an increase in girls' participation. In the evaluation of a Colombian programme for extending the coverage of secondary school (where vouchers for private schools were allocated by lottery owing to the programme's limited budget), Angrist, Bettinger, Bloom, King, and Kremer (2002) take advantage of the randomly assigned treatment and find that lottery winners are 15-20% more likely to attend private schools, 10% more likely to complete the 8th grade, and score on average 0.2 standard deviations higher on standardised tests. Kremer, Moulin, and Namunyu (2002) evaluate a programme in which an NGO provided uniforms, textbooks and classroom construction to seven schools randomly chosen from fourteen poorly performing schools in Kenya, and find considerably lower dropout rates in the treatment schools. Miguel and Kremer (2003a) evaluate the effect of a twice-yearly school-based mass treatment programme on absenteeism in Kenya, in which de-worming drugs were provided to students in seventy-five randomly selected schools; they find absenteeism lowered by 25% in the treatment schools. Glewwe, Ilias, and Kremer (2003) evaluate a programme in which parent school committees gave gifts to teachers whose students performed well. They find that the test scores of participating students increase initially but fall back to the level of the comparison group by the end of the programme. Banerjee, Cole, Duflo, and Linden (2003) evaluate the impact of a remedial education programme introduced by Pratham, an Indian NGO, in which young women hired from the communities provided remedial education to children in government schools. After two years of the programme, they find an average increase in students' test scores of 0.39 standard deviations; moreover, the authors conclude that the lowest-performing children gain the most from the programme. They also conclude that hiring remedial education teachers from the community is ten times more cost-effective than hiring new teachers. Schultz (2004) studies the impact of transferring funds to poor mothers in rural Mexico on their children's school enrolment and, on the basis of the randomised assignment, finds a positive effect on enrolment after the intervention. Beyond education, the randomised method has also been used to study impacts in various other fields, as the following cases show. For the targeted wage subsidy programme in the U.S., Burtless (1985), Woodbury and Spiegelman (1987) and Dubin and Rivers (1993) use randomised evaluation designs. In assessing the impact of credit, Karlan and Zinman (2010), using a randomised experiment, conclude that marginal loans produce significant net benefits for borrowers across a wide range of outcomes.
In assessing the effect of performance-based payment on the use and quality of maternal and child health services provided by health-care providers in Rwanda, Basinga et al. (2011) conclude that financial performance incentives (i.e. payment for performance) improve the use and quality of maternal and child health services.

Randomised promotion is similar to randomised offering. Under this method, the units that receive promotion of (encouragement to join) the programme are selected randomly, instead of randomly selecting the units to whom the treatment itself is offered; the programme thus remains open to every unit. There are three types of units under the randomised promotion method: (1) Always - units that always want to enrol in the programme; (2) Enrol-if-promoted - units that enrol only when additional promotion is provided; and (3) Never - units that never enrol in the programme, whether or not promotion is offered (Gertler et al., 2007). This technique is used by Newman et al. (2002), who evaluate the effect of a programme providing social investment funds for small-scale investments in education, health and water infrastructure in Bolivia. The authors find little impact on educational outcomes, except for a decrease of about 2.5 percent in drop-out rates. Gertler, Martinez and Vivo (2008) use the method in the evaluation of a maternal and child health insurance programme introduced by the Argentine Government after the 2001 economic crisis; they find an improvement in the health status of the population. Gertler (2008) also evaluates the impact of a maternal and child health insurance programme in Argentina using this method.

At present, the clustered randomised controlled trial is increasingly popular for evaluating the impact of interventions that apply to intact groups of individuals (Abe & Gee, 2014). When clustered data are considered, evaluators select from a variety of methods that appropriately account for the correlation between study participants. The choice of method depends upon factors such as (a) the professional judgement and prior training of the evaluator and (b) the disciplinary field (e.g. public health, education) in which the study is conducted. The evaluator is therefore required to carry out a sensitivity analysis, i.e., to carry out the analysis using different methods and to check the consistency of results across those methods (Thabane et al., 2013). A clustered randomised controlled trial (RCT) refers to an experiment in which intact groups of individuals are randomly assigned to receive an offer to participate or not. The level at which randomisation occurs, whether the group or the individual, is referred to as the unit of randomisation. Randomising at the cluster level can avoid potential cross-contamination between the control and treatment conditions (Raudenbush, 1997). To determine the impact of the programme on the outcome in an RCT, an analyst typically applies a standard t-test or ordinary least squares (OLS) regression to estimate the effect of the treatment condition; in a clustered design, however, individuals are members of existing groups and may not be completely independent of each other. The degree of interdependency of individuals within a cluster is measured quantitatively by the intra-class correlation coefficient (ICC). The ICC ranges from 0 to 1; values closer to 1 indicate a high degree of within-cluster correlation, and values closer to 0 a low degree. A brief sketch of how the ICC can be computed from clustered data is given below.
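As an illustration, the following minimal sketch computes the ICC from simulated clustered data using the one-way ANOVA estimator; the cluster sizes and variance components are hypothetical, and a real evaluation would typically obtain the ICC from a fitted multi-level model instead.

```python
import numpy as np

rng = np.random.default_rng(1)
n_clusters, cluster_size = 40, 25          # hypothetical: 40 schools, 25 pupils each

# Simulated clustered outcomes: a shared cluster effect plus individual noise.
cluster_effect = rng.normal(0.0, 1.0, size=n_clusters)          # between-cluster variation
y = np.array([cluster_effect[c] + rng.normal(0.0, 2.0, size=cluster_size)
              for c in range(n_clusters)])                      # shape (clusters, pupils)

# One-way ANOVA estimator of the intra-class correlation coefficient (ICC).
grand_mean = y.mean()
cluster_means = y.mean(axis=1)
ms_between = cluster_size * ((cluster_means - grand_mean) ** 2).sum() / (n_clusters - 1)
ms_within = ((y - cluster_means[:, None]) ** 2).sum() / (n_clusters * (cluster_size - 1))
icc = (ms_between - ms_within) / (ms_between + (cluster_size - 1) * ms_within)

print(f"ICC = {icc:.3f}")   # true value is 1 / (1 + 4) = 0.2 for these variances
```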
Abe and Gee (2014) address four methods for analysing clustered data. The first is hierarchical linear modelling (HLM, also called random coefficient, mixed-level or multi-level modelling). The second is feasible generalised least squares, which models the correlated nature of the errors and provides more efficient estimates than standard OLS (Cameron & Miller, 2011). The third is the generalised estimating equation (GEE), an extension of the generalised linear model (Burton, Gurrin, & Sly, 1998), and the last is ordinary least squares. Studies comparing these four methods have produced heterogeneous results. Hubbard et al. (2010) argue that the GEE model describes the population parameter more accurately than HLM, while Gelman (2006) notes that HLM (mixed-level) models allow parameters to vary across groups, whereas GEE does not. From the literature in the medical field, Galbraith, Daniel, and Vissel (2010), using simulated data, show how results vary across this range of methods. The use of RCTs is also found in the following recent studies. Li et al. (2017), using an RCT, study the effect of conditional cash transfers (CCT) on the matriculation of junior high school students into rural China's high schools; the authors find no effect of the CCT voucher on the students' performance. In studying the impact of matching grants (a common type of private sector development programme), McKenzie, Assaf and Cusolito (2017) use an RCT and find a positive impact of the programme through more product innovation, updated accounting systems, increased capital investment and growth in sales after one year.

2.2. Regression Discontinuity Design

In impact evaluation, the regression discontinuity design (RDD) is used for a programme that has a continuous eligibility index with a clearly defined cut-off score determining participants' eligibility. The impact estimator under RDD is

E(Y_T | M = m − ε) − E(Y_C | M = m + ε),

where M_i denotes the score received by unit i in a proxy means test and m denotes the cut-off point for eligibility, with T_i = 1 for M_i ≤ m and T_i = 0 otherwise. A minimal sketch of this local comparison around the cut-off is given below; studies based on RDD then follow.
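The following minimal sketch illustrates the idea behind this estimator on simulated data: outcomes are compared just below and just above the cut-off, here via simple linear fits on each side within a bandwidth, evaluated at the cut-off itself. The cut-off, bandwidth and effect size are invented for illustration and are not taken from any of the studies cited here.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
cutoff = 50.0
bandwidth = 5.0   # hypothetical bandwidth around the cut-off

# Hypothetical eligibility score (proxy means test) and treatment rule: T = 1 if score <= cutoff.
score = rng.uniform(0, 100, size=n)
treated = score <= cutoff

# Outcome depends smoothly on the score, with a true jump of 3.0 at the cut-off for treated units.
outcome = 20.0 + 0.1 * score + 3.0 * treated + rng.normal(0.0, 2.0, size=n)

# Local linear regression on each side of the cut-off, evaluated at the cut-off itself.
left = (score >= cutoff - bandwidth) & treated          # just below the cut-off (treated)
right = (score <= cutoff + bandwidth) & ~treated        # just above the cut-off (untreated)
fit_left = np.polyfit(score[left], outcome[left], 1)
fit_right = np.polyfit(score[right], outcome[right], 1)
impact = np.polyval(fit_left, cutoff) - np.polyval(fit_right, cutoff)

print(f"Estimated impact at the cut-off = {impact:.2f}")   # close to the true jump of 3.0
```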
The various studies based on RDD are as follows. Duflo (2003) uses this method in studying the impact of an expanded old-age pension programme (by regressing anthropometric outcomes on a number of covariates), taking the eligible age requirements for men and women as instruments for receiving the pension. The author finds a significant positive effect on the status of girls, raising their weight and height, when the pension is received by women. Martinez (2004) studies the effect of an old-age pension programme on consumption and finds results consistent with the presence of credit constraints. In assessing the effect on labour market outcomes of a social assistance programme funded through the Canadian Assistance Plan in Quebec, Canada, Lemieux and Milligan (2005), using a regression discontinuity design (and limiting the sample to men), find that access to greater social assistance benefits reduces employment by about 4.5 percent for men. Further, the impact of a school fee reduction programme on school enrolment in the city of Bogota, Colombia, is studied by Barrera, Linden and Urquiola (2007); using a regression discontinuity design, they find a positive impact on school enrolment rates. Regression discontinuity has similarly been used to evaluate a social safety net initiative in Jamaica. In 2001, the Government of Jamaica initiated the Programme of Advancement through Health and Education (PATH), under which grants are given to children in eligible poor households on the condition of regular school attendance and health visits. Levy and Ohls (2007), using a regression discontinuity design, find that the PATH programme increases school attendance for children aged 6 to 17 by an average of 0.5 days per month. Filmer and Schady (2009) study the impact of scholarships on the school enrolment and test scores of poor students in Cambodia and find an increase of approximately 25 percent in school enrolment and test scores after scholarships are provided to poor students. Other work based on this method has also presented regression discontinuity as an alternative to the experimental method, as is clear from the study of Buddlemeyer and Skofias (2003), which explains that the regression discontinuity method is useful when policy discontinuities are strictly enforced. Cook, Shadish and Wong (2006) support the result of Buddlemeyer and Skofias (2003), adding that randomised and non-randomised experiments provide the same results when the regression discontinuity method is applied. Although a good deal of work has been done using the RDD technique, it suffers from a number of drawbacks: (a) there may be specification error, and (b) large evaluation samples are required to obtain sufficient statistical power. Despite these limitations, RDD yields an unbiased estimate of the impact in the neighbourhood of the eligibility cut-off.

2.3. Propensity score matching (PSM) method

Propensity score matching (PSM) has become a popular approach to estimating causal treatment effects (Caliendo & Kopeinig, 2008). The method pairs each programme participant with a single non-participant, where pairing is done on the basis of the similarity of their estimated probabilities of participating in the programme. Comparators are selected according to their propensity score,

P(Z) = Pr(T = 1 | Z), with 0 < P(Z) < 1,

where Z is a vector of pre-exposure control variables. The method rests on the assumption that, conditional on Z, there is no unobserved difference between the treatment and comparison groups. It can be applied in any situation where there is a group of treated individuals and a group of untreated individuals. A minimal sketch of the matching step is given below, followed by studies that use the method.
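The following minimal sketch illustrates the two basic steps of PSM on simulated data: the propensity score P(Z) is estimated with a logistic regression, and each treated unit is then matched to the nearest comparison unit on that score. The covariates, effect size and the use of scikit-learn's LogisticRegression are illustrative assumptions, not features of any study reviewed here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 4_000

# Hypothetical pre-exposure covariates Z and selection into the programme that depends on them.
z = rng.normal(size=(n, 2))
p_true = 1 / (1 + np.exp(-(0.8 * z[:, 0] - 0.5 * z[:, 1])))
treated = rng.random(n) < p_true

# Outcome depends on the covariates plus a true treatment effect of 1.5.
outcome = 2.0 * z[:, 0] + z[:, 1] + 1.5 * treated + rng.normal(0.0, 1.0, size=n)

# Step 1: estimate the propensity score P(Z) = Pr(T = 1 | Z) with a logistic regression.
pscore = LogisticRegression().fit(z, treated).predict_proba(z)[:, 1]

# Step 2: nearest-neighbour matching on the propensity score (with replacement).
treated_idx = np.where(treated)[0]
control_idx = np.where(~treated)[0]
matches = control_idx[np.abs(pscore[control_idx][None, :] -
                             pscore[treated_idx][:, None]).argmin(axis=1)]

# Step 3: average treatment effect on the treated (ATT) over the matched pairs.
att = (outcome[treated_idx] - outcome[matches]).mean()
print(f"Matched ATT = {att:.2f}")   # a naive difference in means would be biased upwards
```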
The method has been applied in a wide range of fields. LaLonde (1986) uses it to measure the impact of a training programme on trainees' earnings. Dehejia and Wahba (1999), using National Supported Work (NSW) data, re-evaluate with matching estimators the impact of the training programme studied earlier by LaLonde (1986). In the medical field (pharmacoepidemiologic research), Perkins et al. (2000) use the method. In the field of human resources, the effect of union membership on employees' wages is studied by Bryson (2000). In studying the effect of the migration decision on wage growth, Ham, Li and Reagan (2004) use matching. Trujillo, Portillo and Vernon (2005) evaluate the impact of Colombia's subsidised health programme on the utilisation of medical care. Further, Brand and Halaby (2006) analyse the effect of attending elite colleges on career outcomes using this method. The method is also used by Gebrehiwot and van der Veen (2015) to assess the impact of a food security programme, and by Guerzoni and Raiteri (2015) to examine the impact of various technological policies on firms' innovative behaviour. Further, Mohapatra and Sahoo (2016) use the same approach in assessing the impact of women's participation in a microfinance programme on their empowerment.

Attempts to test propensity score matching against randomised evaluation have shown mixed results. Dehejia and Wahba (1999), using the NSW data, conclude that matching approaches are generally more reliable than standard econometric estimators, as they find that matching estimators are able to reproduce the results obtained with the experimental method. Studying data from the Mexican PROGRESA poverty programme, Diaz and Handa (2006) find that PSM performs well when the same survey instrument is used to measure outcomes for both the control and treatment groups. In the evaluation of a U.S. training programme, Heckman, Smith and Clements (1997) and Heckman et al. (1998) likewise emphasise the use of the same survey instrument. Alongside these advantages, LaLonde (1986) concludes that non-experimental methods are subject to specification errors and urges caution in implementing them. Rubin and Thomas (2000) point out a disadvantage of propensity score matching: impacts estimated from full unmatched samples are generally more biased. They also note that variables with weak predictive ability for outcomes can still help to reduce bias when propensity score matching is used. Taking a contrary view, Smith and Todd (2005a) argue that PSM does not solve the selection problem studied by LaLonde; in reply, Dehejia (2005) points to wrongly applied specifications in Smith and Todd (2005a). Further, Agodini and Dynarski (2004) find no consistent evidence of PSM replicating experimental results.

Pipeline comparison

In the pipeline comparison method, the comparison group is created from people who have applied for a programme but have not yet received it (Ravallion, 2008). Examples of this method are found in Chase (2002), where communities apply for a social investment fund in Armenia, and in Galasso and Ravallion (2004), who evaluate a large social protection programme in Argentina. Although matching designs can be applied to any programme, a limitation of the method is that it requires extensive data sets on large samples; in practice, it is used only when the other methods (randomised selection, regression discontinuity design and difference-in-differences) are not feasible.

2.4. Difference-in-Differences (DD) Design

The difference-in-differences (DD) design takes into account any difference between the treatment and the comparison groups that is constant over time. Although it differences out such time-invariant differences between the two groups, it does not eliminate differences between them that change over time. Thus, when the DD method is used, researchers must assume that, in the absence of the programme, the outcome in the treatment group would have moved in tandem with the outcome in the comparison group (the equal-trends assumption). A minimal sketch of the double difference computation is given below.
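The following minimal sketch illustrates the double difference on simulated two-period data with a constant difference between groups and a common time trend; the group difference, trend and effect size are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000

# Hypothetical two-period panel: half the units are in the treatment group.
treat_group = rng.integers(0, 2, size=n)
unit_effect = rng.normal(0.0, 2.0, size=n) + 1.0 * treat_group   # time-invariant group difference
common_trend = 0.5                                                # change affecting both groups
true_effect = 2.0                                                 # programme impact

y_before = 10.0 + unit_effect + rng.normal(0.0, 1.0, size=n)
y_after = 10.0 + unit_effect + common_trend + true_effect * treat_group + rng.normal(0.0, 1.0, size=n)

# Double difference: change over time in the treatment group minus change in the comparison group.
change_treat = (y_after - y_before)[treat_group == 1].mean()
change_comp = (y_after - y_before)[treat_group == 0].mean()
dd_estimate = change_treat - change_comp

print(f"DD estimate = {dd_estimate:.2f}")   # close to the true effect of 2.0; a simple
# after-period comparison would be biased by the constant group difference of 1.0
```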
The use of the double difference method is seen in Binswanger, Khandker and Rosenzweig (1993), who estimate the impact of rural infrastructure on agricultural productivity in India using district-level data. Pitt and Khandker (1998) use the DD design to estimate the impact of gender-wise participation in the Grameen Bank and two other micro-credit programmes on labour supply, schooling, household expenditure and assets in Bangladesh. The study of the impact of building schools on schooling and earnings in Indonesia by Duflo (2001) is another example of the double difference design. Galiani, Gertler and Schargrodsky (2005) use a double difference design to study the impact of privatising water services on child mortality in Argentina and find that privatisation of water services reduces child mortality. The classic double difference estimator tracks the difference over time between participants and non-participants. Di Tella and Schargrodsky (2005), using a DD design, examine the effect of an increase in police forces on the reduction of crime. Chaudhury and Parajuli (2008) use the method to examine the impact of a female school stipend programme on school enrolment in the Punjab province of Pakistan, using panel data from 2003 and follow-up data from 2005. Das (2016) uses the method to examine the impact of the Mahatma Gandhi National Rural Employment Guarantee Act (MGNREGA, an employment generation programme of the Government of India ensuring 100 days of work in a year) on livelihood security, using NSS data.

2.5. Instrumental Variable (IV) Method

The instrumental variable (IV) method is one of the signature techniques in the econometric toolkit (Angrist & Krueger, 2001). It is used when it is not possible to create a comparison group randomly. It consists of using one or more variables (instruments) to predict programme participation; the programme impact is then estimated on the basis of the predicted participation. Geographic placement of the programme, political variables and discontinuities created by the programme design are three popular sources of instrumental variables in evaluating anti-poverty programmes in developing countries (Ravallion, 2008). Examples of the IV method are found in Angrist and Lavy (1999), where the authors, taking the class size limit of 40 as an instrumental variable (Maimonides' rule), show a significant and substantial increase in the test scores of grade four and grade five students due to a reduction in class size. Besley and Case (2000) study the impact of workers' compensation on wages and employment, taking the presence of women in state parliaments as the IV for workers' compensation insurance. Similarly, Paxson and Schady (2002), using the IV method, study the impact of the Peruvian Social Fund (FONCODES) and conclude that the investment has a positive effect on school attendance rates for young children. Following Case and Deaton (1998), Duflo (2003), taking eligibility as the instrumental variable, finds that pensions received by women have a greater impact on the anthropometric status of girls and no effect on boys, while no effect of the programme is found when the pension is received by men. Another example of this approach is the study of the impact of a nutrition programme in rural Colombia, where food and child care facilities are provided through local community centres (Attanasio and Vera-Hernandez, 2004). A further example of the IV method is Duflo and Pande (2007), who study the impact of dam construction on poverty. A minimal sketch of the IV logic with a single instrument is given below.
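The following minimal sketch illustrates the IV logic with a single binary instrument on simulated data: the reduced-form effect of the instrument on the outcome is divided by its first-stage effect on participation (the Wald estimator, equivalent to two-stage least squares with one instrument). The data-generating values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000

# Hypothetical setting: an unobserved factor u drives both participation and the outcome,
# while the instrument z shifts participation but affects the outcome only through it.
u = rng.normal(size=n)                                  # unobserved confounder
z = rng.integers(0, 2, size=n)                          # instrument (e.g. a randomised encouragement)
participate = (1.0 * z + 0.8 * u + rng.normal(size=n)) > 0.5
outcome = 1.0 + 2.0 * participate + 1.5 * u + rng.normal(size=n)   # true impact = 2.0

# Naive comparison of participants and non-participants is biased because u is omitted.
naive = outcome[participate].mean() - outcome[~participate].mean()

# IV (Wald) estimator: reduced-form effect of z on the outcome divided by the first-stage effect of z on participation.
reduced_form = outcome[z == 1].mean() - outcome[z == 0].mean()
first_stage = participate[z == 1].mean() - participate[z == 0].mean()
iv_estimate = reduced_form / first_stage

print(f"Naive comparison = {naive:.2f}  (biased upwards by the confounder)")
print(f"IV estimate      = {iv_estimate:.2f}  (close to the true impact of 2.0)")
```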
Conclusion

From the above review of the various methods of impact evaluation, it emerges that most studies are based on the experimental method (i.e. the randomisation method), as studies based on this method provide the most robust estimates. Researchers have advocated this method as the ideal for estimating the main impact of a programme (Ravallion, 2008). At the same time, Dehejia and Wahba (1998, 1999), revisiting the data of LaLonde (1986), suggest that non-experimental matching methods can provide the same results as the experimental method. But omitted variables (Glazerman et al., 2002) and publication bias (DeLong & Lang, 1992) are major problems when non-experimental methods are used. Thus, none of these evaluation tools is ideal in all circumstances. There is therefore a need for combinations of methods to increase the robustness of the estimated counterfactual and offset the limitations of any single method. A good example of combining methods is found in Cattaneo, Galiani, Gertler, Martinez and Titiunik (2009), who use both difference-in-differences and matching methods to study the impact of Piso Firme on child cognitive development. By implementing the matched difference-in-differences method, the researcher offsets the risk, which arises when only propensity score matching is used, that unobserved characteristics explain both enrolment in the programme and the outcomes. When combining propensity score matching with the double difference method, the researcher first performs matching on the observable characteristics, then uses the difference-in-differences method to estimate a counterfactual for the change in outcomes within each subgroup of matched units, and finally averages the double differences across the matched subgroups (Gertler et al., 2007). A minimal sketch of this combination is given below.
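The following minimal sketch illustrates this combination on simulated data, matching on a single observable characteristic for simplicity (an actual application would match on an estimated propensity score) and then differencing within matched pairs; all numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3_000

# Hypothetical panel: enrolment depends on an observable x, and x also shifts outcome trends.
x = rng.normal(size=n)
treated = rng.random(n) < 1 / (1 + np.exp(-x))          # selection on the observable
y_before = 5.0 + 1.0 * x + rng.normal(0.0, 1.0, size=n)
y_after = 5.5 + 1.5 * x + 2.0 * treated + rng.normal(0.0, 1.0, size=n)   # true impact = 2.0

# Step 1: match each treated unit to the nearest comparison unit on the observable x
# (a full application would match on an estimated propensity score instead).
t_idx, c_idx = np.where(treated)[0], np.where(~treated)[0]
matches = c_idx[np.abs(x[c_idx][None, :] - x[t_idx][:, None]).argmin(axis=1)]

# Step 2: difference-in-differences within each matched pair, then average across pairs.
dd_pairs = (y_after[t_idx] - y_before[t_idx]) - (y_after[matches] - y_before[matches])
print(f"Matched DD estimate = {dd_pairs.mean():.2f}")   # close to the true impact of 2.0
```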
References

Abe, Y., & Gee, K.A. (2014). Sensitivity analysis for clustered data: An illustration from a large-scale clustered randomized controlled trial in education. Evaluation and Program Planning, 47, 26-34.
Agodini, R., & Dynarski, M. (2004). Are experiments the only option? A look at dropout prevention programs. Review of Economics and Statistics, 86(1), 180-194.
Angrist, J., & Lavy, V. (1999). Using Maimonides' rule to estimate the effect of class size on scholastic achievement. Quarterly Journal of Economics, 114(2), 533-575.
Angrist, J., & Krueger, A.B. (2001). Instrumental variables and the search for identification: From supply and demand to natural experiments. NBER Working Paper No. 8456.
Angrist, J., Bettinger, E., Bloom, E., King, E., & Kremer, M. (2002). Vouchers for private schooling in Colombia: Evidence from a randomized natural experiment. American Economic Review, 92(5), 1535-38.
Attanasio, O., & Vera-Hernandez, A.M. (2004). Medium and long run effects of nutrition and child care: Evaluation of a community nursery programme in rural Colombia. Working Paper EWP04/06, Centre for the Evaluation of Development Policies, Institute for Fiscal Studies, London.
Bamberger, M., Rao, V., & Woolcock, M. (2010). Using mixed methods in monitoring and evaluation: Experiences from international development. Policy Research Working Paper, The World Bank.
Banerjee, A., Cole, S., Duflo, E., & Linden, L. (2007). Remedying education: Evidence from two randomized experiments in India. Quarterly Journal of Economics, 122(3), 1235-1264.
Barrera, O., Linden, L., & Urquiola, M. (2007). The effects of user fee reductions on enrollment: Evidence from a quasi-experiment. Columbia University and World Bank, Washington, DC.
Basu, D. (1980). Randomization analysis of experimental data: The Fisher randomization test. Journal of the American Statistical Association, 75, 305-325.
Basinga, P., Gertler, P.J., Binagwaho, A., Soucat, A.L., Sturdy, J., & Vermeersch, C.M. (2011). Effect on maternal and child health services in Rwanda of payment to primary health-care providers for performance: An impact evaluation. The Lancet, 377(9775), 1421-1428.
Besley, T., & Case, A. (2000). Unnatural experiments? Estimating the incidence of endogenous policies. Economic Journal, 110, 672-694.
Binswanger, H., Khandker, S.R., & Rosenzweig, M. (1993). How infrastructure and financial institutions affect agricultural output and investment in India. Journal of Development Economics, 41, 337-366.
Blundell, R., & Dias, C.M. (2000). Evaluation methods for non-experimental data. Fiscal Studies, 21(4).
Brand, J.E., & Halaby, C.N. (2006). Regression and matching estimates of the effects of elite college attendance on educational and career achievement. Journal of the Royal Statistical Society, Series A, 168(3), 473-512.
Buddlemeyer, H., & Skofias, E. (2003). An evaluation of the performance of regression discontinuity design on PROGRESA. Institute for the Study of Labor, Discussion Paper No. 827.
Burtless, G. (1985). Are targeted wage subsidies harmful? Evidence from a wage voucher experiment. Economic Perspectives, 9(2), 63-84.
Burton, P., Gurrin, L., & Sly, P. (1998). Tutorial in biostatistics: Extending the simple linear regression model to account for correlated responses: An introduction to generalised estimating equations and multi-level modelling. Statistics in Medicine, 17, 1261-1291.
Cameron, A.C., & Miller, D.L. (2011). Robust inference with clustered data. In Handbook of Empirical Economics and Finance.
Case, A., & Deaton, A. (1998). Large cash transfers to the elderly in South Africa. Economic Journal, 108, 1330-1361.
Chaudhury, N., & Parajuli, D. (2008). Conditional cash transfers and female schooling: The impact of the female school stipend program on public school enrolments in Punjab, Pakistan. World Bank Policy Research Working Paper No. 4102.
Cook, T.D., Shadish, W.R., & Wong, V.C. (2006). Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within-study comparisons. Journal of Policy Analysis and Management, 27(4), 724-750.
Chase, R. (2002). Supporting communities in transition: The impact of the Armenian Social Investment Fund. World Bank Economic Review, 16(2), 219-240.
Creswell, J., & Clark, P.V. (2007). Designing and Conducting Mixed Methods Research. Sage, Thousand Oaks.
Deaton, A.S. (2009). Instruments for development: Randomization in the tropics, and the search for the elusive keys to economic development. NBER Working Paper 14690, available at http://www.nber.org/papers/w14690 (accessed 9 December 2016).
Dehejia, R., & Wahba, S. (1999). Causal effects in non-experimental studies: Re-evaluating the evaluation of training programs. Journal of the American Statistical Association, 94, 1053-1062.
Dehejia, R.H., & Wahba, S. (2002). Propensity score matching methods for non-experimental causal studies. Review of Economics and Statistics, 84(1), 151-161.
Dehejia, R. (2005). Practical propensity score matching: A reply to Smith and Todd. Journal of Econometrics, 125(1-2), 355-364.
DeLong, J.B., & Lang, K. (1992). Are all economic hypotheses false? Journal of Political Economy, 100(6), 1257-1272.
Diaz, J.J., & Handa, S. (2006). An assessment of propensity score matching as a non-experimental impact estimator: Evidence from a Mexican poverty program. The Journal of Human Resources, 41(2), 319-345.
Di Tella, R., & Schargrodsky, E. (2005). Do police reduce crime? Estimates using the allocation of police forces after a terrorist attack. American Economic Review, 94(1), 115-133.
Dubin, J.A., & Rivers, D. (1993). Experimental estimates of the impact of wage subsidies. Journal of Econometrics, 56(1/2), 219-242.
Duflo, E. (2001). Schooling and labour market consequences of school construction in Indonesia: Evidence from an unusual policy experiment. American Economic Review, 91(4), 795-813.
Duflo, E. (2003). Grandmothers and granddaughters: Old-age pensions and intrahousehold allocation in South Africa. World Bank Economic Review, 17(1), 1-26.
Duflo, E., & Pande, R. (2007). Dams. Quarterly Journal of Economics, 122(2), 601-646.
Filmer, D., & Schady, N. (2009). School enrollment, selection and test scores. World Bank Policy Research Working Paper 4998, World Bank, Washington, DC.
Fisher, R.A. (1935). The Design of Experiments (8th edition, 1960). New York.
Galasso, E., & Ravallion, M. (2004). Social protection in a crisis: Argentina's Plan Jefes y Jefas. World Bank Economic Review, 18(3), 367-399.
Galbraith, S., Daniel, J.A., & Vissel, B. (2010). A study of clustered data and approaches to its analysis. The Journal of Neuroscience, 30(32), 10601-10608.
Galiani, S., Gertler, P., & Schargrodsky, E. (2005). Water for life: The impact of the privatisation of water services on child mortality. Journal of Political Economy, 113(1), 83-119.
Gelman, A. (2006). Multilevel (hierarchical) modelling: What it can and cannot do. Technometrics, 48(3).
Gertler, P., Martinez, S., Premand, P., Rawlings, L.B., & Vermeersch, C.M.J. (2007). Impact Evaluation in Practice. Washington, DC: World Bank.
Gertler, P., Martinez, S., & Vivo, S. (2008). Child-mother provincial investment project Plan Nacer. University of California, Berkeley, and World Bank, Washington, DC. Cited in Gertler, P., Martinez, S., Premand, P., Rawlings, L.B., & Vermeersch, C.M.J. (2007), Impact Evaluation in Practice, Washington, DC: World Bank.
Glazerman, S., Levy, D., & Myers, D. (2003). Nonexperimental replications of social experiments: A systematic review. Mathematica Policy Research, Inc., Princeton, NJ.
Glewwe, P., Kremer, M., & Moulin, S. (2002). Many children left behind? Textbooks and test scores in Kenya. American Economic Journal: Applied Economics, 1(1), 112-135.
Glewwe, P., Ilias, N., & Kremer, M. (2003). Teacher incentives. NBER Working Paper No. 9671, Harvard University.
Greene, J., Caracelli, V., & Graham, W. (1989). Toward a conceptual framework for mixed-method evaluation designs. Educational Evaluation and Policy Analysis, 11, 255-274.
Ham, J., Li, X., & Reagan, P. (2004). Propensity score matching, a distance-based measure of migration, and the wage growth of young men. Working paper. Cited in Caliendo, M., & Kopeinig, S. (2008), Some practical guidance for the implementation of propensity score matching, Journal of Economic Surveys, 22(1), 31-72.
Heckman, J., Smith, J., & Clements, N. (1997). Making the most out of programme evaluations and social experiments: Accounting for heterogeneity in programme impacts. Review of Economic Studies, 64(4), 487-535.
Heckman, J., Ichimura, H., & Todd, P.E. (1998a). Characterising selection bias using experimental data. Econometrica, 66(5), 1017-1098.
Heckman, J., Ichimura, H., & Todd, P.E. (1998b). Matching as an econometric evaluation estimator. Review of Economic Studies, April, 261-294.
Heckman, J.J., & Vytlacil, E. (2005). Structural equations, treatment effects and economic policy evaluations. Econometrica, 73(3), 669-738.
Imas, L.G.M., & Rist, R.C. (2009). The Road to Results: Designing and Conducting Effective Development Evaluations. Washington, DC: World Bank.
Johnson, B., & Onwuegbuzie, A. (2004). Mixed methods research: A research paradigm whose time has come. Educational Researcher, 33(7), 14-26.
Karlan, D., & Zinman, J. (2010). Expanding credit access: Using randomised supply decisions to estimate the impacts. Review of Financial Studies, 23(1), 433-464.
Kremer, M., Moulin, S., & Namunyu, R. (2002). Decentralization: A cautionary tale. Harvard University.
LaLonde, R.J. (1986). Evaluating the econometric evaluations of training programs with experimental data. The American Economic Review, 76(4), 604-620.
Lee, B.K., Lessler, J., & Stuart, E.A. (2010). Improving propensity score weighting using machine learning. Statistics in Medicine, 29, 337-346.
Levy, D., & Ohls, J. (2007). Evaluation of Jamaica's PATH Program: Final Report. Mathematica Policy Research, Inc., Ref. 8966-090, Washington, DC.
Lemieux, T., & Milligan, K. (2005). Incentive effects of social assistance: A regression discontinuity approach. NBER Working Paper 10541, National Bureau of Economic Research, Cambridge, MA.
Li, F., Song, Y., Yi, H., Zhang, L., & Shi, Y. (2017). The impact of conditional cash transfers on the matriculation of junior high school students into rural China's high schools. Journal of Development Effectiveness, 9(1), 41-60.
Martinez, S. (2004). Pensions, poverty and household investments in Bolivia. University of California, Berkeley, CA.
McKenzie, D., Assaf, N., & Cusolito, A.P. (2017). The additionality impact of a matching grant programme for small firms: Experimental evidence from Yemen. Journal of Development Effectiveness, 9(1), 1-14.
Miguel, E., & Kremer, M. (2003a). Worms: Identifying impacts on education and health in the presence of treatment externalities. Econometrica.
Newman, J., Pradhan, M., Rawlings, L.B., Ridder, G., Coa, R., & Evia, J.L. (2002). An impact evaluation of education, health and water supply investments by the Bolivian Social Investment Fund. World Bank Economic Review, 16(2), 241-274.
Paxson, C., & Schady, N.R. (2002). The allocation and impact of social funds: Spending on school infrastructure in Peru. World Bank Economic Review, 16, 297-319.
Perkins, S.M., Tu, W., Underhill, M.G., Zhou, X., & Murray, M. (2000). The use of propensity scores in pharmacoepidemiologic research. Pharmacoepidemiology and Drug Safety, 9(2), 93-101.
Pitman, E.J.G. (1937). Significance tests which can be applied to samples from any population III: The analysis of variance test. Biometrika, 29, 322-335.
Pitt, M.M., & Khandker, S.R. (1998). The impact of group-based credit programs on poor households in Bangladesh: Does the gender of participants matter? Journal of Political Economy, 106(5), 958-996.
Rao, V., & Woolcock, M. (2003). Integrating qualitative and quantitative approaches in program evaluation. Chapter 8, 165-190.
Raudenbush, S.W., & Bryk, A.S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods. Thousand Oaks, CA: Sage.
Ravallion, M., & Wodon, Q. (2000). Does child labour displace schooling? Evidence on behavioural responses to an enrolment subsidy. Economic Journal, 110, 158-176.
Ravallion, M. (2008). Evaluating anti-poverty programs. Handbook of Development Economics, 4, Chapter 59, 3788-3840.
Rubin, D.B., & Thomas, N. (2000). Combining propensity score matching with additional adjustments for prognostic covariates. Journal of the American Statistical Association, 95, 573-585.
Schultz, P.T. (2004). School subsidies for the poor: Evaluating the Mexican Progresa poverty program. Journal of Development Economics, 74(1), 199-250.
Smith, J., & Todd, P. (2005). Does matching overcome LaLonde's critique of nonexperimental estimators? Journal of Econometrics, 125(1-2), 305-353.
Tashakkori, A., & Teddlie, C. (1998). Mixed Methodology: Combining Qualitative and Quantitative Approaches. Sage, Thousand Oaks.
Thabane, L., Mbuagbaw, L., Zhang, S., Samaan, Z., Marcucci, M., Ye, C., & Kosa, D. (2013). A tutorial on sensitivity analyses in clinical trials: The what, why, when and how. BMC Medical Research Methodology, 13(1), 92.
Vermeersch, C. (2002). School meals, educational achievements and school competition: Evidence from a randomised experiment. Mimeo, Harvard University.
White, H. (2002). Combining quantitative and qualitative approaches in poverty analysis. World Development, 30(3), 511-522.
Woodbury, S.A., & Spiegelman, R.G. (1987). Bonuses to workers and employers to reduce unemployment: Randomised trials in Illinois. American Economic Review, 77(4).