ECON7035 & 8040 LEC7: INDICATOR (DUMMY) VARIABLES

WWLJ Chapter 8: 8.1 to 8.4
HGL Chapter 7: 7.1, 7.2, 7.3.1

Learning objectives
• Describing qualitative information
• A single dummy independent variable
• Using dummy variables for multiple categories
• Interactions involving dummy variables

Describing qualitative information
Qualitative information often comes in binary form.
• Example 1: a person is male or female.
• Example 2: a student attends a private or a public school.
One way to incorporate qualitative information is to use dummy variables (i.e. binary or zero–one variables).
Example: female is a dummy variable that takes the value 1 if the student is female and 0 otherwise.
Dummy variables may appear as the dependent variable or as independent variables. In this chapter we consider dummy variables only as explanatory variables.

Single dummy independent variable
How is binary information incorporated into a regression model?

\[ wage = \beta_1 + \delta_1\, female + \beta_2\, educ + e \]

• female is a dummy variable: = 1 if the person is a woman, = 0 if the person is a man.
• \( \delta_1 \) = the wage gain/loss if the person is a woman rather than a man (holding other things fixed).

\( \delta_1 \) is the difference in hourly wage between females and males given the same amount of education:

\[ \delta_1 = E(wage \mid female = 1, educ) - E(wage \mid male = 1, educ) \]

or, in shorthand, \( \delta_1 = E(wage \mid female, educ) - E(wage \mid male, educ) \).

Alternative interpretation of the coefficient: \( \delta_1 \) is the difference in mean wage between men and women with the same level of education.
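As a hedged illustration of this interpretation, the sketch below simulates data from a wage equation with a female dummy and estimates it by OLS. The parameter values (5, -2, 1.5), the sample size and all variable names are hypothetical and are not taken from the lecture data; numpy's least-squares routine simply stands in for whatever regression software is used in the course.

```python
import numpy as np

# Hypothetical parameters for wage = beta1 + delta1*female + beta2*educ + e
BETA1, DELTA1, BETA2 = 5.0, -2.0, 1.5

rng = np.random.default_rng(0)          # fixed seed so the sketch is reproducible
n = 500
female = rng.integers(0, 2, size=n)     # dummy: 1 = woman, 0 = man
educ = rng.integers(8, 21, size=n)      # years of education, 8..20
e = rng.normal(0.0, 1.0, size=n)        # error term
wage = BETA1 + DELTA1 * female + BETA2 * educ + e

# Design matrix: intercept, female dummy, educ
X = np.column_stack([np.ones(n), female, educ])
coef, *_ = np.linalg.lstsq(X, wage, rcond=None)
beta1_hat, delta1_hat, beta2_hat = coef

print(f"intercept (men):          {beta1_hat:.3f}")
print(f"intercept shift (delta1): {delta1_hat:.3f}")
print(f"slope on educ (beta2):    {beta2_hat:.3f}")
```

The estimated coefficient on the dummy should land close to the simulated intercept shift, while the slope on educ is common to both groups.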
[Figure: wage plotted against educ. Men: \( wage = \beta_1 + \beta_2\, educ \); women: \( wage = (\beta_1 + \delta_1) + \beta_2\, educ \). The two lines share the slope \( \beta_2 \) and differ only in their intercepts, \( \beta_1 \) and \( \beta_1 + \delta_1 \) (an intercept shift).]

Dummy variable trap

\[ wage = \beta_1 + \gamma_1\, male + \delta_1\, female + \beta_2\, educ + e \]

This model cannot be estimated because of perfect collinearity: since female + male = 1, male is a perfect linear function of female and the constant. This is the dummy variable trap.

Base group or benchmark group
When using dummy variables, one category always has to be omitted:
• \( wage = \beta_1 + \delta_1\, female + \beta_2\, educ + e \) (the base category is men)
• \( wage = \beta_1 + \gamma_1\, male + \beta_2\, educ + e \) (the base category is women)
Alternatively, one could omit the intercept:

\[ wage = \gamma_1\, male + \delta_1\, female + \beta_2\, educ + e \]

Disadvantages:
1. It is more difficult to test for differences between parameters.
2. The R-squared formula is only valid if the regression contains an intercept.

Estimated wage equation with intercept shift (wages.xlsx)
Holding education and experience fixed, women earn $6.37 less per hour than men.
Does that mean that women are discriminated against? What we can conclude is that the $6.37 differential cannot be explained by different levels of education and experience between men and women. It is due to gender or to factors associated with gender that have not been controlled for in the regression.

Program evaluation
Example: effect of accessing online subject materials on the assignment mark (AS1.XLSX)
• Dependent variable: assignment mark.
• Dummy variable: indicates whether the student reviewed the online material.
This is an example of program evaluation: treatment group (= reviewed online) vs. control group (= not reviewed).
The estimated equation implies that a student who reviews the online materials before handing in the assignment has a predicted assignment mark about 1 point higher, on average, than a comparable student (in terms of mid-term test mark and number of tutorials attended) who did not review the material.

Using a dummy independent variable to account for outliers (gfc.xlsx)
Outliers can occur when sampling from a population. They may differ in some relevant aspect from the rest of the population.
Example: did countries that adopted large fiscal packages outperform those that didn't?
The IMF forecast error variable is defined as the excess of actual 2009 GDP growth over the GDP growth predicted by the IMF.
We expect Greece, Hungary, Iceland and Ireland to be outliers because of their fiscal circumstances. We discard these four observations.

Using a dummy independent variable to account for outliers (cont.)
A positive and significant relationship was found between fiscal stimulus and performance.
Create a dummy variable 'DUM' that equals 1 for the observations on Greece, Hungary, Iceland and Ireland and 0 otherwise.
The dummy variable is not significant, so there is not a strong case for treating the observations on Greece, Hungary, Iceland and Ireland differently.

Interpreting coefficients on dummy explanatory variables when the dependent variable is log(y)
Example: housing price regression (hprice1.xlsx), with a dummy indicating whether the house is of colonial style.
As the colonial-style dummy changes from 0 to 1, the house price increases by approximately 5.4 percent.
When the dependent variable is log(y), the coefficient on a dummy variable, multiplied by 100, is interpreted as the approximate percentage difference in y, holding all other factors constant.

Using dummy variables for multiple categories
Example: demand for Fords across LGAs (local government areas) in Australia (PC_CARS.XLSX)
1. Define membership in each category by a dummy variable.
2. Leave out one category (which becomes the base category).
In the regression output the dummy variables are circled in red.
Holding other things fixed, the proportion of Fords registered in Victoria is 15.4% higher than in WA (= the base category).
Here m_fam is average family income and nemp is the unemployment rate in the LGA.
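The two steps for multiple categories (one dummy per category, one category left out) can be sketched as follows. The helper name make_dummies and the sample list of state labels are hypothetical; only the choice of WA as the base category comes from the example.

```python
def make_dummies(values, base):
    """Return one 0/1 column per category, leaving out the base category."""
    categories = sorted(set(values))
    if base not in categories:
        raise ValueError(f"base category {base!r} not found in data")
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != base
    }

# Hypothetical state labels for six LGAs
state = ["VIC", "WA", "NSW", "VIC", "QLD", "WA"]
dummies = make_dummies(state, base="WA")

for name, column in dummies.items():
    print(name, column)

# Rows that belong to the base category have zeros in every dummy column,
# so each dummy coefficient measures a difference relative to WA.
base_rows = [i for i, s in enumerate(state) if s == "WA"]
print(all(col[i] == 0 for col in dummies.values() for i in base_rows))
```

Leaving out the base category is what keeps the dummy columns from summing to the intercept column.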
Incorporating ordinal information using dummy variables
Example: effect of city credit ratings on government bond interest rates

\[ GBR = \beta_1 + \beta_2\, CR + \text{other factors} \]

• GBR: government bond rate.
• CR: credit rating from 0 to 4 (0 = worst, 4 = best).
This specification would probably not be appropriate, as the credit rating contains only ordinal information: it forces every one-step improvement in the rating to have the same effect \( \beta_2 \).
A better way to incorporate this information is to define dummies:

\[ GBR = \beta_1 + \delta_1 CR_1 + \delta_2 CR_2 + \delta_3 CR_3 + \delta_4 CR_4 + \text{other factors} \]

• \( CR_1, \dots, CR_4 \): dummies indicating whether the particular rating applies; e.g. \( CR_1 = 1 \) if \( CR = 1 \) and \( CR_1 = 0 \) otherwise.
All effects are measured in comparison to the worst rating (= base category).

Interactions among dummy variables
Examine the interaction term between female and married in the log wage model.
• We can examine the wage difference between single and married people, separately for females and for males.
• We can test the null hypothesis that the gender differential does not depend on marital status.
The 'marriage premium' is .113 − .145 = −.032 (about minus 3.2%) for a female and .113 (about 11.3%) for a male.

Allowing for different slopes

\[ \log(wage) = \beta_1 + \delta_1\, female + \beta_2\, educ + \delta_2\, female \times educ + u \]

where \( female \times educ \) is the interaction term.
• \( \beta_1 \) = intercept for men; \( \beta_1 + \delta_1 \) = intercept for women
• \( \beta_2 \) = slope for men; \( \beta_2 + \delta_2 \) = slope for women
Interesting hypotheses:
• \( H_0: \delta_2 = 0 \). The return to education is the same for men and women.
• \( H_0: \delta_1 = \delta_2 = 0 \). The whole wage equation is the same for men and women.

Allowing for different slopes (cont.)
Interacting both the intercept and the slope with the female dummy enables one to model completely independent wage equations for men and women.

Example: log hourly wage equation
• The estimated return to education for men is 0.088 (about 8.8%).
• The estimated return to education for women is 0.088 + 0.00005 = 0.08805 (also about 8.8%).
There is no evidence against the hypothesis that the return to education is the same for men and women, since the coefficient on the interaction term is insignificant.
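A minimal sketch of how the interaction coefficient translates into group-specific returns to education. The two coefficient values are the estimates quoted in the example; the function name return_to_educ is hypothetical and only restates the slope formula for the model with a female × educ interaction.

```python
def return_to_educ(beta2, delta2, female):
    """Slope on educ in log(wage) = b1 + d1*female + b2*educ + d2*female*educ + u."""
    return beta2 + delta2 * female

BETA2_HAT = 0.088      # estimated coefficient on educ (from the example)
DELTA2_HAT = 0.00005   # estimated coefficient on female x educ (from the example)

slope_men = return_to_educ(BETA2_HAT, DELTA2_HAT, female=0)
slope_women = return_to_educ(BETA2_HAT, DELTA2_HAT, female=1)

print(f"men:   {100 * slope_men:.3f}% per extra year of education")
print(f"women: {100 * slope_women:.3f}% per extra year of education")
print(f"difference: {100 * (slope_women - slope_men):.3f} percentage points")
```

The difference between the two slopes is exactly the interaction coefficient, which is why testing whether that coefficient is zero tests for equal returns to education.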
Example: log hourly wage equation (cont.)
Does this mean that there is no significant evidence of lower pay for women at the same levels of educ and exper?
No: when we introduced female × educ, the standard error of the coefficient on female increased from 0.027 to 0.152 because of multicollinearity between female and female × educ.

Testing for differences in regression functions across groups
Unrestricted model (contains the full set of interactions):

\[ assignmark = \beta_1 + \delta_1\, female + \beta_2\, View + \delta_2\, female \times View + \beta_3\, midtest + \delta_3\, female \times midtest + \beta_4\, numtutes + \delta_4\, female \times numtutes + u \]

• assignmark: assignment mark
• View: = 1 if the student viewed the online material, = 0 otherwise
• midtest: student's mark in the test
• numtutes: number of tutorials attended
Restricted model (same regression for both groups):

\[ assignmark = \beta_1 + \beta_2\, View + \beta_3\, midtest + \beta_4\, numtutes + u \]

Testing for differences in regression functions across groups (cont.)
Null hypothesis:

\[ H_0: \delta_1 = 0,\ \delta_2 = 0,\ \delta_3 = 0,\ \delta_4 = 0 \]

All interaction effects are zero, i.e. the same regression coefficients apply to men and women.
Estimation of the unrestricted model: tested individually, the hypothesis that the interaction effects are zero cannot be rejected.

Joint test with F statistic

\[ F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)} = \frac{[1291.56 - 1208.102]/4}{1208.102/215} \approx 3.71 \]

Alternative way to compute the F statistic in this case:
• Run separate regressions for men and for women; the unrestricted SSR is the sum of the SSRs of these two regressions.
• Run a regression for the restricted model and store its SSR.
If the test is computed in this way, it is called the Chow test.
Important: the test assumes a constant error variance across groups.
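The F statistic above can be reproduced with a few lines of arithmetic. This is only a sketch: the function name f_statistic is hypothetical, while the SSR values, the number of restrictions and the degrees of freedom are the ones given in the example.

```python
def f_statistic(ssr_r, ssr_ur, q, df_ur):
    """F = [(SSR_r - SSR_ur) / q] / [SSR_ur / df_ur]."""
    if q <= 0 or df_ur <= 0:
        raise ValueError("q and df_ur must be positive")
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

# Values from the assignment-mark example
SSR_R = 1291.56     # restricted model (same regression for both groups)
SSR_UR = 1208.102   # unrestricted model (full set of interactions)
Q = 4               # number of restrictions: delta1 = ... = delta4 = 0
DF_UR = 215         # n - k - 1 in the unrestricted model

F = f_statistic(SSR_R, SSR_UR, Q, DF_UR)
print(f"F = {F:.2f}")   # prints "F = 3.71"
```

The statistic would then be compared with a critical value of the F distribution with (4, 215) degrees of freedom. For the Chow version, SSR_UR would be replaced by the sum of the SSRs from the separate regressions for men and women.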
CONTROLLING FOR SEASONALITY & EVENTS IN TIME SERIES DATA
(1) Seasonal Dummies

[Figure: Quarterly Retail Turnover: Department Stores. Quarterly series from Sep-83 to Mar-09; vertical axis from 0.00 to 7,000.00.]

Define the seasonal dummy variables as follows:
• \( D_{1t} = 1 \) if t is in the first quarter, = 0 otherwise;
• \( D_{2t} = 1 \) if t is in the second quarter, = 0 otherwise;
• \( D_{3t} = 1 \) if t is in the third quarter, = 0 otherwise; and
• \( D_{4t} = 1 \) if t is in the fourth quarter, = 0 otherwise.
In Gretl, click on Add and choose Periodic Dummies to create the seasonal dummies.

The model can be specified as

\[ y_t = \beta_1 + \delta_2 D_{2t} + \delta_3 D_{3t} + \delta_4 D_{4t} + \beta_2 X_t + \varepsilon_t \]

• Reference category = first quarter
• \( \delta_k \) = difference between the intercepts of quarter k and the first quarter

or

\[ y_t = \delta_1 D_{1t} + \delta_2 D_{2t} + \delta_3 D_{3t} + \delta_4 D_{4t} + \beta_2 X_t + \varepsilon_t \]

• No reference category
• \( \delta_k \) = the intercept of quarter k
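The definition of the seasonal dummies, and the reason for the two alternative specifications, can be sketched in plain Python. The helper seasonal_dummies and the eight-quarter sample are hypothetical; in the course the dummies are created through Gretl's Periodic Dummies option.

```python
def seasonal_dummies(quarters):
    """Map a sequence of quarter numbers (1..4) to the dummies D1..D4."""
    for q in quarters:
        if q not in (1, 2, 3, 4):
            raise ValueError(f"invalid quarter: {q!r}")
    return {f"D{k}": [1 if q == k else 0 for q in quarters] for k in (1, 2, 3, 4)}

# Hypothetical sample: eight consecutive quarters starting in a third quarter
quarters = [3, 4, 1, 2, 3, 4, 1, 2]
D = seasonal_dummies(quarters)

# The four dummies sum to 1 in every period, i.e. they reproduce the intercept
# column. Hence either drop one dummy (first specification) or drop the
# intercept (second specification).
row_sums = [sum(D[name][t] for name in D) for t in range(len(quarters))]
print(row_sums)

# Regressors for the first specification: intercept plus D2, D3, D4
with_reference = {"const": [1] * len(quarters), **{k: D[k] for k in ("D2", "D3", "D4")}}
print(list(with_reference))
```

Including the intercept together with all four dummies would recreate the dummy variable trap, which is why each specification drops exactly one of them.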