Ejemplos Variable Dependiente Limitada

Anuncio
UCEMA ECONOMETRIA APLICADA - ProbitEl archivo MROZ.xls tiene datos de un un estudio clásico donde se trata de estimar una
ecuación de salarios para mujeres. La primer etapa del estudio es determinar la
probabilidad de participación de la mujer en el mercado laboral.
Estime un modelo probit de selección y discuta la especificación y resultados.
La data basica es:
THE MROZ DATA FILE IS TAKEN FROM THE 1976 PANEL STUDY OF INCOME
DYNAMICS, AND IS BASED ON DATA FOR THE PREVIOUS YEAR, 1975. OF THE
753 OBSERVATIONS, THE FIRST 428 ARE FOR WOMEN WITH POSITIVE HOURS
WORKED IN 1975, WHILE THE REMAINING 325 OBSERVATIONS ARE FOR WOMEN WHO
DID NOT WORK FOR PAY IN 1975. A MORE COMPLETE DISCUSSION OF THE DATA
IS FOUND IN MROZ [1987], APPENDIX 1. THE DATA SET CONSISTS OF 753
OBSERVATIONS ON 19 VARIABLES:
1. LFP
A dummy variable = 1 if woman worked in 1975, else 0
2. WHRS Wife's hours of work in 1975
3. KL6
Number of children less than 6 years old in household
4. K618 Number of children between ages 6 and 18 in household
5. WA
Wife's age
6. WE
Wife's educational attainment, in years
7. WW
Wife's average hourly earnings, in 1975 dollars
8. RPWG Wife's wage reported at the time of the 1976 interview (not the same
as the 1975 estimated wage). To use the subsample with this wage, one needs
to select 1975 workers with LFP=1, then select only those women with non-zero
RPWG. Only 325 women work in 1975 and have a non-zero RPWG in 1976.
9. HHRS Husband's hours worked in 1975
10. HA
Husband's age
11. HE
Husband's educational attainment, in years
12. HW
Husband's wage, in 1975 dollars
13. FAMINC Family income, in 1975 dollars. This variable is used to
construct the property income variable.
14. MTR
This is the marginal tax rate facing the wife, and is taken from
published federal tax tables (state and local income taxes are excluded). The
taxable income on which this tax rate is calculated includes Social Security,
if applicable to wife.
15. WMED Wife's mother's educational attainment, in years
16. WFED Wife's father's educational attainment, in years
17. UN
Unemployment rate in county of residence, in percentage points.
This taken from bracketed ranges.
18. CIT
Dummy variable = 1 if live in large city (SMSA), else 0
19. AX
Actual years of wife's previous labor market experience
. gen p rin c= (fa min c-w w* whr s)/ 100 0
. pro bi t l fp wa kl 6 k 61 8 w e p rin c
It era ti on
It era ti on
It era ti on
It era ti on
0:
1:
2:
3:
log
log
log
log
li ke lih ood -514.8732
=
li ke lih ood -455.5718
=
li ke lih ood-454.22718
=
li ke lih ood-454.22535
=
Pr obi t reg re ssi on
Num be r o f o bs
=
753
L R c hi 2(
5)
=
121.30
P rob > ch i2
=
0.0000
P seu do R2
=
0.1178
Lo g l ik eli ho od -454.22535
=
l fp
wa
kl6
k 618
we
pr inc
_c ons
Coe f.
-.0344388
-.8921749
-.0377001
.1558378
-.0209234
.4224
S td . E rr .
.0075772
.1144317
.0404322
.0239
.0045833
.4726394
z
-4.55
-7.80
-0.93
6.52
-4.57
0.89
P> |z|
0.000
0.000
0.351
0.000
0.000
0.371
[95 % Con f. Int er val ]
-.0492898
-1.116457
-.1169457
.1089946
-.0299066
-.5039563
-.0195878
-.6678929
.0415455
.2026809
-.0119403
1.348756
ECONOMETRIA APLICADA
UCEMA
Ejemplos Estimación con Truncamiento y Censura
Usamos datos de un ejemplo de Mroz
THE MROZ DATA FILE IS TAKEN FROM THE 1976 PANEL STUDY OF INCOME
DYNAMICS, AND IS BASED ON DATA FOR THE PREVIOUS YEAR, 1975. OF THE
753 OBSERVATIONS, THE FIRST 428 ARE FOR WOMEN WITH POSITIVE HOURS
WORKED IN 1975, WHILE THE REMAINING 325 OBSERVATIONS ARE FOR WOMEN WHO
DID NOT WORK FOR PAY IN 1975. A MORE COMPLETE DISCUSSION OF THE DATA
IS FOUND IN MROZ [1987], APPENDIX 1. THE DATA SET CONSISTS OF 753
OBSERVATIONS ON 19 VARIABLES:
1. LFP
A dummy variable = 1 if woman worked in 1975, else 0
2. WHRS Wife's hours of work in 1975
3. KL6
Number of children less than 6 years old in household
4. K618 Number of children between ages 6 and 18 in household
5. WA
Wife's age
6. WE
Wife's educational attainment, in years
7. WW
Wife's average hourly earnings, in 1975 dollars
8. RPWG Wife's wage reported at the time of the 1976 interview (not the same
as the 1975 estimated wage). To use the subsample with this wage, one needs
to select 1975 workers with LFP=1, then select only those women with non-zero
RPWG. Only 325 women work in 1975 and have a non-zero RPWG in 1976.
9. HHRS Husband's hours worked in 1975
10. HA
Husband's age
11. HE
Husband's educational attainment, in years
12. HW
Husband's wage, in 1975 dollars
13. FAMINC Family income, in 1975 dollars. This variable is used to
construct the property income variable.
14. MTR
This is the marginal tax rate facing the wife, and is taken from
published federal tax tables (state and local income taxes are excluded). The
taxable income on which this tax rate is calculated includes Social Security,
if applicable to wife.
15. WMED Wife's mother's educational attainment, in years
16. WFED Wife's father's educational attainment, in years
17. UN
Unemployment rate in county of residence, in percentage points.
This taken from bracketed ranges.
18. CIT
Dummy variable = 1 if live in large city (SMSA), else 0
19. AX
Actual years of wife's previous labor market experience
Para ilustrar el problema de truncamiento podemos estimar un modelo donde las horas
trabajadas sean función de kl6, k618, wa y we.
Si ignoramos el problema de truncamiento y estimamos por OLS:
regress whrs kl6 k618 wa we if whrs>0
podemos reestimar usando el comando truncreg teniendo en cuenta que hay
observaciones en cero
truncreg whrs kl6 k618 wa we if whrs, ll(0)
Los resultados de la estimación por OLS están sesgados hacia abajo. No pueden usarse
para hacer inferencia.
Censura: Tobit
Podemos estimar un modelo para explicar el salario anual en función de wa kl6 k618 we
Primero calculamos el salario anual
gen wwa=(ww*whrs)
reemplazamos los missings por cero
replace wwa=0 if ww==0
regress ww wa kl6 k618 we
Luego estimamos como un tobit, especificando que lww está censurada en cero ll(0)
tobit wwa wa kl6 k618 we, ll(0)
Podemos calcular los efectos marginales sobre probabilidad de tener un salario positivo
mfx compute, predict (pr(0,.))
Luego podemos calcular el efecto de cada variable en el log del salario dado que el
individuo no ha sido censurado
mfx compute, predict (e(0,.))
ECONOMETRIA APLICADA
UCEMA
Estimacion de una ecuacion de salarios con sesgo de selección
Método de Heckman
El archivo tiene datos de un ejemplo de Mroz, un estudio clásico donde se trata de
estimar una ecuación de salarios para mujeres
La primer etapa sería un probit de selección y la segunda un ols
La data basica es:
THE MROZ DATA FILE IS TAKEN FROM THE 1976 PANEL STUDY OF INCOME
DYNAMICS, AND IS BASED ON DATA FOR THE PREVIOUS YEAR, 1975. OF THE
753 OBSERVATIONS, THE FIRST 428 ARE FOR WOMEN WITH POSITIVE HOURS
WORKED IN 1975, WHILE THE REMAINING 325 OBSERVATIONS ARE FOR WOMEN WHO
DID NOT WORK FOR PAY IN 1975. A MORE COMPLETE DISCUSSION OF THE DATA
IS FOUND IN MROZ [1987], APPENDIX 1. THE DATA SET CONSISTS OF 753
OBSERVATIONS ON 19 VARIABLES:
1. LFP
A dummy variable = 1 if woman worked in 1975, else 0
2. WHRS Wife's hours of work in 1975
3. KL6
Number of children less than 6 years old in household
4. K618 Number of children between ages 6 and 18 in household
5. WA
Wife's age
6. WE
Wife's educational attainment, in years
7. WW
Wife's average hourly earnings, in 1975 dollars
8. RPWG Wife's wage reported at the time of the 1976 interview (not the same
as the 1975 estimated wage). To use the subsample with this wage, one needs
to select 1975 workers with LFP=1, then select only those women with non-zero
RPWG. Only 325 women work in 1975 and have a non-zero RPWG in 1976.
9. HHRS Husband's hours worked in 1975
10. HA
Husband's age
11. HE
Husband's educational attainment, in years
12. HW
Husband's wage, in 1975 dollars
13. FAMINC Family income, in 1975 dollars. This variable is used to
construct the property income variable.
14. MTR
This is the marginal tax rate facing the wife, and is taken from
published federal tax tables (state and local income taxes are excluded). The
taxable income on which this tax rate is calculated includes Social Security,
if applicable to wife.
15. WMED Wife's mother's educational attainment, in years
16. WFED Wife's father's educational attainment, in years
17. UN
Unemployment rate in county of residence, in percentage points.
This taken from bracketed ranges.
18. CIT
Dummy variable = 1 if live in large city (SMSA), else 0
19. AX
Actual years of wife's previous labor market experience
Además de variables generadas
El ingreso no generado por la mujer en la flia
gen princ=(faminc-ww*whrs)/1000
La edad al cuadrado:
gen wasq=wa*wa
La experiencia al cuadrado
gen axsq=ax*ax
El log del salario
gen lww=log(ww)
Todas las variables estan en la base
**La ecuacion de seleccion es un probit:
probit lfp princ we ax axsq wa kl6 k618
**La de salarios un OLS con el IMR (hazard) para corregir el sesgo
regress lww we ax axsq imr
**A partir del probit se puede calcular la inversa del ratio de Mills (IMR), o como lo
llama tambien stata el Non selection hazard
H(z) = f(z) / (1 − F(z)),
(donde f() es la density y F() es la acumulada, evaluadas en cada punto de la muestra y
el z es el forecast en indice del probit)
El Mills es 1 / H(z),
Entonces la inversa del Mills (IMR) es el hazard
Todo esto se puede hacer junto con el comando heckman en dos etapas
La sintaxis es:
heckman lww we ax axsq, twostep select(lfp=princ we ax axsq wa kl6 k618 ) first
nshazard(imr)
La expresión es bastante intuitiva:
- primero esta el ols
- despues el two step indica que es por dos etapas (no por max verosimilitud que es la
opción por defecto)
- luego la ecuacion de selección entre paréntesis (el probit)
- el first es para reportar la primera etapa (el probit)
- el nshazard lo guarda en la variable imr
Para confirmar el procedimiento, es posible hacer primero un probit y luego el ols
usando el imr como variable adicional.
Para calcular el IMR “a mano” partiendo del Probit
Estimar el probit
probit lfp princ we ax axsq wa kl6 k618
predecir el valor del “indice” (los xb)
predict lfp_index, xb
Generar el IMR como cociente de la función de densidad (f()) y la acumulada (F())
gen imr=(normalden(lfp_index))/(normal(lfp_index))
o alternativamente (deberia dar lo mismo por las propiedades de las func de densidad y
acum):
gen imr=(normalden(-lfp_index))/(1-(normal(-lfp_index)))
correr la regresion utilizando el imr como regresor adicional
regress lww we ax axsq imr
****
Descargar