Linear Models
Faculty of Science, 2010-2011
Juan M. Rodríguez Díaz
Statistics Department, University of Salamanca, Spain

Part II: Regression Models

Contents
1 Introduction
2 Linear Regression
  2.1 Preliminaries and notation
  2.2 Estimation
  2.3 Tests and residuals
  2.4 Prediction
3 Multiple Regression
  3.1 Introduction
  3.2 Estimation and tests
  3.3 Prediction
  3.4 Other issues
4 Analysis of Covariance
  4.1 Introduction
  4.2 ANOCOVA
5 Extensions of the Regression Model

1 Introduction

Regression Models: A bit of history
• Regression models try to describe the dependence of a response variable y (or a vector of responses) on an explanatory variable x (or a vector of explanatory variables)
• They were first used in Astronomy and Physics by Laplace and Gauss
• The name comes from Galton (end of the 19th century), who studied the dependence of children's heights on their parents' heights and found a "regression" to the mean

Regression models: Prediction
• Suppose we know the distribution of heights y in the Spanish population and we want to predict the height ŷ of a randomly selected person. We would choose
  – the mode, in order to maximize the probability of guessing right
  – the median, if we minimize the absolute value of the error
  – the mean, if we minimize the sum of squares of the errors
• Suppose, in addition, that we know the weight x of the person and the (conditional) distribution of heights for that weight
• Then we estimate the unknown height by ȳ|x, the mean of the heights conditional on the weight x

Regression models: Types of relation between variables
• Exact or functional: knowing x determines the value of y completely, y = f(x)
• Independence: knowing x gives no information about y
• Statistical or stochastic: knowing x allows us to predict (better or worse) the value of y, y = f(x) + ε
Regression methods build models for this third case.
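Two of the criteria in the prediction example above (mean and median) can be checked numerically. The following is a minimal sketch (my own illustration, not part of the course notes) using a simulated sample of heights: over a grid of candidate predictions, the sum of squared errors is minimized near the sample mean and the sum of absolute errors near the sample median.

```python
# Minimal numerical check (not from the notes) of two loss criteria:
# squared-error loss is minimized near the mean, absolute-error loss near the median.
import numpy as np

rng = np.random.default_rng(0)
heights = rng.normal(170, 8, size=1000)            # hypothetical height sample (cm)

grid = np.linspace(heights.min(), heights.max(), 2001)
sse = [np.sum((heights - c) ** 2) for c in grid]   # sum of squared errors for prediction c
sae = [np.sum(np.abs(heights - c)) for c in grid]  # sum of absolute errors for prediction c

print("argmin SSE:", grid[np.argmin(sse)], "  sample mean:", heights.mean())
print("argmin SAE:", grid[np.argmin(sae)], "  sample median:", np.median(heights))
```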
2 Linear Regression

2.1 Preliminaries and notation

Linear Regression: Model
• Model: yi = β0 + β1 xi + ui, with the ui independent and ui ≡ N(0, σ²), i = 1, ..., n
• Residuals: ei = yi − ŷi = yi − β̂0 − β̂1 xi

2.2 Estimation

Linear Regression: Estimators of parameters
• Least-squares (LS) estimators minimize Σᵢ eᵢ²
• Maximum-likelihood (ML) estimators maximize L(β) = log fn(e), with fn(e) = (2π)^(−n/2) σ^(−n) exp[−(Σᵢ eᵢ²)/(2σ²)]
• ⇒ Under the normality assumption, LS estimators coincide with ML estimators
• Normal equations:
  0 = ∂L(β)/∂β0 ⇔ Σᵢ ei = 0
  0 = ∂L(β)/∂β1 ⇔ Σᵢ ei xi = 0
• Solution: β̂1 = Sxy / Sx²,  β̂0 = ȳ − β̂1 x̄,
  with Sxy = (1/n) Σᵢ (xi − x̄)(yi − ȳ)  (the mean of the products xi·yi minus x̄ȳ)
  and Sx² = (1/n) Σᵢ (xi − x̄)²  (the mean of the xi² minus x̄²)

Linear Regression: Properties of the line and the estimators
• Estimated line: ŷ = β̂0 + β̂1 x, or equivalently (y − ȳ) = β̂1 (x − x̄)
• β̂1 = Σᵢ ωi yi, with ωi = (xi − x̄)/(n Sx²), and Σᵢ ωi = 0, Σᵢ ωi² = 1/(n Sx²), Σᵢ ωi xi = 1
• β̂0 = Σᵢ ri yi, with ri = (1 − n x̄ ωi)/n, and Σᵢ ri = 1, Σᵢ ri² = (1 + x̄²/Sx²)/n, Σᵢ ri xi = 0
• β̂1 ≡ N(β1, σ²/(n Sx²))
• β̂0 ≡ N(β0, (σ²/n)(1 + x̄²/Sx²)); this variance equals σ² (Σᵢ xi²/n)/(n Sx²)
• Cov(β̂0, β̂1) = −x̄ σ²/(n Sx²)

Linear Regression: Some notes
• The point (x̄, ȳ) belongs to the estimated line
• About the two regression lines, y on x and x on y:
  – the line of y on x minimizes the vertical distances of the points to it
  – the line of x on y minimizes the horizontal ones
  – neither of them is the line closest to the points
  – the sign of the slope is the same in both lines
  – they intersect at the point (x̄, ȳ)
• From Cov(β̂0, β̂1): when x̄ > 0 the two estimators are negatively correlated
• The more distant a point is from the mean, the more weight it has in the estimation of β1
• Var(β̂0) increases with the distance of x̄ from the origin

2.3 Tests and residuals

Linear Regression: Tests
• The regression test is an ANOVA
• Σᵢ (yi − ȳ)² = Σᵢ (yi − ŷi)² + Σᵢ (ŷi − ȳ)²  (TV = NEV + EV)
• NEV/σ² ≡ χ²(n−2);  EV/σ² ≡ χ²(1) if H0 is true

  Source   SS               DF     MS     F
  EV       Σᵢ (ŷi − ȳ)²     1      Se²    Se²/SR²
  NEV      Σᵢ eᵢ²           n−2    SR²
  TV       Σᵢ (yi − ȳ)²     n−1

• Determination coefficient: R² = EV/TV;  R²adj = 1 − SR²/Scy² (with Scy² the corrected variance of y);  R² = r²xy = Sxy²/(Sx² Sy²)

Linear Regression: Plots of residuals
• The plot of ei vs. ŷi should be a shapeless "cloud of points"
• Otherwise it could mean
  – a non-linear relation between the variables
  – existence of outliers
  – a very different number of observations per group (x)
  – heteroscedasticity
  – computational errors
• The plot of ei vs. t (when possible) can detect dependence among the residuals:
  – positive autocorrelation
  – negative autocorrelation

2.4 Prediction

Linear Regression: Prediction
• Given x we may want to predict
  – the value of Y when X = x (one observation, yx)
  – the mean of the observations taken at x, µx
• In both cases the predicted value is the one given by the estimated line, µ̂x = ŷx = β̂0 + β̂1 x
• But the variances are different:
  – Var(µ̂x) = Var(ȳ + β̂1 (x − x̄)) = (σ²/n) [1 + (x − x̄)²/Sx²]
  – Var(ŷx) = E[(yx − ŷx)²] = (σ²/n) [n + 1 + (x − x̄)²/Sx²]
• With these, confidence intervals for µx and yx can be constructed
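The following is a minimal numerical sketch of the formulas of this chapter (my own illustration, not part of the notes; the values β0 = 1, β1 = 2, σ = 0.5 are assumptions of the simulation). It computes the closed-form LS estimates, R², and the two prediction variances, with σ² replaced by its estimate SR².

```python
# Minimal sketch of simple linear regression: closed-form LS estimates,
# residual variance, R^2, and the two prediction variances at a point x0.
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, n)        # y_i = b0 + b1*x_i + u_i (assumed)

Sxy = np.mean((x - x.mean()) * (y - y.mean()))   # (1/n) sum (xi - xbar)(yi - ybar)
Sx2 = np.mean((x - x.mean()) ** 2)               # (1/n) sum (xi - xbar)^2
b1 = Sxy / Sx2                                   # slope estimate
b0 = y.mean() - b1 * x.mean()                    # intercept estimate

e = y - (b0 + b1 * x)                            # residuals
SR2 = np.sum(e ** 2) / (n - 2)                   # unbiased estimate of sigma^2
R2 = 1 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2)

x0 = 5.0                                         # point where we predict
v = (1 / n) * (1 + (x0 - x.mean()) ** 2 / Sx2)
var_mean = SR2 * v                               # estimated Var(mu_hat at x0)
var_obs = SR2 * (1 + v)                          # estimated Var of prediction error for one observation
print(b0, b1, R2, var_mean, var_obs)
```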
3 Multiple Regression

3.1 Introduction

Multiple Regression Linear Model: Notation
• Model: yi = β0 + β1 x1i + β2 x2i + ··· + βk xki + ui, where each xji could be any function (linear or not) of the explanatory variables, and with the usual assumptions for the uncontrolled terms: ui ≡ N(0, σ²), i = 1, ..., n, independent
• Matrix notation: Y = (y1, ..., yn)′ (n×1), β = (β0, β1, ..., βk)′ ((k+1)×1), U = (u1, ..., un)′ (n×1), and X the n×(k+1) matrix whose i-th row is (1, x1i, ..., xki)
• Thus the model can be written Y = Xβ + U, where Y belongs to a space of dimension n

Multiple Regression Linear Model: Gauss-Markov Theorem
In the given model, if the following conditions hold:
• the values of the dependent variable are generated by the linear model Y = Xβ + U
• the {ui} are uncorrelated
• Var(ui) is constant
• the {ui} do not depend on the x
• x is obtained without measurement errors
• we look for estimators that are linear in the observations, and we consider optimal the estimator that is unbiased and has minimum variance,
then the LS estimator is optimal, whatever the distribution of u.
Note: remember that when the normality assumption holds, the ML estimator coincides with the LS estimator.

3.2 Estimation and tests

Multiple Regression Linear Model: Estimators of parameters
• Under the normality assumption, LS estimators coincide with ML estimators
• G = Σᵢ eᵢ² = Σᵢ (yi − β0 − β1 x1i − ··· − βk xki)²
• Normal equations:
  0 = ∂G/∂β0 ⇔ Σᵢ ei = 0
  0 = ∂G/∂β1 ⇔ Σᵢ ei x1i = 0
  ...
  0 = ∂G/∂βk ⇔ Σᵢ ei xki = 0
  which can be expressed as X′Y = X′X β̂ ⇒ β̂ = (X′X)⁻¹ X′Y

Multiple Regression Linear Model: Geometric interpretation
• Let X0 = 1n, X1 = (x11, ..., x1n)′, ..., Xk = (xk1, ..., xkn)′ be the columns of X, and EX = <X0, X1, ..., Xk> the subspace of Rⁿ generated by them
• The residual vector Û = Y − Xβ̂ = Y − Ŷ = e is minimal when it is orthogonal to EX, that is, when 0 = X′e = X′(Y − Ŷ) ⇔ β̂ = (X′X)⁻¹ X′Y
• And since ΣY = σ² I and ΣAY = A ΣY A′, then Σβ̂ = σ² (X′X)⁻¹

Multiple Regression Linear Model: Properties of the estimators
• ŷ = β̂0 + β̂1 x1 + ··· + β̂k xk, or ŷ − ȳ = β̂1 (x1 − x̄1) + ··· + β̂k (xk − x̄k)
• β̂i ≡ N(βi, σ² qii), with qii the (i, i) element of (X′X)⁻¹ ⇒ tests and confidence intervals for the parameters can be computed
• SR² = Σᵢ eᵢ²/(n − k − 1) is an unbiased estimator of σ² (Residual Variance)
• (β̂ − β)′ X′X (β̂ − β)/σ² ≡ χ²(k+1) and (n − k − 1) SR²/σ² ≡ χ²(n−k−1)
  ⇒ (β̂ − β)′ X′X (β̂ − β) / [(k + 1) SR²] ≡ F(k+1, n−k−1)
• The confidence region is the ellipsoid
  (β̂ − β)′ X′X (β̂ − β) ≤ (k + 1) SR² F(k+1, n−k−1),
  whose axes depend on the characteristic roots of X′X

Multiple Regression Linear Model: Tests
• Sources of variability: Σᵢ (yi − ȳ)² = Σᵢ (yi − ŷi)² + Σᵢ (ŷi − ȳ)²  (TV = NEV + EV)

  Source   SS               DF        MS     F
  EV       Σᵢ (ŷi − ȳ)²     k         Se²    Se²/SR²
  NEV      Σᵢ eᵢ²           n−k−1     SR²
  TV       Σᵢ (yi − ȳ)²     n−1

• R² = EV/TV and F = [R²/(1 − R²)] · [(n − k − 1)/k]
• R²adj = 1 − SR²/Scy² = R² − (1 − R²) k/(n − k − 1), so SR² = Scy² (1 − R²adj)
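A minimal numerical sketch of this section follows (my own illustration, not part of the notes; the coefficients 3, 1.5, −2 and σ = 1 are assumptions of the simulation). It computes β̂ = (X′X)⁻¹X′Y, the estimated covariance SR²(X′X)⁻¹, and R², adjusted R², and the overall F statistic for k = 2 regressors.

```python
# Minimal sketch of multiple regression via the normal equations.
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 2
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 3.0 + 1.5 * x1 - 2.0 * x2 + rng.normal(0, 1.0, n)    # assumed true model

X = np.column_stack([np.ones(n), x1, x2])                # n x (k+1) design matrix
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                             # solution of the normal equations

e = y - X @ beta_hat                                     # residuals
SR2 = e @ e / (n - k - 1)                                # unbiased estimate of sigma^2
cov_beta = SR2 * XtX_inv                                 # estimated covariance of beta_hat
se_beta = np.sqrt(np.diag(cov_beta))                     # standard errors, sqrt(SR2 * qii)

TV = np.sum((y - y.mean()) ** 2)
R2 = 1 - e @ e / TV
R2_adj = 1 - SR2 / (TV / (n - 1))                        # 1 - SR^2 / Scy^2
F = (R2 / (1 - R2)) * (n - k - 1) / k                    # overall regression F statistic
print(beta_hat, se_beta, R2, R2_adj, F)
```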
3.3 Prediction

Multiple Regression Linear Model: Prediction
• Given x = (1, x1, ..., xk)′ we may want to predict
  – the value of Y when X = x (one observation, yx)
  – the mean of the observations taken at x, µx
• In both cases the predicted value is the one given by the estimated equation, µ̂x = ŷx = β̂0 + β̂1 x1 + ··· + β̂k xk
• But the variances are different:
  – Var(µ̂x) = E[(µ̂x − µx)²] = x′ Var(β̂) x = σ² x′ (X′X)⁻¹ x = σ² νx
  – Var(ŷx) = E[(yx − ŷx)²] = σ² (1 + νx)
• Using this, confidence regions for µx and yx can be constructed

3.4 Other issues

Multiple Regression Linear Model: Multicollinearity
• If one explanatory variable is a linear combination of the rest, then X′X cannot be inverted and the solution is not unique
• Close to this case the covariance matrix σ² (X′X)⁻¹ is "big" ⇒ large variances and covariances between the parameter estimates
• It does not affect ŷ or e
• It can happen that none of the parameters is significant although the model as a whole is
• It can affect some parameters very much and others not at all
• Large mean squared error of the estimates

Multiple Regression Linear Model: Multicollinearity indexes
• Increment of Variance Factor (IVF, usually called the variance inflation factor): the increase of the estimator variance in multiple regression with respect to that in simple regression,
  IVF(i) = (1 − R²i.R)⁻¹,  i = 1, ..., p,
  where R²i.R is the determination coefficient of the regression of xi on x1, ..., x̂i (omitted), ..., xp
• IVF close to 1 means low multicollinearity
• Tolerance = (IVF)⁻¹ = 1 − R²i.R
• If R = (r_{xi xj}) is the correlation matrix, R⁻¹ measures the joint dependence: {R⁻¹}(i,i) = IVF(i)
• The characteristic roots of X′X, or better of R, also measure it: Conditioning Index CI = √(max. eigenvalue / min. eigenvalue)
• Medium multicollinearity ≡ 10 < CI ≤ 30
• Note: CI(A) = CI(A⁻¹)

Multiple Regression Linear Model: Multicollinearity treatment
• A priori design, aiming at a "big" X′X
• When the a priori design is not possible:
  – remove regressors
  – include external information (e.g. Bayesian methods)
  – principal components

Multiple Regression Linear Model: Partial correlation
• Given the variables x1, ..., xk, the partial correlation coefficient between x1 and x2 is a coefficient that excludes the influence of the rest of the variables, r12.34···k. Procedure:
  – e1.34···k: residuals of the regression of x1 on x3, ..., xk
  – e2.34···k: residuals of the regression of x2 on x3, ..., xk
  – r12.34···k is the correlation coefficient between e1.34···k and e2.34···k
• The partial correlation coefficient between y and xi is ryi.R, with
  r²yi.R = ti²/(ti² + n − k − 1),  where ti = β̂i/σ̂β̂i
• For 3 variables: rxy.z = (rxy − rxz ryz) / √[(1 − r²xz)(1 − r²yz)]
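The residual-based procedure above can be checked numerically. The following minimal sketch (my own illustration, not part of the notes; the dependence of x1 and x2 on a common variable z is an assumption of the simulation) computes the partial correlation between x1 and x2 given z via the two residual vectors and compares it with the closed-form three-variable formula.

```python
# Minimal sketch of partial correlation: correlate the residuals of x1 on z
# and of x2 on z, then compare with the three-variable formula r_xy.z.
import numpy as np

rng = np.random.default_rng(3)
n = 200
z = rng.normal(size=n)
x1 = 0.8 * z + rng.normal(size=n)        # both variables depend on z (assumed)
x2 = -0.5 * z + rng.normal(size=n)

def residuals_on(v, z):
    """Residuals of the simple regression of v on z (with intercept)."""
    Z = np.column_stack([np.ones_like(z), z])
    beta = np.linalg.lstsq(Z, v, rcond=None)[0]
    return v - Z @ beta

e1 = residuals_on(x1, z)
e2 = residuals_on(x2, z)
r12_z = np.corrcoef(e1, e2)[0, 1]        # partial correlation via residuals

r12 = np.corrcoef(x1, x2)[0, 1]
r1z = np.corrcoef(x1, z)[0, 1]
r2z = np.corrcoef(x2, z)[0, 1]
formula = (r12 - r1z * r2z) / np.sqrt((1 - r1z**2) * (1 - r2z**2))
print(r12_z, formula)                    # the two values should agree
```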
Multiple Regression Linear Model: Stepwise regression
• Forward: adding variables. Procedure:
  – begin with the variable most correlated with the dependent variable
  – the remaining variables enter according to their partial correlation, which excludes the influence of the variables already in the model
  – the procedure stops when none of the remaining variables has a significant correlation coefficient and high tolerance
• Backward: begin with all the variables and remove one at each step
• Mixture of Forward and Backward
• Blocks: using blocks of variables that go in/out together

4 Analysis of Covariance

4.1 Introduction

Analysis of Covariance: Introduction
• Very often the relationship between variables depends on qualitative variables
• Usually, when the data can be grouped, the "group factor" should be included in the model
• Thus we will consider linear models with both qualitative explanatory variables (factors) and quantitative ones (covariates)

Analysis of Covariance: Fictitious (dummy) variables
• When the groups (A and B) are not taken into account, the following situations could arise:
  – specification error due to the omission of a factor
  – failing to detect a true relation
  – declaring a false relation significant
• Solutions to this problem:
  – run separate regressions in the different groups
  – include the factor in the model: y = β0 + β1 x + α zA + u, with the fictitious variable zA = 1 when the observation is in group A and zA = 0 when it is not

4.2 ANOCOVA

Analysis of Covariance: Models
• About the explanatory variables, assume we have
  – one qualitative variable with p levels ⇒ p − 1 fictitious variables Z1, ..., Zp−1
  – k quantitative variables X1, ..., Xk
• Models
  1. Y = Xβ + U ≡ y = β0 + β1 x1 + ··· + βk xk + u ≈ the groups have no influence
  2. Y = Xβ + Zα + U ≡ y = β0 + β1 x1 + ··· + βk xk + α1 z1 + ··· + αp−1 zp−1 + u ≈ same slopes but different intercepts in the groups
  3. Yi = Xi βi + εi ≡ yi = βi0 + βi1 x1 + ··· + βik xk + εi, i = 1, ..., p ≈ different slopes and intercepts in each group

Analysis of Covariance: Tests
• H0(1,3): the groups have no influence
  F(1,3) = [EV(1→3)/DF(1→3)] / [NEV/DF_NEV] = [(SSRes1 − SSRes3)/((p − 1)(k + 1))] / [SSRes3/(n − p(k + 1))]
  Rejecting H0(1,3) means that Model 1 is poor (it does not explain enough) and the groups should be taken into account ⇒ the right model will be either Model 2 or Model 3. We then need another test to decide between these two:
• H0(2,3): there is no interaction between factors and regressors
  F(2,3) = [EV(2→3)/DF(2→3)] / [NEV/DF_NEV] = [(SSRes2 − SSRes3)/(k(p − 1))] / [SSRes3/(n − p(k + 1))]
  A significant result means that Model 2 does not explain enough: there is interaction between factors and regressors, the regression lines differ across the groups ⇒ we need Model 3

5 Extensions of the Regression Model
• Qualitative response variable
  – Regression model
  – Generalized linear models: Logit, Probit
• Polynomial models
  – One explanatory variable
  – Response surfaces
• Recursive estimation
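As a closing illustration of the polynomial extension listed above, here is a minimal sketch (my own, not part of the notes; the quadratic coefficients 1, 2, −0.5 are assumed for the simulation). It shows that a polynomial model in one explanatory variable is still a linear model: the powers of x simply become additional columns of the design matrix X, and the same LS machinery of Chapter 3 applies.

```python
# Minimal sketch: fitting a quadratic model as an ordinary linear regression.
import numpy as np

rng = np.random.default_rng(4)
n = 80
x = rng.uniform(-3, 3, n)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.3, n)   # assumed quadratic relation

X = np.column_stack([np.ones(n), x, x**2])               # columns 1, x, x^2
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]          # same LS solution as before
print(beta_hat)                                          # should be close to (1, 2, -0.5)
```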