Minimo quadrato a due stadi in R

8

Si potrebbe voler essere più specifici quando si dice "probit-least-square-two-stage". Dato che ti riferisci a un programma Stata che implementa questo, suppongo tu stia parlando del pacchetto CDSIMEQ, che implementa la procedura Amemiya (1978) per il modello Heckit (a.k.a Tobia generalizzato, modello Tobit di tipo II, ecc.). Come ha detto Grant, systemfit farà un Tobit per te, ma non con due equazioni. Il pacchetto MicEcon aveva un Heckit (ma il pacchetto si è diviso così tante volte che non so dove sia ora).

Se si desidera quello che il CDSIMEQ fa, può essere facilmente implementato in R. Ho scritto una funzione che replica CDSIMEQ:

tspls <- function(formula1, formula2, data) { 
    # The Continous model 
    mf1 <- model.frame(formula1, data) 
    y1 <- model.response(mf1) 
    x1 <- model.matrix(attr(mf1, "terms"), mf1) 

    # The dicontionous model 
    mf2 <- model.frame(formula2, data) 
    y2 <- model.response(mf2) 
    x2 <- model.matrix(attr(mf2, "terms"), mf2) 

    # The matrix of all the exogenous variables 
    X <- cbind(x1, x2) 
    X <- X[, unique(colnames(X))] 

    J1 <- matrix(0, nrow = ncol(X), ncol = ncol(x1)) 
    J2 <- matrix(0, nrow = ncol(X), ncol = ncol(x2)) 
    for (i in 1:ncol(x1)) J1[match(colnames(x1)[i], colnames(X)), i] <- 1 
    for (i in 1:ncol(x2)) J2[match(colnames(x2)[i], colnames(X)), i] <- 1 

    # Step 1: 
    cat("\n\tNOW THE FIRST STAGE REGRESSION") 
    m1 <- lm(y1 ~ X - 1) 
    m2 <- glm(y2 ~ X - 1, family = binomial(link = "probit")) 
    print(summary(m1)) 
    print(summary(m2)) 

    yhat1 <- m1$fitted.values 
    yhat2 <- X %*% coef(m2) 

    PI1 <- m1$coefficients 
    PI2 <- m2$coefficients 
    V0 <- vcov(m2) 
    sigma1sq <- sum(m1$residuals^2)/m1$df.residual 
    sigma12 <- 1/length(y2) * sum(y2 * m1$residuals/dnorm(yhat2)) 

    # Step 2: 
    cat("\n\tNOW THE SECOND STAGE REGRESSION WITH INSTRUMENTS") 

    m1 <- lm(y1 ~ yhat2 + x1 - 1) 
    m2 <- glm(y2 ~ yhat1 + x2 - 1, family = binomial(link = "probit")) 
    sm1 <- summary(m1) 
    sm2 <- summary(m2) 
    print(sm1) 
    print(sm2) 

    # Step 3: 
    cat("\tNOW THE SECOND STAGE REGRESSION WITH CORRECTED STANDARD ERRORS\n\n") 
    gamma1 <- m1$coefficients[1] 
    gamma2 <- m2$coefficients[1] 

    cc <- sigma1sq - 2 * gamma1 * sigma12 
    dd <- gamma2^2 * sigma1sq - 2 * gamma2 * sigma12 
    H <- cbind(PI2, J1) 
    G <- cbind(PI1, J2) 

    XX <- crossprod(X)       # X'X 
    HXXH <- solve(t(H) %*% XX %*% H)   # (H'X'XH)^(-1) 
    HXXVXXH <- t(H) %*% XX %*% V0 %*% XX %*% H # H'X'V0X'XH 
    Valpha1 <- cc * HXXH + gamma1^2 * HXXH %*% HXXVXXH %*% HXXH 

    GV <- t(G) %*% solve(V0) # G'V0^(-1) 
    GVG <- solve(GV %*% G)  # (G'V0^(-1)G)^(-1) 
    Valpha2 <- GVG + dd * GVG %*% GV %*% solve(XX) %*% solve(V0) %*% G %*% GVG 

    ans1 <- coef(sm1) 
    ans2 <- coef(sm2) 

    ans1[,2] <- sqrt(diag(Valpha1)) 
    ans2[,2] <- sqrt(diag(Valpha2)) 
    ans1[,3] <- ans1[,1]/ans1[,2] 
    ans2[,3] <- ans2[,1]/ans2[,2] 
    ans1[,4] <- 2 * pt(abs(ans1[,3]), m1$df.residual, lower.tail = FALSE) 
    ans2[,4] <- 2 * pnorm(abs(ans2[,3]), lower.tail = FALSE) 

    cat("Continuous:\n") 
    print(ans1) 
    cat("Dichotomous:\n") 
    print(ans2) 
}

Per confronto, si può replicare il campione da parte dell'autore di CDSIMEQ nella loro article about the package.

> library(foreign) 
> cdsimeq <- read.dta("http://www.stata-journal.com/software/sj3-2/st0038/cdsimeq.dta") 
> tspls(continuous ~ exog3 + exog2 + exog1 + exog4, 
+  dichotomous ~ exog1 + exog2 + exog5 + exog6 + exog7, 
+  data = cdsimeq) 

     NOW THE FIRST STAGE REGRESSION 
Call: 
lm(formula = y1 ~ X - 1) 

Residuals: 
     Min  1Q Median  3Q  Max 
-1.885921 -0.438579 -0.006262 0.432156 2.133738 

Coefficients: 
       Estimate Std. Error t value Pr(>|t|)  
X(Intercept) 0.010752 0.020620 0.521 0.602187  
Xexog3  0.158469 0.021862 7.249 8.46e-13 *** 
Xexog2  -0.009669 0.021666 -0.446 0.655488  
Xexog1  0.159955 0.021260 7.524 1.19e-13 *** 
Xexog4  0.316575 0.022456 14.097 < 2e-16 *** 
Xexog5  0.497207 0.021356 23.282 < 2e-16 *** 
Xexog6  -0.078017 0.021755 -3.586 0.000352 *** 
Xexog7  0.161177 0.022103 7.292 6.23e-13 *** 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.6488 on 992 degrees of freedom 
Multiple R-squared: 0.5972,  Adjusted R-squared: 0.594 
F-statistic: 183.9 on 8 and 992 DF, p-value: < 2.2e-16 


Call: 
glm(formula = y2 ~ X - 1, family = binomial(link = "probit")) 

Deviance Residuals: 
    Min  1Q Median  3Q  Max 
-2.49531 -0.59244 0.01983 0.59708 2.41810 

Coefficients: 
      Estimate Std. Error z value Pr(>|z|)  
X(Intercept) 0.08352 0.05280 1.582 0.113692  
Xexog3  0.21345 0.05678 3.759 0.000170 *** 
Xexog2  0.21131 0.05471 3.862 0.000112 *** 
Xexog1  0.45591 0.06023 7.570 3.75e-14 *** 
Xexog4  0.39031 0.06173 6.322 2.57e-10 *** 
Xexog5  0.75955 0.06427 11.818 < 2e-16 *** 
Xexog6  0.85461 0.06831 12.510 < 2e-16 *** 
Xexog7  -0.16691 0.05653 -2.953 0.003152 ** 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1) 

    Null deviance: 1386.29 on 1000 degrees of freedom 
Residual deviance: 754.14 on 992 degrees of freedom 
AIC: 770.14 

Number of Fisher Scoring iterations: 6 


     NOW THE SECOND STAGE REGRESSION WITH INSTRUMENTS 
Call: 
lm(formula = y1 ~ yhat2 + x1 - 1) 

Residuals: 
    Min  1Q Median  3Q  Max 
-2.32152 -0.53160 0.04886 0.53502 2.44818 

Coefficients: 
       Estimate Std. Error t value Pr(>|t|)  
yhat2   0.257592 0.021451 12.009 <2e-16 *** 
x1(Intercept) 0.012185 0.024809 0.491 0.623  
x1exog3  0.042520 0.026735 1.590 0.112  
x1exog2  0.011854 0.026723 0.444 0.657  
x1exog1  0.007773 0.028217 0.275 0.783  
x1exog4  0.318636 0.028311 11.255 <2e-16 *** 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.7803 on 994 degrees of freedom 
Multiple R-squared: 0.4163,  Adjusted R-squared: 0.4128 
F-statistic: 118.2 on 6 and 994 DF, p-value: < 2.2e-16 


Call: 
glm(formula = y2 ~ yhat1 + x2 - 1, family = binomial(link = "probit")) 

Deviance Residuals: 
    Min  1Q Median  3Q  Max 
-2.49610 -0.58595 0.01969 0.59857 2.41281 

Coefficients: 
       Estimate Std. Error z value Pr(>|z|)  
yhat1   1.26287 0.16061 7.863 3.75e-15 *** 
x2(Intercept) 0.07080 0.05276 1.342 0.179654  
x2exog1  0.25093 0.06466 3.880 0.000104 *** 
x2exog2  0.22604 0.05389 4.194 2.74e-05 *** 
x2exog5  0.12912 0.09510 1.358 0.174544  
x2exog6  0.95609 0.07172 13.331 < 2e-16 *** 
x2exog7  -0.37128 0.06759 -5.493 3.94e-08 *** 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1) 

    Null deviance: 1386.29 on 1000 degrees of freedom 
Residual deviance: 754.21 on 993 degrees of freedom 
AIC: 768.21 

Number of Fisher Scoring iterations: 6 

     NOW THE SECOND STAGE REGRESSION WITH CORRECTED STANDARD ERRORS 

Continuous: 
       Estimate Std. Error t value Pr(>|t|) 
yhat2   0.25759209 0.1043073 2.46955009 0.01369540 
x1(Intercept) 0.01218500 0.1198713 0.10165068 0.91905445 
x1exog3  0.04252006 0.1291588 0.32920764 0.74206810 
x1exog2  0.01185438 0.1290754 0.09184073 0.92684309 
x1exog1  0.00777347 0.1363643 0.05700519 0.95455252 
x1exog4  0.31863627 0.1367881 2.32941597 0.02003661 
Dichotomous: 
       Estimate Std. Error z value  Pr(>|z|) 
yhat1   1.26286574 0.7395166 1.7076909 0.0876937093 
x2(Intercept) 0.07079775 0.2666447 0.2655134 0.7906139867 
x2exog1  0.25092561 0.3126763 0.8025092 0.4222584495 
x2exog2  0.22603717 0.2739307 0.8251618 0.4092797527 
x2exog5  0.12911922 0.4822986 0.2677163 0.7889176766 
x2exog6  0.95609385 0.2823662 3.3860070 0.0007091758 
x2exog7  -0.37128221 0.3265478 -1.1369920 0.2555416141

fonte

2010-10-04 15:48:01 eyjo

2

ci sono diversi pacchetti disponibili in R per fare due minimi quadrati di stato. qui sono alcuni

sem: Two-Stage Least Squares
Zelig: link rimosso, non più funzionali (28.07.11)

fatemi sapere se questi servono il vostro scopo.

fonte

2010-10-02 01:25:54 Ramnath

+0

Grazie, ci proverò. Sembra adattarsi ai miei bisogni. –

2

systemfit farà anche il trucco.

fonte

2010-10-02 01:46:16

risposta

Problemi correlati