XT3. Hausman Test

Compiled by: 冯超楠 (Feng Chaonan), Beihang University
Email: fengcnhpy@126.com

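The most common panel data models are the fixed-effects model and the random-effects model. The two models rest on different assumptions and specifications, so their estimates generally differ. The Hausman test introduced in this section helps decide which model is more appropriate. We cover three questions: the basic idea of the Hausman test, how to run it in Stata, and what to do when the Hausman statistic turns out negative in practice.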

1. Basic Idea of the Hausman Test

First, recall the specifications of the fixed-effects and random-effects models:

\[ FE:\ y_{it}=\color{Blue}X_{i t}^{\prime} \beta\color{Black}+\color{Blue}a_{i}\color{Black}+\color{Black}\varepsilon_{i t}\quad (1) \]

\[ RE:\ y_{it}=X_{i t}^{\prime} \beta\color{Black}+\color{Red}\alpha_{i}\color{Black}+\color{Red}\varepsilon_{i t}\color{Black}\quad (2) \]

For ease of exposition, blue marks the explanatory variables and red marks the error terms.

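The fixed-effects model (FE) treats the individual effect \(\color{Blue}a_{i}\) as part of the explanatory variables. To obtain a consistent estimator of \(\beta\), we must therefore assume that the error term \(\varepsilon_{i t}\) is uncorrelated with both \(\color{Blue}X_{i t}\) and \(\color{Blue}a_{i}\), i.e.:

\(\mathrm{H}_{1}: \mathrm{E}(\varepsilon_{it}\,|\,X_{it}, a_{i})=0\)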

The random-effects model (RE) treats the individual effect \(\color{Red}\alpha_{i}\) as part of the error term. To obtain a consistent estimator of \(\beta\), we must assume that the composite error \(u_{it} = \color{Red}\alpha_{i} + \color{Red}\varepsilon_{i t}\) is uncorrelated with \(X_{i t}\), i.e.:

\(\mathrm{H}_{2}: \mathrm{E}(\varepsilon_{it}\,|\,X_{it})=0\)
and \(\mathrm{H}_{3}: \mathrm{E}(\alpha_{i}\,|\,X_{it})=0\)

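The key difference between FE and RE is therefore whether the individual effect \(\alpha_{i}\) is correlated with the explanatory variables \(X_{it}\). In other words, it is whether \(\mathrm{H}_{3}\) holds.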

\(\mathrm{H}_{3}\)	FE	RE
Holds	Consistent	Consistent + efficient
Fails	Consistent	Inconsistent

In brief:

If \(\mathrm{H}_{3}\) holds, both FE and RE are consistent, but RE is more efficient;
If \(\mathrm{H}_{3}\) fails, FE remains consistent, whereas RE is inconsistent and should not be used.

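This gives the idea behind the Hausman test: estimate both FE and RE, then compare their coefficient estimates. If the difference is not statistically significant, we fail to reject \(\mathrm{H}_{3}\) and can use the more efficient RE. Otherwise, we reject \(\mathrm{H}_{3}\) and use FE.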
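Written out, the comparison is a quadratic form in the coefficient differences. We use the notation of Stata's output: \(b\) is the FE estimate (consistent whether or not \(\mathrm{H}_{3}\) holds) and \(B\) is the RE estimate (efficient under \(\mathrm{H}_{3}\)). In this notation, the standard Hausman statistic is

\[ H=(b-B)^{\prime}\left(V_b-V_B\right)^{-1}(b-B)\ \overset{a}{\sim}\ \chi^{2}(K) \]

Here \(V_b\) and \(V_B\) are the estimated covariance matrices of \(b\) and \(B\). Under \(\mathrm{H}_{3}\), \(H\) is asymptotically chi-squared, with \(K\) equal to the number of coefficients compared (more precisely, the rank of \(V_b-V_B\)). A large \(H\), and hence a small p-value, signals a systematic difference and leads us to reject \(\mathrm{H}_{3}\).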

2. Stata Example

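We use the classic panel dataset nlswork.dta, which covers 4,711 women over 15 survey years. The goal is to study what determines women's wages. The dependent variable is therefore log wage ln_wage, and the explanatory variables are age age, total work experience ttl_exp, and job tenure tenure.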

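help hausman                          // open the official help file

webuse "nlswork.dta", clear           // load the example panel dataset
xtset idcode year                     // declare the panel: individual idcode, time year

xtreg ln_wage age ttl_exp tenure, fe  // fixed-effects estimation
est store fe                          // store FE results under the name fe

xtreg ln_wage age ttl_exp tenure, re  // random-effects estimation
est store re                          // store RE results under the name re

hausman fe re                         // consistent estimator (FE) first, efficient (RE) second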

The output is as follows:

. hausman fe re

          ---- Coefficients ----                                  
        |    (b)          (B)        (b-B)    sqrt(diag(V_b-V_B))
        |     fe           re     Difference      Std. err.
--------+---------------------------------------------------------
    age | -.0030427    -.0050184    .0019757       .0005064
ttl_exp |  .029036      .0338343   -.0047983       .000826
 tenure |  .0116574     .0127792   -.0011218       .0003144
------------------------------------------------------------------
               b = Consistent under H0 and Ha; obtained from xtreg.
B = Inconsistent under Ha, efficient under H0; obtained from xtreg.

Test of H0: Difference in coefficients not systematic

    chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
            = 323.40
Prob > chi2 = 0.0000
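The `chi2(3)` line above is a quadratic form in the coefficient differences. The function below is a minimal sketch in Python with NumPy, not Stata's own implementation. It computes \((b-B)'[(V_b-V_B)^{-1}](b-B)\) from two coefficient vectors and their covariance matrices. Some caveats:

- The name `hausman_stat` and all input numbers are hypothetical.
- The output shows only the square roots of the diagonal of \(V_b-V_B\), not the full FE and RE covariance matrices, so this example cannot reproduce 323.40.
- The sketch assumes \(V_b-V_B\) is invertible and uses the number of coefficients as the degrees of freedom.

```python
import numpy as np


def hausman_stat(b, B, V_b, V_B):
    """Hausman statistic H = (b - B)' (V_b - V_B)^{-1} (b - B).

    b, V_b: coefficients and covariance of the consistent estimator (e.g. FE).
    B, V_B: coefficients and covariance of the efficient estimator (e.g. RE).
    Assumes V_b - V_B is invertible; returns (H, degrees of freedom).
    """
    d = np.asarray(b, dtype=float) - np.asarray(B, dtype=float)
    V_diff = np.asarray(V_b, dtype=float) - np.asarray(V_B, dtype=float)
    # Solve V_diff @ x = d instead of forming the inverse explicitly.
    H = float(d @ np.linalg.solve(V_diff, d))
    return H, d.size


# Hypothetical two-coefficient example (not taken from nlswork.dta).
b = [1.0, 2.0]                 # "FE" estimates
B = [0.5, 2.5]                 # "RE" estimates
V_b = np.diag([0.05, 0.08])    # "FE" covariance matrix
V_B = np.diag([0.01, 0.04])    # "RE" covariance matrix

H, df = hausman_stat(b, B, V_b, V_B)
print(f"chi2({df}) = {H:.2f}")  # chi2(2) = 12.50
```

Solving the linear system with `np.linalg.solve` instead of inverting \(V_b-V_B\) explicitly is the usual, numerically safer choice. It yields the same quadratic form.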

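The Hausman statistic is \(323.40\), with a reported p-value of \(0.0000\). Because the p-value is below 0.05, we reject H0. The differences between the FE and RE coefficients are systematic, which means \(\mathrm{H}_{3}\) is rejected. Based on the Hausman test, we should therefore choose the fixed-effects model (FE), which is more appropriate than the random-effects model for this dataset.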
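Stata prints p-values to four decimal places, so `Prob > chi2 = 0.0000` means \(p<0.00005\), not \(p=0\). For 3 degrees of freedom, the chi-squared upper tail has the closed form \(P(\chi^2_3>x)=\operatorname{erfc}(\sqrt{x/2})+\sqrt{2x/\pi}\,e^{-x/2}\). The standard-library Python sketch below uses this formula to check the 5% critical value and to gauge how small the p-value behind 323.40 is. The helper name `chi2_sf_df3` is our own, and the formula applies only to 3 degrees of freedom.

```python
import math


def chi2_sf_df3(x):
    """P(chi2(3) > x) via the closed form valid only for 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Sanity check: 7.8147 is (approximately) the 5% critical value of chi2(3).
print(round(chi2_sf_df3(7.8147), 4))   # 0.05

# p-value for the statistic reported above.
p = chi2_sf_df3(323.40)
print(p < 0.00005)                     # True: Stata displays it as 0.0000
```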

3. What If the Hausman Statistic Is Negative?

3.1 Background

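The sample we study, invest2.dta, contains 100 observations: 5 firms with 20 years of data each. It has five variables: firm identifier id, time t, investment expenditure invest, market value market, and capital stock stock. The relationship among invest, market, and stock can be estimated in several ways. Suppose we want to study how investment expenditure and capital stock affect market value. The dependent variable is then market, and the explanatory variables are invest and stock. We estimate the model with fixed effects and with random effects, then use the hausman command to test the difference.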

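use invest2.dta, clear            // load the data from the working directory
xtdes                             // describe the panel structure (short for xtdescribe)

xtreg market invest stock, fe     // fixed-effects estimation
est store m_fe

xtreg market invest stock, re     // random-effects estimation
est store m_re

hausman m_fe m_re                 // consistent estimator (FE) first, efficient (RE) second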

The output is:

. hausman m_fe m_re

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |      m_fe         m_re        Difference       Std. err.
-------------+----------------------------------------------------------------
      invest |     3.05273     3.847014        -.794284               .
       stock |   -.6763434    -.7981618        .1218184               .
------------------------------------------------------------------------------
                          b = Consistent under H0 and Ha; obtained from xtreg.
           B = Inconsistent under Ha, efficient under H0; obtained from xtreg.

Test of H0: Difference in coefficients not systematic

chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
        = -47.57

Warning: chi2 < 0 ==> model fitted on these data
         fails to meet the asymptotic assumptions
         of the Hausman test; see suest for a
         generalized test.
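If \(V_b-V_B\) is positive semidefinite, the quadratic form \((b-B)'(V_b-V_B)^{-1}(b-B)\) is non-negative, so a negative value cannot arise. In finite samples, however, the FE and RE covariance matrices are estimated separately (for example, with different estimates of the error variance). Their difference therefore need not be positive semidefinite. The `.` entries in the `Std. err.` column above are consistent with this: that column reports \(\sqrt{\operatorname{diag}(V_b-V_B)}\), which is undefined when a diagonal element is negative. The minimal NumPy sketch below uses hypothetical numbers, not invest2.dta, to show the mechanism.

```python
import numpy as np

# Hypothetical inputs in which the "efficient" estimator has the LARGER variance
# for the first coefficient, so V_b - V_B is not positive semidefinite.
b = np.array([1.0, 0.1])       # consistent estimator (e.g. FE)
B = np.zeros(2)                # efficient estimator (e.g. RE)
V_b = np.diag([0.02, 0.05])
V_B = np.diag([0.03, 0.01])

d = b - B
V_diff = V_b - V_B
eigenvalues = np.linalg.eigvalsh(V_diff)   # symmetric matrix -> real eigenvalues
H = float(d @ np.linalg.solve(V_diff, d))

print("eigenvalues of V_b - V_B:", eigenvalues)
print("diagonal of V_b - V_B:", np.diag(V_diff))  # a negative entry has no real sqrt
print(f"Hausman statistic: {H:.2f}")              # -99.75
```

Checking the eigenvalues of \(V_b-V_B\) is a quick diagnostic; here one of them is negative. Stata's `hausman` also offers the `sigmamore` and `sigmaless` options, which base both covariance matrices on a common estimate of the disturbance variance. The post cited in 3.2 discusses causes and remedies in more detail.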

3.2 Causes and Solutions

See the following post:

游万海 (You Wanhai), 连玉君 (Lian Yujun), 2020, Stata: 面板数据模型一文读懂 (Stata: panel data models in one post), 连享会 (Lianxh) No. 122.