Equivalence of MLE and OLS in Linear Regression
Introduction
A linear regression model can be described as $Y = \alpha + \beta X + \epsilon$. On the observed data this takes the form $y=\alpha + \beta x + e \tag{Equation 1}$. After obtaining the maximum likelihood estimates of the coefficients from the sample, the fitted line is $\hat{y} = \hat\alpha + \hat\beta x$. The objective of this (short) article is to use the model assumptions to establish the equivalence of the OLS and MLE solutions for linear regression.
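To make the setup concrete, here is a minimal sketch (the true parameter values, sample size, and variable names are arbitrary, chosen only for illustration) that simulates data from this model and fits $\hat{y} = \hat\alpha + \hat\beta x$ by ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true parameters, chosen only for illustration
alpha_true, beta_true, sigma = 2.0, 0.5, 1.0

# Simulate Y = alpha + beta * X + eps with iid Gaussian errors
n = 200
x = rng.uniform(0, 10, size=n)
y = alpha_true + beta_true * x + rng.normal(0, sigma, size=n)

# Ordinary least squares fit of a degree-1 polynomial: y_hat = alpha_hat + beta_hat * x
beta_hat, alpha_hat = np.polyfit(x, y, deg=1)
print(alpha_hat, beta_hat)  # close to (2.0, 0.5)
```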
Important Model Assumptions
- True underlying distribution of the errors is Gaussian
- Expected value of the error term is 0 (known)
- Variance of the error term is constant with respect to x
- The errors are independent of each other (no autocorrelation); these assumptions are restated compactly after this list
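Compactly, these assumptions say that the errors are independent draws from a single zero-mean Gaussian:
\[\epsilon_i \overset{iid}{\sim} \mathcal{N}(0, \sigma^2), \qquad i = 1, \dots, n\]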
Full Likelihood
For an observation e from a Gaussian distribution with mean 0 and constant variance $\sigma^2$, the likelihood is given by $L(e|\alpha, \beta) = \frac{\exp(-\frac{e^2}{2\sigma^2})}{\sqrt{2\pi\sigma^2}}$. Given the whole data set of n observations, assuming the residuals are realizations of iid (independent and identically distributed) Gaussian errors, the likelihood can be written as:
\[L(\vec{e}|\alpha,\beta)=\prod_{i=1}^{n}\frac{\exp(-\frac{e_i^2}{2\sigma^2})}{\sqrt{2\pi\sigma^2}}=\frac{\exp(-\frac{\sum_{i=1}^{n}e_i^2}{2\sigma^2})}{(\sqrt{2\pi\sigma^2})^n}\]Since the logarithm is a monotonic (strictly increasing) transformation, the maximum likelihood estimate does not change under a log transformation:
\[l(\vec{e}|\alpha,\beta)=\log(L(\vec{e}|\alpha,\beta))=-\frac{\sum_{i=1}^{n}e_i^2}{2\sigma^2}-\frac{n}{2}(\log(2\pi) + 2\log(\sigma))\]The maximum likelihood estimates of the coefficients are the values that maximize this log-likelihood:
\[\hat\alpha, \hat\beta = argmax_{\alpha,\beta}\, l(\vec{e}|\alpha,\beta) = argmax_{\alpha,\beta}\bigg[-\frac{\sum_{i=1}^{n}e_i^2}{2\sigma^2} - \frac{n}{2}(\log(2\pi) + 2\log(\sigma))\bigg]\]Removing the terms that are constant with respect to $\alpha$ and $\beta$ (they do not affect the argmax):
\[\hat\alpha, \hat\beta = argmax_{\alpha,\beta} \sum_{i=1}^{n}-e_i^2\]Substituting $e_i = y_i - \beta x_i - \alpha$ from Equation 1, we get:
\[\hat\alpha, \hat\beta = argmax_{\alpha,\beta} \sum_{i=1}^{n}-(y_i-\beta x_i -\alpha)^2\]Maximizing $-z$ is equivalent to minimizing $z$, therefore $\hat\alpha, \hat\beta = argmin_{\alpha,\beta} \sum_{i=1}^{n} (y_i-\beta x_i -\alpha)^2$, which is exactly the ordinary least squares objective.
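For simple linear regression, minimizing this sum of squares in closed form gives the familiar OLS estimates $\hat\beta = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$ and $\hat\alpha = \bar{y}-\hat\beta\bar{x}$. The equivalence can also be checked numerically: the sketch below (a self-contained toy example; the simulated data and the fixed value of $\sigma$ are assumptions made only for illustration) maximizes the Gaussian log-likelihood with a generic optimizer and compares the result with the closed-form OLS solution.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical simulated data from y = 2.0 + 0.5 * x + e, e ~ N(0, 1)
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=n)

def neg_log_likelihood(params, sigma=1.0):
    """Negative Gaussian log-likelihood of e_i = y_i - beta*x_i - alpha.
    sigma is held fixed; its value does not change the argmax over (alpha, beta)."""
    alpha, beta = params
    e = y - beta * x - alpha
    return -np.sum(norm.logpdf(e, loc=0.0, scale=sigma))

# MLE via a generic numerical optimizer
alpha_mle, beta_mle = minimize(neg_log_likelihood, x0=np.zeros(2)).x

# Closed-form OLS estimates
beta_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_ols = y.mean() - beta_ols * x.mean()

print(alpha_mle, beta_mle)  # numerically maximized likelihood
print(alpha_ols, beta_ols)  # least squares; agrees up to optimizer tolerance
```

If $\sigma$ is also treated as a free parameter, the estimates of $\alpha$ and $\beta$ are unchanged, since $\sigma$ only scales the sum-of-squares term in the log-likelihood.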
All Assumptions
- The relationship between the independent variables and the dependent variable is linear
- True underlying distribution of the error is Gaussian with 0 mean
- The independent variables do not exhibit a high level of multicollinearity
- No autocorrelation: ‘lagged’ error terms are independent
- No heteroskedasticity (already used above): the variance of the error is constant and does not depend on X (see the residual check sketch after this list)
- Multivariate normality of the independent variables (not required, but helpful for proving a few special properties)
- The independent variables are measured without random error. Therefore, X and x are not random
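Some of these assumptions can be eyeballed from the fitted residuals. The sketch below (reusing the simulated data from the earlier snippets; the particular diagnostics are just one reasonable choice, not the only ones) checks approximate normality, constant variance, and lack of autocorrelation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data and OLS fit, as in the earlier sketches
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=n)
beta_hat, alpha_hat = np.polyfit(x, y, deg=1)
resid = y - (alpha_hat + beta_hat * x)

# Gaussian errors: Shapiro-Wilk test (a large p-value means no evidence against normality)
print("normality p-value:", stats.shapiro(resid).pvalue)

# Constant variance: correlation between |residuals| and x should be near zero
print("|resid| vs x correlation:", stats.pearsonr(np.abs(resid), x)[0])

# No autocorrelation: lag-1 correlation of residuals should be near zero
print("lag-1 autocorrelation:", np.corrcoef(resid[:-1], resid[1:])[0, 1])
```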
Additional Resources
- Equivalence of ANOVA and linear regression
- Simple physics for an intuitive understanding of linear regression