Quality of a linear regression model: the F-test statistic

One more nugget gleams from a linear regression model: the F-statistic.   That statistic compares your model to a model that has no predictors.  The stripped-down model relies only on the average of the response variable (for us, the average number of private practice lawyers in a state).

Stated differently, the F-test statistic is a ratio: the top of the ratio (the numerator) is the variance in the response variable that your model's predictor variables explain. That figure is divided by the bottom of the ratio (the denominator), the variance the model leaves unexplained. This is not quite correct, because the F-statistic also adjusts each piece for its degrees of freedom, but it is close enough for us.
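The ratio can be computed by hand. Here is a minimal sketch using NumPy on made-up data (not the lawyer counts from this series), with the explained and unexplained variances each divided by their degrees of freedom:

```python
import numpy as np

# Synthetic data for illustration only: one predictor with a real linear signal.
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

# Fit a one-predictor regression by least squares.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

# Numerator: variation explained by the model.
ss_model = np.sum((fitted - y.mean()) ** 2)
# Denominator: variation left unexplained (the residuals).
ss_resid = np.sum((y - fitted) ** 2)

k = 1  # number of predictors
f_stat = (ss_model / k) / (ss_resid / (n - k - 1))
print(f_stat)
```

With a strong signal like this one, the explained variance dwarfs the residual variance and the F-statistic comes out large.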

Each combination of an F-test statistic and its degrees of freedom corresponds to a p-value. The statistic and its p-value are to the overall regression model much the same as the t-statistic and its p-value are to each coefficient estimate.  However, the F-test statistic does not map to a single p-value on its own: the p-value depends on both the value of the statistic and two degrees-of-freedom figures, one for the predictors and one for the residuals. The fewer the degrees of freedom, the higher the F-test statistic needs to be in order to return the same p-value.
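You can see this dependence directly with scipy's F distribution. In this sketch the F value of 4.0 and the degrees-of-freedom figures are arbitrary choices for illustration:

```python
from scipy.stats import f

# Hold the F value fixed and vary the residual degrees of freedom.
f_value = 4.0
for df_resid in (5, 20, 100):
    # sf() gives the upper-tail probability, i.e. the p-value.
    p = f.sf(f_value, dfn=2, dfd=df_resid)
    print(df_resid, round(p, 4))
```

The same F value of 4.0 yields a smaller p-value as the residual degrees of freedom grow, which is another way of saying that a small sample needs a bigger F to reach significance.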

If an F-test statistic is statistically significant, it means that the predictor variables, taken together, explain variation in the response variable better than chance alone would; in a statistician's eyes, the overall model is reliable.

While another statistic we wrote about, R-squared, estimates the strength of the relationship between your model's predictors and the response variable, it does not provide a formal hypothesis test for that relationship, a core statistical concept which we will consider later.  The F-test statistic does. If the p-value for the F-test statistic is less than your significance level, such as 0.05, you can conclude that R-squared is statistically significantly different from zero.
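The tie between the two statistics is exact: the F-statistic can be written directly in terms of R-squared, the number of predictors k, and the number of observations n. A short sketch with hypothetical numbers (R-squared of 0.6, 50 observations, 2 predictors):

```python
from scipy.stats import f

# Hypothetical figures for illustration.
r2, n, k = 0.6, 50, 2

# F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
p_value = f.sf(f_stat, dfn=k, dfd=n - k - 1)
print(f_stat, p_value)
```

Because the F-statistic is built from R-squared this way, a significant F is the same thing as concluding that R-squared differs significantly from zero.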

With a model that uses only two predictor variables (we used companies with fewer than 500 employees and total enrollment in top-100 law schools), the F-test statistic is highly significant because its p-value falls well below 0.05. We can be quite confident that the model explains the variability of the dependent variable (practicing lawyers) around its average far better than using just the average itself.
