After fitting a regression model, analysts look closely at how far the model’s estimates lie from the actual values. Such deviations, which we met before as residuals, permit several calculations that assess the accuracy of the model.

The residuals (represented by the error term in a regression equation) are the unsystematic deviations of the model’s estimates from the ‘ground truth’ response values: the noise. Because of how the regression calculations are done, the residuals average to zero; they fall on both sides of the best-fit line, some negative and some positive, and overall they cancel each other out. Every observation (each U.S. state) has a residual because the model estimates each state’s number of practicing lawyers, which can be compared with the real number. Those gaps are the residuals.
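A quick sketch of this cancellation, using a hypothetical toy data set (not the state lawyer data): when the model includes an intercept, the OLS residuals sum to zero up to floating-point rounding.

```python
import numpy as np

# Toy data: a predictor x and a response y (hypothetical values).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.7])

# Fit a simple OLS line y ≈ b0 + b1*x with numpy's least-squares solver.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ coef
residuals = y - fitted   # observed minus estimated, one per observation

# With an intercept in the model, the residuals average to zero
# (up to floating-point rounding).
print(round(residuals.mean(), 10))  # → 0.0
```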

**Ordinary least squares** (OLS) regression squares the residuals and then minimizes the sum of those squares, thereby determining the equation of the best-fit line.
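To illustrate the “least squares” property with hypothetical data: the OLS coefficients give a sum of squared residuals no larger than any perturbed set of coefficients.

```python
import numpy as np

# Hypothetical toy data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
X = np.column_stack([np.ones_like(x), x])

# OLS coefficients via the normal equations: minimize sum((y - X b)^2).
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

def ssr(b):
    """Sum of squared residuals for candidate coefficients b."""
    r = y - X @ b
    return float(r @ r)

# Nudging the OLS coefficients in any direction raises the sum of squares.
print(ssr(b_ols) <= ssr(b_ols + np.array([0.1, 0.0])))   # → True
print(ssr(b_ols) <= ssr(b_ols + np.array([0.0, -0.1])))  # → True
```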

Back to assessing a model’s accuracy. Software can calculate the *square root* of the *average* of those squared residuals (the average already divides by the number of observations): that square root is the **Root Mean Squared Error** (RMSE).
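The calculation is short enough to write directly; here is a minimal sketch with hypothetical actual and estimated counts (the function name `rmse` is ours, not from any particular package).

```python
import numpy as np

def rmse(actual, estimated):
    """Root Mean Squared Error: square root of the mean of squared residuals."""
    residuals = np.asarray(actual, dtype=float) - np.asarray(estimated, dtype=float)
    return float(np.sqrt(np.mean(residuals ** 2)))

# Hypothetical actual vs. model-estimated counts.
actual    = [100, 150, 200, 250]
estimated = [110, 140, 195, 260]

print(rmse(actual, estimated))  # → ≈ 9.014
```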

So, what’s the big deal? The deal is that the RMSE lets you compare the accuracy of linear regression models that have different mixes of predictor variables (but not between data sets, as it is scale-dependent). In general, a lower RMSE is better than a higher one. For example, the three predictors Less500, enrollment, and urbanpct yield an RMSE for practicing lawyers of 4,708. If we drop urbanpct, the RMSE rises a tiny amount — to 4,713 — meaning that the smaller model is slightly less accurate at predicting practicing lawyers.
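A sketch of that comparison on synthetic data (the real state data set is not reproduced here; the variable names merely echo the text’s predictors): fit the full model and the smaller one, then compare in-sample RMSEs. For nested models the larger one can never have a higher in-sample RMSE.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Hypothetical stand-ins for the text's predictors (not the real data).
less500    = rng.normal(size=n)
enrollment = rng.normal(size=n)
urbanpct   = rng.normal(size=n)
lawyers    = 3 * less500 + 2 * enrollment + 0.1 * urbanpct + rng.normal(size=n)

def fit_rmse(*predictors):
    """Fit OLS with the given predictors and return the in-sample RMSE."""
    X = np.column_stack([np.ones(n), *predictors])
    coef, *_ = np.linalg.lstsq(X, lawyers, rcond=None)
    resid = lawyers - X @ coef
    return float(np.sqrt(np.mean(resid ** 2)))

full    = fit_rmse(less500, enrollment, urbanpct)
smaller = fit_rmse(less500, enrollment)   # drop the weak predictor

# Dropping a weak predictor nudges the in-sample RMSE up, usually slightly.
print(full <= smaller)  # → True
```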

Here’s the deeper deal. Recall that Adjusted R-squared measures the amount of variance in the response variable (practicing lawyers in a state) that can be explained by our model. It gives one view of the quality of the regression model.

Sometimes, however, we may be more interested in quantifying the residuals in the same measuring unit as the response variable. We want a figure for the plus-or-minus range of estimated practicing lawyers in a state. We could consider the average of the residuals of the model, except that linear regression residuals always average zero. So we need another way to quantify the residuals; the RMSE does the trick, and it has the same measuring unit as the response variable, lawyers.

Think of the RMSE as a measure of the width of the data cloud around the line of perfect prediction. The effect of each residual on RMSE is proportional to the size of its square; thus larger residuals have a disproportionately large effect on RMSE. Consequently, and problematically, RMSE is sensitive to outliers.

To address the problem of large residuals skewing the RMSE, we can use the **Mean Absolute Error** (MAE), the average of the absolute values of the residuals (taking the absolute value treats a negative number and its positive counterpart the same). This measurement is more robust against, less susceptible to, large residuals because we are not squaring them. MAE is also probably easier to interpret than the square root of the average of squared residuals. It is usually similar in magnitude to RMSE, but slightly smaller.
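The contrast between the two measures shows up clearly with a single large residual. In this small sketch (hypothetical residuals), one outlier inflates the RMSE proportionally more than the MAE, and the MAE never exceeds the RMSE.

```python
import numpy as np

def rmse(residuals):
    """Root Mean Squared Error of a set of residuals."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sqrt(np.mean(r ** 2)))

def mae(residuals):
    """Mean Absolute Error: average of |residuals|."""
    r = np.asarray(residuals, dtype=float)
    return float(np.mean(np.abs(r)))

clean   = [3, -2, 4, -3, 2]
outlier = [3, -2, 4, -3, 20]   # same residuals but one large outlier

# MAE never exceeds RMSE (they are equal only when all |residuals| match).
print(mae(clean) <= rmse(clean))  # → True

# The outlier inflates RMSE by a larger factor than MAE.
print(rmse(outlier) / rmse(clean) > mae(outlier) / mae(clean))  # → True
```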