# Being sure about confidence intervals

Using our data, we can be 95% confident that the true change in practicing lawyers associated with one more student in a top-100 law school in the state lies between 2.5 and 7.8 lawyers. For a given confidence level (such as 95%), a narrower interval indicates a more precise estimate, whereas a broader interval indicates less precision. When the upper bound of the confidence interval is several times the lower bound, we are less sure of the size of the association between the predictor and the response variable.

Note something else: because the confidence interval for the percentage of the state’s population living in an urban area contains zero, that predictor is not significantly related to practicing lawyer counts, statistically speaking, holding the other predictors constant. In other words, if the true change might be zero, we cannot claim a statistically meaningful effect for that predictor. On the other hand, because the law school enrollment interval does not straddle zero, that predictor has a statistically significant p-value.
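The zero-straddling check can be sketched in a few lines of Python. The law school bounds come from the text above; the urban-percentage bounds are hypothetical, invented only to show an interval that contains zero, and the helper name `straddles_zero` is our own.

```python
def straddles_zero(low, high):
    """Return True when the interval [low, high] contains zero."""
    return low <= 0 <= high


intervals = {
    "law_school_enrollment": (2.5, 7.8),  # bounds quoted in the text
    "pct_urban": (-0.4, 1.2),             # hypothetical bounds, for illustration only
}

for name, (low, high) in intervals.items():
    # A 95% interval that excludes zero corresponds to a two-sided p-value below 0.05.
    verdict = "not significant" if straddles_zero(low, high) else "significant"
    print(f"{name}: [{low}, {high}] -> {verdict} at the 5% level")
```

The same check works for any coefficient interval, as long as the interval and the significance test use the same confidence level.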

A confidence interval is a range of plausible values for the unknown true population parameter.

Here is a plot with confidence intervals around the best-fit line of the Less500 predictor (companies with fewer than 500 employees). The intervals show as the shaded band above and below the line. You can be 95 percent confident that the vertical range contains the true mean number of private practice lawyers for states with the corresponding predictor value on the horizontal axis. If the predictor is indeed associated with the response variable, the band is narrowest where the data are concentrated, near the average predictor value, and widens toward the extremes, because you are more sure of the estimate where more observations anchor the line.
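To see why the band changes width, here is a minimal sketch of the standard formula for the confidence band around a simple regression line. The data points are hypothetical, invented for illustration rather than taken from the lawyer data, and the critical value is hard-coded for this sample size.

```python
import statistics

# Hypothetical (x, y) pairs, invented purely for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
T_CRIT = 2.306  # two-sided 95% t critical value for n - 2 = 8 degrees of freedom

n = len(xs)
x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)

# Ordinary least squares fit for a single predictor.
s_xx = sum((x - x_bar) ** 2 for x in xs)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
intercept = y_bar - slope * x_bar

# Residual standard deviation, using n - 2 degrees of freedom.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
resid_sd = (sum(r ** 2 for r in residuals) / (n - 2)) ** 0.5


def band_half_width(x0):
    """Half-width of the 95% confidence band for the mean response at x0."""
    return T_CRIT * resid_sd * (1 / n + (x0 - x_bar) ** 2 / s_xx) ** 0.5


for x0 in (min(xs), x_bar, max(xs)):
    print(f"x = {x0}: fitted line +/- {band_half_width(x0):.3f}")
```

The `(x0 - x_bar) ** 2` term is what makes the band flare out: it is zero at the average predictor value and grows as you move toward either end of the horizontal axis.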

One more aspect: we should explain how statisticians use the terms population and sample. The entire set of what you would like to count is the population; the portion of the population that you obtain is a sample from that population. So, for example, all 45 associates in a firm comprise a population; the 15 selected at random to take a survey would be a sample from that population.

Statistics offers an impressive toolbox for making inferences from a partial sample to the entire population, and for stating how likely those inferences are to be correct.

If we repeatedly sampled from the larger population (different mixes of 15 associates each time), about 95% of the resulting confidence intervals would contain the true population mean of whatever we are estimating from the linear regression. In other words, there is a 95% chance of selecting a sample such that the 95% confidence interval calculated from that sample contains the correct mean for the response variable.
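A small simulation makes the repeated-sampling idea concrete. This is a sketch under stated assumptions: to sidestep finite-population corrections, each sample of 15 is drawn from a hypothetical normal population rather than from 45 actual associates, the mean and standard deviation are made up, and the interval is for a simple mean rather than a regression estimate.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

TRUE_MEAN = 50.0  # hypothetical population mean
TRUE_SD = 10.0    # hypothetical population standard deviation
SAMPLE_SIZE = 15
T_CRIT = 2.145    # two-sided 95% t critical value for 14 degrees of freedom
TRIALS = 2000


def mean_ci(sample, t_crit=T_CRIT):
    """Return the (low, high) t-based confidence interval for the sample mean."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return m - t_crit * se, m + t_crit * se


covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    low, high = mean_ci(sample)
    if low <= TRUE_MEAN <= high:
        covered += 1

coverage = covered / TRIALS
print(f"{coverage:.1%} of the {TRIALS} intervals contain the true mean")
```

Each interval lands in a different place, but the share that captures the true mean should settle close to 95% as the number of trials grows.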

The confidence level does not express the chance that repeated sample estimates will fall into the confidence interval. Nor does it give the probability that the unknown mean for the response variable lies within any one computed interval: once calculated, that interval either contains the true mean or it does not.
