Machine-learning statistics: variance and standard deviation

It would be nice if lawyers could embrace regression without a grasp of its statistical underpinnings. If partners or associate general counsel are content with arm-waving, vague notions of regression, that is a choice. But if they want to take part in discussions about regression and feel assured that they understand what it can and cannot do, they should learn some statistical concepts. Broadly stated, that is the goal of this blog: to make machine learning software comprehensible to lawyers.

Specifically, this post explains two statistical stalwarts — variance and standard deviation — that play roles explicitly or implicitly in many other posts.

Variance is a statistical measure of how far the numbers in a collection of numbers are scattered from the collection’s average. It tells you about the collection’s degree of dispersion.

Take a law firm that has many offices, but does environmental work in only four of them. In those four offices, the firm has one environmental partner, three environmental partners, five, and seven respectively. The variance of that collection of numbers (1, 3, 5, 7) is 6.67.

If we were to calculate the variance by hand we would start with the average number of partners in the offices (sum 1, 3, 5, and 7 and divide by 4). The sum being 16 across the four offices, the average is four per office. Next, we would subtract that average, four, from each office’s number of partners. We would then square the result of the subtraction for each office (multiply the result by itself) and add up all of those squared numbers: 1 minus 4, squared, is 9; 3 minus 4, squared, is 1; 5 minus 4, squared, is 1; and 7 minus 4, squared, is 9, for a total of 20. Finally, we would divide that total (20) by the number of offices minus 1 (3), which gives 6.67.
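For readers who like to see the arithmetic spelled out, here is a minimal sketch of the same hand calculation in Python. The variable names are our own, chosen for illustration, and it uses Python only because it reads almost like plain English.

```python
# Number of environmental partners in each of the four offices (from the post).
partners = [1, 3, 5, 7]

# Step 1: the average number of partners per office.
mean = sum(partners) / len(partners)  # 16 / 4 = 4.0

# Step 2: subtract the average from each office's number and square the result.
squared_deviations = [(x - mean) ** 2 for x in partners]  # [9.0, 1.0, 1.0, 9.0]

# Step 3: add up the squared results.
total = sum(squared_deviations)  # 20.0

# Step 4: divide by the number of offices minus 1.
variance = total / (len(partners) - 1)

print(round(variance, 2))  # 6.67
```

Each step in the code matches one sentence of the description above, so you can change the list of partners and watch how the result moves.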

A single command in statistical analysis software such as R does all this instantly: the variance is 6.67 (squared partners).

Now, what if the largest environmental office has 11 partners instead of seven? Intuitively, you should sense that the variance would be larger, because there is a wider spread in the set of partner numbers (1, 3, 5, 11). You would be right! The variance of this set of partners is 18.67 (squared partners). Larger variances represent greater dispersion.
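To illustrate what a single command looks like, here is a short sketch that uses the `variance` function from Python's built-in `statistics` module, which divides by the number of values minus 1 just as the hand calculation does. The names `first_offices` and `second_offices` are ours, for illustration only.

```python
from statistics import variance

# The two collections of partner numbers discussed in the post.
first_offices = [1, 3, 5, 7]
second_offices = [1, 3, 5, 11]

# One call per collection replaces the whole hand calculation.
var_first = variance(first_offices)
var_second = variance(second_offices)

print(round(var_first, 2))   # 6.67
print(round(var_second, 2))  # 18.67
```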

Most people find it easier to think about dispersion measures when they are expressed in the same units as the data rather than in squared units. Here, partners holds meaning more comfortably than squared partners (whatever that is!).

To convert variance to the original units, you find its square root, the number which multiplied by itself equals the variance. That figure is the standard deviation of the collection of partner numbers. The standard deviation of the first example of offices, which has a variance of 6.67, is 2.58 (2.58 times 2.58 = 6.67, with rounding); the standard deviation of the second example, with the larger variance of 18.67, is 4.32.
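As a quick check on those figures, the sketch below takes the square root of the variance for the first set of offices and, for the second set, calls the `stdev` function from Python's `statistics` module, which does both steps at once. Again, the variable names are ours.

```python
import math
from statistics import stdev, variance

first_offices = [1, 3, 5, 7]
second_offices = [1, 3, 5, 11]

# Standard deviation as the square root of the variance.
sd_first = math.sqrt(variance(first_offices))

# The same measure computed directly by the library.
sd_second = stdev(second_offices)

print(round(sd_first, 2))   # 2.58
print(round(sd_second, 2))  # 4.32
```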

A way to put the standard deviation into context is to compare it to the average of the numbers. So, in the first example of offices the standard deviation is approximately 2.6 while the average is 4 (the standard deviation is 65% of the average); in the second example, because the fourth office has 11 partners instead of 7, the standard deviation rises to 4.3 while the average increases to 5. Now the standard deviation is 86% of the average, so it confirms a much more varied collection of partner numbers.
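The comparison of standard deviation to average can be sketched in a few lines. Statisticians call this ratio the coefficient of variation, a term this post does not otherwise use. The function name `sd_relative_to_mean` is hypothetical, made up for this example.

```python
from statistics import mean, stdev

def sd_relative_to_mean(values):
    """Return the standard deviation as a fraction of the average."""
    return stdev(values) / mean(values)

ratio_first = sd_relative_to_mean([1, 3, 5, 7])    # about 2.58 / 4
ratio_second = sd_relative_to_mean([1, 3, 5, 11])  # about 4.32 / 5

print(f"{ratio_first:.0%}")   # 65%
print(f"{ratio_second:.0%}")  # 86%
```

Because the ratio has no units, it lets you compare the spread of collections whose averages differ, as these two do.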

What most people are familiar with is the standard deviation of a bell-shaped distribution. In such a distribution, a bit more than two-thirds of all the numbers (about 68%) fall within one standard deviation above and one standard deviation below the average. Two standard deviations on either side of the average cover around 95% of the values. Bear in mind, however, that most distributions of numbers do not exhibit a so-called normal distribution (we will return to this important concept later), so standard deviation can’t always be translated into such neat percentages. What you should understand is that the larger the standard deviation (relative to the average), the greater the dispersion among the numbers and the less well the average represents them.
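If you would like to see where the 68% and 95% figures come from, the sketch below asks Python's `statistics.NormalDist` (available in Python 3.8 and later) how much of an idealized bell curve lies within one and two standard deviations of the average. It assumes a perfectly normal distribution, which real collections of numbers rarely match.

```python
from statistics import NormalDist

# An idealized bell curve with average 0 and standard deviation 1.
bell_curve = NormalDist(mu=0, sigma=1)

# Share of values within one standard deviation of the average.
within_one = bell_curve.cdf(1) - bell_curve.cdf(-1)

# Share of values within two standard deviations of the average.
within_two = bell_curve.cdf(2) - bell_curve.cdf(-2)

print(f"{within_one:.1%}")  # 68.3%
print(f"{within_two:.1%}")  # 95.4%
```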
