Descriptive statistics and the step beyond to predictive statistics

A fundamental distinction between two kinds of data analytics appears in a report published by KPMG, “Through the looking glass, How corporate leaders view the General Counsel of today and tomorrow” (Sept. 2016). The report observes that “Companies are making greater use of data analytics and are increasingly moving from descriptive analytics (where technology is used to compress large tranches of data into more user-friendly statistics) to predictive analytics and prescriptive models that extrapolate future trends and behavior.” (page 14).

Law firms and law departments can avail themselves of many kinds of software to summarize aspects of a data set. Descriptive statistics, as some call them, include averages, medians, quantiles, and standard deviations. These summary statistics, yet another term for the basic calculations, are themselves simplified models of the underlying data. [Note that a “statistic” is properly a number calculated from underlying data. So, when we calculate the variance of this year’s invoices, the variance is the statistic and the underlying “raw” data is the set of all the year’s invoices.]
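As a minimal sketch of what such summary calculations look like, the following uses Python's standard statistics module as a stand-in for whatever software a firm or department actually uses. The invoice amounts and the summarize helper are hypothetical, invented purely for illustration.

```python
import statistics

# Hypothetical invoice amounts in dollars (illustrative only, not real data).
invoices = [1200.0, 850.0, 4300.0, 975.0, 2600.0, 1500.0, 3100.0, 700.0]


def summarize(values):
    """Return a few common descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Three cut points that split the sorted data into four equal groups.
        "quartiles": statistics.quantiles(values, n=4),
        # Sample variance and standard deviation (n - 1 in the denominator).
        "variance": statistics.variance(values),
        "stdev": statistics.stdev(values),
    }


summary = summarize(invoices)
for name, value in summary.items():
    print(name, value)
```

Each value in the result is a “statistic” in the sense described above: a single number calculated from the raw invoice data.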

Predictive statistics go further than descriptive statistics. Using a program like open-source R and its built-in lm() function, you can easily fit a linear regression model that predicts the number of billable hours associates are likely to record based on, for example, their practice group, years with the law firm, gender, and previous year’s billings. Predictive analytic models allow the user to forecast numbers, not just describe them.
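To make the idea of a predictive model concrete, here is a deliberately simplified sketch in Python rather than R. It fits an ordinary least-squares line with a single predictor (the previous year's hours), whereas a regression in R could also take in practice group, tenure, and other variables. The hours figures and the fit_line and predict helpers are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope, intercept, x):
    """Forecast y for a new value of x using the fitted line."""
    return intercept + slope * x


# Hypothetical billable hours for six associates (illustrative only).
previous_year_hours = [1650, 1800, 1725, 1900, 2050, 1975]
current_year_hours = [1700, 1820, 1760, 1950, 2080, 2010]

slope, intercept = fit_line(previous_year_hours, current_year_hours)
forecast = predict(slope, intercept, 1850)
print(f"Forecast for an associate who billed 1,850 hours last year: {forecast:.0f}")
```

The descriptive step would stop at reporting the average of the six associates' hours; the predictive step uses the fitted line to estimate hours for an associate who is not in the data.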

Surveys with fewer than 400 participants produce “ballpark” results at best

Findings from surveys can enlighten legal managers and sharpen their decisions, but only if the data reported by the organization that conducted the survey are credible. Among the many imperfections that can mar survey results, an immediately obvious one is sample size and its inverse effect on the margin of error. Put simply, the smaller the sample of respondents, the more the results might diverge from the figure that would emerge if the entire population could be polled; the margin of error balloons. Conversely, the more participants, the smaller the margin of error and the more likely the results are to represent the whole population.

The NY Times, Oct. 15, 2016 at A15, makes the point about voter surveys, but the statistical caveat is the same for legal-industry surveys: “If the sample is less than 400, the result should be considered no more than a ballpark estimate.”
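The relationship between sample size and margin of error can be sketched with the textbook formula for a proportion. This assumes a simple random sample, a 95% confidence level (z = 1.96), and the worst-case proportion of 0.5; none of these assumptions comes from the Times article, and real surveys rarely meet them exactly. The margin_of_error function is a hypothetical helper.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    n: number of respondents
    p: assumed proportion (0.5 gives the widest, most cautious margin)
    z: z-score for the confidence level (1.96 corresponds to about 95%)
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (100, 400, 1000):
    print(f"n = {n}: about +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions, a sample of 400 yields a margin of roughly plus or minus 4.9 percentage points, and a sample of 100 roughly doubles that to about 9.8 points.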

Sadly, many surveys by vendors to law firms and law departments fail to attract even 400 participants. Worse, quite a few survey reports say nothing about how many participants they obtained, even if they provide demographic data about them. Their findings might be characterized as SWAGs (scientific wild-ass guesses), and even that might give them too much credit on the “scientific” side. No one should base decisions on findings derived from a too-small group of survey respondents.

We leave for another post a further wrinkle that the Times highlights: if the data analysts weight the responses, they “don’t adjust their margins of error to account for the effect of weighting.”