126 extensions to ggplot2 listed: do you know others?

For a book I just published on data graphs for legal managers, available on LeanPub, I compiled as many extension packages and functions as I could locate on CRAN and GitHub. A PDF of the results is available here: https://www.dropbox.com/s/ku2gyvlvvq1cyfa/ggExtensions.pdf?dl=0

If you know of any that I have missed, I would much appreciate hearing from you with a comment. Thank you.

P.S. If anyone would like to join me in preparing examples of the extensions or functions, somewhat like a CRAN Task View but with vignettes, that might be edifying.

My New Book on Graphical Analysis of Data for Decisions

As readers may know, my interests have broadened from consulting to law departments to applying analytic tools to legal data.  I program with open-source R and am deep into machine learning algorithms as applied to law firm and law department operational data.

My latest book shows how law firms can use their data, when presented effectively in plots, to make better operational decisions.  Law departments, too, can learn from the techniques laid out in the book and they can encourage their firms to improve.

Some 75 types of charts are included, each one showing how different sets of law firm data (65 in all) might be presented.  The book also explains a wide variety of graphing choices and techniques.

The book is available for download in PDF format on LeanPub.

If you have any comments about the book or the topic, I would very much like to hear from you.  Likewise, if you would spread the word about the book in a newsletter, email, blog post, group, webinar, or tweet, that would be much appreciated.

Descriptive statistics and the step beyond to predictive statistics

A fundamental distinction between two kinds of data analytics appears in a report published by KPMG, “Through the looking glass, How corporate leaders view the General Counsel of today and tomorrow” (Sept. 2016).  “Companies are making greater use of data analytics and are increasingly moving from descriptive analytics (where technology is used to compress large tranches of data into more user-friendly statistics) to predictive analytics and prescriptive models that extrapolate future trends and behavior.” (page 14)

Law firms and law departments can avail themselves of many kinds of software to summarize aspects of a data set.  Descriptive statistics, as some call them, include averages, medians, quantiles, and standard deviations.  These summary statistics, yet another term for the basic calculations, are themselves simplified models of the underlying data.  [Note that a “statistic” is properly a number calculated from underlying data.  So, we calculate the variance statistic of all this year’s invoices, where the underlying “raw” data is the data set of all the year’s invoices.]
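
To make the idea of a summary statistic concrete, here is a minimal sketch in Python using only its standard library.  The invoice amounts are hypothetical numbers invented for illustration, and the function name `describe` is my own label, not part of any package.

```python
import statistics

# Hypothetical invoice amounts in dollars (illustrative only, not real data).
invoices = [1200, 1500, 1800, 2400, 3100]


def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Three cut points that split the data into quartiles.
        "quartiles": statistics.quantiles(values, n=4),
        # Sample variance and standard deviation (n - 1 denominator).
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
    }


summary = describe(invoices)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Each value printed is a “statistic” in the sense described above: a single number calculated from the raw invoice data.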

Predictive statistics go further than descriptive statistics.  Using a program like open-source R and its lm() function, you can easily fit a regression model that predicts the number of billable hours likely to be recorded by associates based on their practice group, years with the law firm, gender, and previous year’s billings, for example.  Predictive analytic models allow the user to derive numbers, not just describe them.
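
As a rough illustration of the step from describing to predicting, the sketch below fits a least-squares line with a single predictor, years with the firm, in plain Python.  It is a simplification under stated assumptions: the data are hypothetical, the helper names `fit_line` and `predict` are my own, and a real model in R would typically include several predictors at once.

```python
# Hypothetical data (illustrative only): years with the firm and the
# billable hours each associate recorded last year.
years_with_firm = [1, 2, 3, 4, 5]
billable_hours = [1650, 1720, 1800, 1850, 1930]


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted value of y for a new x on the fitted line."""
    return intercept + slope * x


slope, intercept = fit_line(years_with_firm, billable_hours)
forecast = predict(slope, intercept, 6)
print(f"hours = {intercept:.1f} + {slope:.1f} * years")
print(f"forecast for a sixth-year associate: {forecast:.0f} hours")
```

The fitted line produces a number for a sixth-year associate, a case that does not appear in the data, which is what separates a predictive model from a descriptive summary.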

The term “predictive analytics” compared to “machine learning”

The term “machine learning” may be most common, but the alternative “predictive analytics” has much going for it.  This is the term Eric Siegel has promoted extensively, including in his Predictive Analytics (John Wiley 2016).

Siegel places “machine learning” mostly in academia and research papers.  It is a computer science term that connotes statistics and matrix algebra.  “Predictive analytics,” by contrast, has more overtones of usefulness to business, as it stresses the value of algorithms that take in data and predict numbers, classifications, or most-similar values.

Data may be neutral, but interpretation of it is never

Legal managers need to appreciate the gap between numbers and interpretation of those numbers.   Stated differently, contrary to the chestnut “the numbers speak for themselves”, a babble of conclusions can be reached from any set of numeric data.

A charming anecdote from the NY Times, Nov. 4, 2016 at B4, captures the multiple voices of numbers due to the subjectivity of inference.  According to the Times, the Bureau of Labor Statistics has an unofficial motto for when they are asked about their employment data.  They don’t indulge in drawing conclusions as to whether the employment glass is half full or half empty:  they respond, “It’s an eight-ounce glass with four ounces of liquid.”

In a different legal context and drawing on the wisdom of TV, “Just the facts, ma’am” – leave the interpretation to us.

Data scientists can presumably measure the glass and the amount of liquid in it, but managers in law firms and departments must come to their own conclusions about fullness.

The rareness of law firms offering analytic value to their law department clients

The 2016 Chief Legal Officer Survey, conducted by Altman Weil, has been discussed on this blog.  That survey included a question about the CLOs’ primary law firms and what data analytics those law firms have shared with them.  Specifically, the question asked “Considering the ten law firms that receive the largest portion of your outside counsel spend, in the last 12 months how many of those firms have provided you with an analysis of spending data that was useful to your law department?   Select a number between 0 and 10.”  Page 19 of the report gives the overall results and then breaks them down by the number of lawyers in the department.

The chart below shows a breakdown by revenue of the company.  Revenue and number of lawyers are correlated, certainly, but many readers are more familiar with categorizing companies by their revenue.

[Chart: aw-clo-2016-law-firm-data-analysis-pg-19]

The situation is dramatic and regrettable.  Almost no chief legal officer in this large sample of 331 (median of nine lawyers and median revenue of $3.5 billion) has been impressed by the spending analyses their key law firms have recently shown them.  More than half the respondents stated “zero” while 32 did not provide an answer.  One bright spot, however, was the department that claimed that all ten of its key firms had provided valuable data analytics!  For the others, largely irrespective of department size, on average fewer than one firm offered analytic value regarding the one area where firms could do so most easily: their fees and expenses.  Even the largest companies, which are likely to spend millions on law firms and to have large, sophisticated firms representing them, averaged fewer than 1.5 firms.

Law firms that appreciate the value of data-based decision making, that can trawl at least their own figures to draw conclusions about management, and that can help their clients benefit from those insights, will leap ahead of their innumerate competitors.

All sizes of law departments value data analytics approximately the same

We introduced the 2016 Chief Legal Officer Survey, conducted by Altman Weil, Inc., above.

The survey asked responding chief legal officers to select from eight efficiency initiatives any they had undertaken recently.  One was “Collection and analysis of management metrics.”  That choice came in fourth, selected by 39% of the respondents who answered the question.

On the downside, however, the next page of the report (pg. 7) reveals that of the eight techniques, data analysis came in last as determined by the percentage of respondents who ranked it as a 9 or 10 on a scale of 10, where 10 meant “enormous value”.

[Chart: aw-clo-2016-data-analysis-pg-7]

Another view is to look at the relative perceived value of the data analysis efforts by size of company, which is tantamount to size of law department.  The graph above indicates that all sizes of law departments viewed data analytics as offering roughly the same value, albeit not as much value as the other measures.  So, even though, as pointed out previously, larger departments exhibit a much higher incidence of using data analytics, all sizes of departments rank the return on that investment as about the same.

The larger the law department, the more likely it undertakes data analysis

Instances of data science in U.S. law firms or law departments beyond the most basic are sparse, or at least hard to learn about.  Most of the numbers they collect are only summarized and described, often in Excel or PowerPoint, and there is very little analysis other than trends over time or rankings.

Because the field of legal data science in support of management decisions is nascent, we have little to go on regarding its development.  One survey that explored the topic is the 2016 Chief Legal Officer Survey, conducted by Altman Weil, Inc. in the fall of 2016.  This year’s survey attracted 331 participants.  The median law department has nine lawyers while the median corporate revenue is $3.5 billion.  Thus, the survey sample was large and consisted mostly of very large companies.

One question on the Altman Weil survey asked “In the last 12 months, have you done any of the following to increase your law department’s efficiency in its delivery of legal services? (Check all that apply.)”  Of the eight choices, page 6 of the Report shows that “Collection and analysis of management metrics” came in fourth, with 39% of the respondents checking it.

Not surprisingly, when you break the respondents into five revenue categories, as shown in the graphic below, the larger the company, the more likely the respondent checked that selection.  Among the smaller companies on the left, approximately one out of four indicated that they worked with management metrics; among the larger companies on the right, closer to two out of three selected it.  The inference is that bigger departments have more data and more people or IT resources who can dive into it to help their managers make decisions.

[Chart: aw-clo-2016-data-analysis-pg-6]

Descriptive analytics compared to predictive analytics

A fundamental distinction between two kinds of data analytics appears in a report published by KPMG, “Through the looking glass, How corporate leaders view the General Counsel of today and tomorrow” (Sept. 2016).  The report observes that “Companies are making greater use of data analytics and are increasingly moving from descriptive analytics (where technology is used to compress large tranches of data into more user-friendly statistics) to predictive analytics and prescriptive models that extrapolate future trends and behavior.” (page 14).

Law firms and law departments can avail themselves of many kinds of software to summarize aspects of a data set.  Descriptive analytics, as some call them, include averages, medians, quantiles, and standard deviations.  These “summary statistics,” yet another term for the basic calculations, are simplified models of the underlying data.  Note that a “statistic” is a number calculated from underlying data.  So, we calculate the variance statistic of all this year’s invoices, where the underlying “raw” data is the data set of all the year’s invoices.

Predictive statistics go further than descriptive statistics.  Using a program like R and its lm() function, you can create a linear regression model that predicts the number of billable hours likely to be recorded by associates based on their practice group, years with the law firm, gender, and previous year’s billings, for example.  Predictive analytic models allow the user to forecast numbers.

Limited interviews fall short of “data”; glimmers of awareness of machine learning

Two observations arise from a report published by KPMG, “Through the looking glass, How corporate leaders view the General Counsel of today and tomorrow” (Sept. 2016), one about what constitutes “data from a survey” and the other about dawning awareness among general counsel of data analytics.

Regarding the first observation, the report states that its conclusions are based on interviews with 34 “CEOs, Chairmen, General Counsel and Heads of Compliance who made themselves available for interviews and kindly agreed to participate in our research.” (pg. 27).  While you can certainly identify themes from interviews, unless you ask everyone the same question (or set of questions), you can’t quantify your findings.  Writing that “risk management is top of mind for GCs” is worlds apart from writing that “Twenty-six out of 34 interviewees mentioned risk management as a significant concern.”  Additionally, surveys are designed to gather data that is representative of a larger population.  It is unlikely that the particular group of 34 who agreed to speak to the KPMG interviewers is representative of the broader population of global CEOs, Chairmen of the Board of Directors, General Counsel or Chief Compliance Officers.  Subjective interpretations of what a limited group of people say fall short of quantified research, although those interpretations have whatever credibility a reader assigns them.

The second observation highlights the passing reference — but at least it is a reference — to machine learning software becoming more known to general counsel.  “Technology was also cited as an important tool to help the GC improve efficiency, at a time when they are continually being asked to do more with less: ‘New technology helps the GC to be more responsive to the real-time demands of the C-suite of executives,’ says the CEO of a large consumer services company. Companies are making greater use of data analytics and are increasingly moving from descriptive analytics (where technology is used to compress large tranches of data into more user-friendly statistics) to predictive analytics and prescriptive models that extrapolate future trends and behavior. The Office of the GC is being transformed by this process, for example, when performing due diligence on M&A targets or monitoring global compliance.” (page 14).  The following sentences direct attention to predictive coding in e-discovery, it is true, but at least the report links awareness of predictive analytics to transformation of law departments.