The rareness of law firms offering analytic value to their law department clients

The 2016 Chief Legal Officer Survey, conducted by Altman Weil, has been discussed here.  That survey included a question about what data analytics the CLOs’ primary law firms had shared with them.  Specifically, the question asked “Considering the ten law firms that receive the largest portion of your outside counsel spend, in the last 12 months how many of those firms have provided you with an analysis of spending data that was useful to your law department?   Select a number between 0 and 10.”  Page 19 of the report gives the overall results and then breaks them down by the number of lawyers in the department.

The chart hereafter shows a breakdown by revenue of the company.  Revenue and number of lawyers are correlated, certainly, but many readers are more familiar with categorizing companies by their revenue.


The situation is dramatic and regrettable.  Almost no chief legal officer in this large sample of 331 (median of nine lawyers and median revenue of $3.5 billion) has been impressed by what their key law firms have recently shown them in the way of spending analysis.  More than half the respondents answered “zero,” while 32 did not answer the question.  One bright spot, however, was the department that claimed that all ten of its key firms had provided valuable data analytics!  For the others, in most department-size categories, fewer than one firm on average offered analytic value in the area where firms could most easily provide it: their own fees and expenses.  Even the largest companies, which likely spend millions on law firms and are represented by large, sophisticated firms, averaged fewer than 1.5 firms.
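A minimal sketch of how answers to a 0-to-10 question like this one could be tallied, assuming each answer is stored as an integer and a skipped question as `None`. The function name and the sample values are hypothetical, not the survey’s data; the point is that non-answers (32 in the survey) should be set aside rather than counted as zeros.

```python
from statistics import mean, median
from typing import Optional, Sequence


def summarize_firm_counts(responses: Sequence[Optional[int]]) -> dict:
    """Summarize answers to a 0-10 question such as 'how many of your top ten
    firms gave you a useful analysis of spending data?'.

    None marks a respondent who skipped the question; skipped rows are
    excluded from the share and the averages instead of being counted as 0.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no answered responses to summarize")
    return {
        "answered": len(answered),
        "skipped": len(responses) - len(answered),
        "share_zero": sum(1 for r in answered if r == 0) / len(answered),
        "mean_firms": mean(answered),
        "median_firms": median(answered),
    }


# Invented responses for illustration only; these are NOT the survey's data.
sample = [0, 0, 1, 0, None, 2, 0, 10, 1, None, 0, 3]
print(summarize_firm_counts(sample))
```

In this toy sample half the answers are zero, yet the single department reporting 10 pulls the mean (1.7) well above the median (0.5), which is one reason to report both.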

Law firms that appreciate the value of data-based decision making, that can trawl at least their own figures to draw conclusions about management, and that can help their clients benefit from those insights, will leap ahead of their innumerate competitors.

All sizes of law departments value data analytics approximately the same

We introduced The 2016 Chief Legal Officer Survey, conducted by Altman Weil, Inc., above.

The survey asked responding chief legal officers to select, from eight efficiency initiatives, any they had undertaken in the last 12 months.  One was “Collection and analysis of management metrics.”  That choice came in fourth: 39% of the respondents who answered the question selected it.

On the downside, however, the next page of the report (pg. 7) reveals that of the eight techniques, data analysis came in last when ranked by the percentage of respondents who rated it a 9 or 10 on a 10-point scale, where 10 meant “enormous value.”


Another view is to look at the relative perceived value of data analysis by size of company, which closely tracks the size of the law department.  The graph above indicates that law departments of all sizes viewed data analytics as offering roughly the same value, albeit less value than the other initiatives.  So, even though, as noted elsewhere, larger departments are much more likely to use data analytics, departments of all sizes rate the return on that investment about the same.
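The “9 or 10” measure is what survey researchers often call a top-box score.  A minimal sketch of computing it separately for each size group, assuming each response is a (revenue band, rating) pair; the function name, the band labels, and the ratings are hypothetical, not the Altman Weil categories or results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def top_box_share_by_group(
    responses: Iterable[Tuple[str, int]], threshold: int = 9
) -> Dict[str, float]:
    """Share of ratings at or above `threshold` (here: a 9 or 10 on a
    10-point scale) within each group, e.g. each company-revenue band."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for group, rating in responses:
        totals[group] += 1
        hits[group] += int(rating >= threshold)
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical revenue bands and ratings, invented for illustration.
ratings = [
    ("Under $1B", 9), ("Under $1B", 5), ("Under $1B", 7), ("Under $1B", 6), ("Under $1B", 10),
    ("$1B to $10B", 6), ("$1B to $10B", 9), ("$1B to $10B", 8), ("$1B to $10B", 4), ("$1B to $10B", 10),
    ("Over $10B", 9), ("Over $10B", 7), ("Over $10B", 5), ("Over $10B", 10), ("Over $10B", 6),
]
print(top_box_share_by_group(ratings))
```

Roughly equal shares across bands, as in this invented example, are what “all sizes rank it about the same” looks like numerically.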


The larger the law department, the more likely it undertakes data analysis

Instances of data science in U.S. law firms or law departments beyond the most basic are sparse, or at least hard to learn about.  Most of the numbers they collect are only summarized and described, often in Excel or PowerPoint, with very little analysis beyond trends over time or rankings.

Because the field of legal data science in support of management decisions is nascent, we have little to go on regarding its development.   One survey that explored the topic is the 2016 Chief Legal Officer Survey, conducted by Altman Weil, Inc. in the fall of 2016.  This year’s survey attracted 331 participants.  The median law department has nine lawyers, while the median corporate revenue is $3.5 billion.  Thus, the survey sample was large and consisted mostly of very large companies.

One question on the Altman Weil survey asked “In the last 12 months, have you done any of the following to increase your law department’s efficiency in its delivery of legal services? (Check all that apply.)”  Of the eight choices, page 6 of the Report shows that “Collection and analysis of management metrics” came in fourth, with 39% of the respondents checking it.

Not surprisingly, when you break the respondents into five revenue categories, as shown in the accompanying graphic, the larger the company, the more likely the respondent checked that selection.  Roughly one in four of the smaller companies, on the left, indicated that they worked with management metrics; among the larger companies, on the right, the figure was closer to two out of three.  The inference is that bigger departments have more data, and more people or IT resources who can dive into it to help their managers make decisions.
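The breakdown described in that paragraph is a simple cross-tabulation.  A minimal sketch, assuming each respondent is a (revenue band, checked-or-not) pair and that the bands can be listed from smallest to largest; the function names, band labels, and answers are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def checked_rate_by_band(
    rows: Iterable[Tuple[str, bool]], band_order: Sequence[str]
) -> List[Tuple[str, float]]:
    """Share of respondents in each revenue band who checked the option,
    returned smallest band first. A band missing from band_order raises KeyError."""
    counts = {band: [0, 0] for band in band_order}  # band -> [checked, total]
    for band, checked in rows:
        counts[band][0] += int(checked)
        counts[band][1] += 1
    return [(band, counts[band][0] / counts[band][1]) for band in band_order if counts[band][1]]


def rises_with_size(rates: List[Tuple[str, float]]) -> bool:
    """True if the share never falls as the bands get larger."""
    shares = [share for _, share in rates]
    return all(a <= b for a, b in zip(shares, shares[1:]))


# Hypothetical bands (smallest to largest) and answers, invented for illustration.
bands = ["Smallest", "Middle", "Largest"]
rows = (
    [("Smallest", True)] + [("Smallest", False)] * 3
    + [("Middle", True)] * 2 + [("Middle", False)] * 2
    + [("Largest", True)] * 2 + [("Largest", False)]
)
rates = checked_rate_by_band(rows, bands)
print(rates, rises_with_size(rates))
```

A rising pattern in a sample is suggestive, not proof; with few respondents per band, a formal check such as a chi-squared test of independence would be a sensible next step.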



Descriptive analytics compared to predictive analytics

A fundamental distinction between two kinds of data analytics appears in a report published by KPMG, “Through the looking glass, How corporate leaders view the General Counsel of today and tomorrow” (Sept. 2016).  The report observes that “Companies are making greater use of data analytics and are increasingly moving from descriptive analytics (where technology is used to compress large tranches of data into more user-friendly statistics) to predictive analytics and prescriptive models that extrapolate future trends and behavior.” (page 14).

Law firms and law departments can avail themselves of many kinds of software to summarize aspects of a data set.  Descriptive analytics, as some call it, includes averages, medians, quantiles, and standard deviations.  These “summary statistics,” yet another term for the basic calculations, are simplified models of the underlying data.  Note that a “statistic” is a number calculated from underlying data.  For example, we might calculate the variance of the amounts of all this year’s invoices, where the underlying “raw” data is the set of all the year’s invoices.
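A minimal sketch of those summary statistics using Python’s standard `statistics` module, assuming invoice amounts are plain dollar figures; the amounts are invented.  Because the data set is every invoice for the year, the sketch treats it as a whole population and uses `pvariance`/`pstdev`; `variance`/`stdev` would instead treat the invoices as a sample (dividing by n - 1).

```python
import statistics

# Hypothetical invoice amounts in dollars; NOT real data.
invoices = [1200, 3400, 2500, 8000, 4100, 950, 15000, 2750]

summary = {
    "count": len(invoices),
    "mean": statistics.mean(invoices),
    "median": statistics.median(invoices),
    # Quartile cut points: three values splitting the invoices into four groups.
    "quartiles": statistics.quantiles(invoices, n=4),
    # Population versions: this year's invoices are the whole set, not a sample.
    "variance": statistics.pvariance(invoices),
    "std_dev": statistics.pstdev(invoices),
}
for name, value in summary.items():
    print(f"{name}: {value}")
```

Each printed number is a “statistic” in the sense above: a single figure computed from the raw invoices.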

Predictive statistics go further than descriptive statistics.  Using a program like R and its lm function, for example, you can create a linear regression model that predicts the number of billable hours associates are likely to record based on their practice group, years with the law firm, gender, and previous year’s billings.   Predictive analytic models allow the user to forecast numbers.
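For readers without R, a minimal Python sketch of the same kind of model: ordinary least squares solved through the normal equations.  It is a stand-in for `lm()`, not its implementation.  The predictors (years with the firm, prior-year hours, and a 0/1 litigation dummy for practice group; gender omitted for brevity), the helper names, and the data are all hypothetical, and the hours are generated from a known rule so the fit can be checked.

```python
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares by solving the normal equations (X'X) b = X'y.

    A column of 1s is prepended for the intercept, so the result is
    [intercept, coef_1, coef_2, ...]. Adequate for a few predictors; for real
    work use R's lm() or an established numerical library.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; X'X is singular")
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [a - factor * b for a, b in zip(A[i], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs: Sequence[float], x: Sequence[float]) -> float:
    """Apply fitted coefficients to one associate's predictor values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


# Hypothetical associates: [years with firm, prior-year hours, 1 if litigation else 0].
X = [[1, 1700, 1], [2, 1800, 0], [3, 1750, 1], [4, 1900, 0],
     [5, 1850, 1], [6, 2000, 0], [2, 1650, 0], [5, 1950, 1]]
# Hours generated from a known rule (no noise) so the fit can be checked.
y = [200 + 50 * yrs + 0.8 * prior + 100 * lit for yrs, prior, lit in X]

coefs = fit_ols(X, y)
print([round(c, 3) for c in coefs])
print(round(predict(coefs, [3, 1800, 0]), 1))
```

With real billing data there would be noise, so the coefficients would be estimates with uncertainty; `lm()` in R reports standard errors for exactly that reason.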

Limited interviews fall short of “data”; glimmers of awareness of machine learning

Two observations arise from a report published by KPMG, “Through the looking glass, How corporate leaders view the General Counsel of today and tomorrow” (Sept. 2016), one about what constitutes “data from a survey” and the other about dawning awareness among general counsel of data analytics.

Regarding the first observation, the report states that its conclusions are based on interviews with 34 “CEOs, Chairmen, General Counsel and Heads of Compliance who made themselves available for interviews and kindly agreed to participate in our research.” (pg. 27).   While you can certainly identify themes from interviews, unless you ask everyone the same questions (or at least some of the same questions), you can’t quantify your findings.  Writing that “risk management is top of mind for GCs” is worlds apart from writing that “Twenty-six out of 34 interviewees mentioned risk management as a significant concern.”  Additionally, surveys are designed to gather data that is representative of a larger population.  It is unlikely that the particular group of 34 who agreed to speak to the KPMG interviewers is representative of the broader population of global CEOs, Chairmen of the Board of Directors, General Counsel or Chief Compliance Officers.  Subjective interpretations of what a limited group of people say fall short of quantified research, although those interpretations have whatever credibility a reader assigns them.
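Coding each interview into themes is what turns an impression like “top of mind” into a count like “Twenty-six out of 34.”  A minimal sketch, assuming the interviews have already been hand-coded into sets of themes; the interviewee IDs, themes, and helper name are hypothetical.

```python
from typing import Dict, Set, Tuple


def tally_theme(coded: Dict[str, Set[str]], theme: str) -> Tuple[int, int, float]:
    """Count interviewees whose coded answers include `theme`.

    Meaningful only if every interviewee was asked the same question;
    otherwise silence on a theme is not evidence that it is unimportant.
    """
    total = len(coded)
    count = sum(1 for themes in coded.values() if theme in themes)
    return count, total, (count / total if total else 0.0)


# Hypothetical coded interviews; the IDs and themes are invented.
coded = {
    "GC-01": {"risk management", "cost control"},
    "GC-02": {"risk management"},
    "CEO-03": {"technology"},
    "CCO-04": {"risk management", "technology"},
}
count, total, share = tally_theme(coded, "risk management")
print(f"{count} out of {total} interviewees mentioned risk management ({share:.0%}).")
```

Even a clean count like this describes only the people interviewed; it does not make a self-selected group representative.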

The second observation highlights the passing reference — but at least it is a reference — to machine learning software becoming more known to general counsel.  “Technology was also cited as an important tool to help the GC improve efficiency, at a time when they are continually being asked to do more with less: ‘New technology helps the GC to be more responsive to the real-time demands of the C-suite of executives,’ says the CEO of a large consumer services company. Companies are making greater use of data analytics and are increasingly moving from descriptive analytics (where technology is used to compress large tranches of data into more user-friendly statistics) to predictive analytics and prescriptive models that extrapolate future trends and behavior. The Office of the GC is being transformed by this process, for example, when performing due diligence on M&A targets or monitoring global compliance.” (page 14).  The following sentences direct attention to predictive coding in e-discovery, it is true, but at least the report links awareness of predictive analytics to transformation of law departments.

Mandatory annual disclosure of number of lawyers and revenue by U.S. law firms

Why can’t the American Bar Association (or state bars) require U.S.-based law firms above some modest size to report their fiscal-year revenue along with a snapshot of the number of partners, associates, and support staff on the last day of the year?  The justification for that disclosure would be that clients, law school graduates, and lawyers considering a job change, among others, would have comprehensive and reliable data on at least two key attributes of firms: size and revenue.

Yes, there are definitional issues, such as what the term “partner” means in today’s multi-tiered law firms and what makes up “revenue.”   Yes, there might be no way to confirm the accuracy of the self-reported numbers, but the law firms that would have to comply have their books audited or reviewed by accountants, and the accountants could attest to the reasonable accuracy of the four numbers.  Yes, I do not know what enforcement mechanisms might be available.  And yes, firms may fear that the initial data request will slide down the proverbial slippery slope to more and more demands.

Such concerns would need to be debated, but they can be resolved.  If firms with more than 30 lawyers fell under this mandate, then perhaps 1,200 to 1,500 law firms would each year turn in four numbers that they already know.  No work would be required beyond going to an online site and filling in the numbers.  The ABA or a third party could consolidate and publish that data, and the legal industry would benefit greatly.
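A minimal sketch of what one filing might look like as a record, assuming the four numbers are revenue, partners, associates, and support staff, and assuming “lawyers” means partners plus associates (one answer to the definitional question raised above).  The class, field, and function names and the example firm are hypothetical; the sketch only shows how little structure the proposal needs and one ratio it would make comparable across firms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirmDisclosure:
    """Hypothetical annual filing holding the four proposed numbers."""

    firm: str
    revenue: float  # fiscal-year revenue, U.S. dollars
    partners: int  # headcounts on the last day of the fiscal year
    associates: int
    support_staff: int

    def __post_init__(self) -> None:
        if self.revenue < 0 or min(self.partners, self.associates, self.support_staff) < 0:
            raise ValueError("disclosed figures must be non-negative")

    @property
    def lawyers(self) -> int:
        # Assumption: lawyers = partners + associates (other titles ignored).
        return self.partners + self.associates

    def revenue_per_lawyer(self) -> float:
        if self.lawyers == 0:
            raise ValueError("no lawyers reported")
        return self.revenue / self.lawyers


def must_file(d: FirmDisclosure, threshold: int = 30) -> bool:
    """Whether a firm exceeds the example threshold of 30 lawyers."""
    return d.lawyers > threshold


filing = FirmDisclosure("Example LLP", 60_000_000, 40, 60, 70)
print(must_file(filing), filing.revenue_per_lawyer())
```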

Data may be vulnerable, but it helps decision-makers better than ignorance

Someone in a law firm or law department can pick apart any presentation of data.  All numbers, let alone the analyses and presentations of those numbers, are vulnerable to a range of questions about their completeness, ambiguity, consistency, and validity.  “Like all statistical measurements, [government data on employment] can be both honest and imprecise; a best estimate given the available tools but nonetheless subject to ambiguity, misinterpretation and error,” points out The NY Times, Nov. 4, 2016 at B4.  The data that legal managers should request and absorb before they pull the trigger on a decision can also be attacked.

The old saying, “Better to light a candle than curse the darkness,” reminds us that any well-intentioned data sheds more light than the total darkness of ignorance, supposition or ideology.  Try to gather numbers that can illuminate some aspect of a decision and you will be better off, even if someone who disagrees with your decision criticizes the data.  The criticisms might be correct, and you might have to go back and do a better job of collecting, defining, parsing or visualizing the underlying data.

But it is better to have “data’ed and lost than to never have data’ed at all” (sorry, Tennyson).

Ideology and embedded beliefs may cover up management data, but soldier on!

To a data scientist, responsible data informs and guides.  For legal managers, carefully curated data should change their minds if it shows they were unaware, mistaken, or holding an untenable belief.  Unfortunately, even silken data often succumbs to woolly thinking.

The reality is depressing: you can wear yourself out gathering insightful numbers, but someone beholden to an ideology, or enjoying privileges that the findings threaten, will not only reject the insights but will not even acknowledge them.  The NY Times, Nov. 4, 2016 at B4 emphasizes this human shortcoming in an article about partisan suspicion of government employment data: “Decades of psychological research have shown that people … tend to embrace information that confirms their existing beliefs and disregard data that contradicts them.”

When a general counsel is presented with data showing that a favored law firm’s effective billing rate is much higher than those of its peers, or a managing partner is presented with data showing that a long-standing client is unprofitable, either can whisk out a sewing basket of tools to rend such “whole cloth” malarkey.  We all find comfort in data that confirms what we believe, and we disregard data that controverts our values, belief sets, or sense of self.  We all believe we look good in what we put on.
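The effective-billing-rate comparison in that example is simple arithmetic, which is part of why it is hard to dismiss on the merits.  A minimal sketch, assuming “effective rate” means total fees billed divided by total hours billed (one common definition) and comparing against the peers’ median; the function names and figures are hypothetical.

```python
from statistics import median
from typing import Sequence, Tuple

FeesHours = Tuple[float, float]  # (fees billed in dollars, hours billed)


def effective_rate(fees: float, hours: float) -> float:
    """Fees billed divided by hours billed: one common definition, assumed here."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return fees / hours


def rate_vs_peer_median(firm: FeesHours, peers: Sequence[FeesHours]) -> float:
    """The firm's effective rate as a multiple of the peers' median effective rate."""
    peer_rates = [effective_rate(fees, hours) for fees, hours in peers]
    return effective_rate(*firm) / median(peer_rates)


# Hypothetical figures for one favored firm and three peers.
favored = (1_200_000, 1_500)
peers = [(900_000, 1_500), (700_000, 1_000), (1_300_000, 2_000)]
print(f"{rate_vs_peer_median(favored, peers):.2f}x the peer median")
```

Using the median rather than the mean keeps one unusually expensive or cheap peer from setting the benchmark.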

Even so, data scientists in the data-tattered legal industry must persevere to support the thoughtful pursuit of enlightenment through numbers.

Canonical names to allow software to combine data on law schools

Whenever a data scientist decides to merge two sets of data, there must be a common field (variable) for the software to merge on.  The code needs to instruct the computer: “Whenever the first data set has ‘Alaska’ in the State column, and the second data set has ‘Alaska’ in the Jurisdiction column, add to the first data set any additional columns of information from the second data set.”   The code has to tell the software that the State variable and the Jurisdiction variable are the common field for purposes of matching, and it has to use the right function for merging.
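A minimal plain-Python sketch of that instruction: a left join where the shared field has different names in the two data sets.  The function name and the small tables are illustrative.  Libraries offer this directly, for example pandas’ `states.merge(codes, left_on="State", right_on="Jurisdiction", how="left")` or R’s `merge(states, codes, by.x = "State", by.y = "Jurisdiction", all.x = TRUE)`.

```python
from typing import Dict, List

Row = Dict[str, object]


def left_merge(left: List[Row], right: List[Row], left_key: str, right_key: str) -> List[Row]:
    """Left join: keep every row of `left`; where left[left_key] equals
    right[right_key], append the right row's other columns.

    Right-side columns overwrite same-named left columns, so rename first if
    that matters. Duplicate keys on the right are rejected.
    """
    index: Dict[object, Row] = {}
    for row in right:
        key = row[right_key]
        if key in index:
            raise ValueError(f"duplicate key in right data: {key!r}")
        index[key] = row
    merged = []
    for row in left:
        match = index.get(row[left_key])
        extra = {k: v for k, v in match.items() if k != right_key} if match is not None else {}
        merged.append({**row, **extra})
    return merged


states = [{"State": "Alaska", "Region": "West"},
          {"State": "Ohio", "Region": "Midwest"},
          {"State": "Vermont", "Region": "Northeast"}]
codes = [{"Jurisdiction": "Alaska", "Abbrev": "AK"},
         {"Jurisdiction": "Ohio", "Abbrev": "OH"}]
for row in left_merge(states, codes, left_key="State", right_key="Jurisdiction"):
    print(row)
```

Vermont has no match in the second table, so its row comes through without an `Abbrev` column; a left join keeps it rather than silently dropping it.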

With the Law School Data Set, when I found data on admission rates in one source and data on numbers of Supreme Court clerks in another, the common field was the name of the law school.  A human can match school names instantly even if the precise names vary a little.

That sounds like it should also be simple for a computer program, but to a computer “NYU Law” is completely different from “New York University Law”; “Columbia Law School” is not “Columbia School of Law.”  The many ways publications name law schools mean that the data scientist has to settle on one version of each school’s name, sometimes referred to as the “canonical version,” and then spend much time transforming the alternative names to the canonical name.  It’s slogging work, prone to errors, and adds no real value.  But only once it is done can a merge function in the software achieve what you hope.
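A minimal sketch of the canonical-name step, assuming a hand-maintained alias table; the helper names are hypothetical, and a real table would hold every variant found across the sources.  Normalizing case, punctuation, and spacing first means the table needs only one entry per genuinely different wording.

```python
import re

# Hypothetical alias table: each normalized variant maps to one canonical name.
CANONICAL = {
    "nyu law": "New York University School of Law",
    "new york university law": "New York University School of Law",
    "new york university school of law": "New York University School of Law",
    "columbia law school": "Columbia Law School",
    "columbia school of law": "Columbia Law School",
}


def normalize(name: str) -> str:
    """Lower-case, turn punctuation into spaces, and collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def canonicalize(name: str) -> str:
    """Return the canonical school name, or raise so unmatched variants surface."""
    key = normalize(name)
    if key not in CANONICAL:
        raise KeyError(f"no canonical form recorded for {name!r}")
    return CANONICAL[key]


for raw in ["NYU Law", "New York University Law.", "Columbia School of Law"]:
    print(f"{raw} -> {canonicalize(raw)}")
```

Raising an error on an unknown variant, rather than passing it through, flags the name for a human to add to the table instead of letting that school quietly fail to match in the merge.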

A publicly-available data set on U.S. law schools

To have a hefty data set that would both interest lawyers and be available to share publicly has long been a desire of mine.   It would let me show how to work with data, and readers could download the data and follow along.  While it is easy to make up data for what programmers call “toy data sets,” such data sets are abstract and uninteresting.

Even more importantly, made-up data lacks patterns and characteristics that can demonstrate machine learning capabilities in real life.

My benchmark data from law departments could not be shared, because it was all proprietary.  My data collected during consulting projects for law departments and law firms also has to be kept strictly confidential.  And some data that are in the public domain or have leaked into it, such as older AMLAW100 compilations on law firms, do not have a range of variables that can illustrate machine learning techniques, for example.

So, I created a data set about U.S. law schools.  The first version started with the schools rated by U.S. News & World Report.  Thereafter I successively added more data about the schools from about six other sources.  I also added the population of each school’s city and state, the state’s number of lawyers in private practice, and some other variables about clerkships, etc.

The final step is a coming out party for this set of data about U.S. law schools!