|Why can’t the American Bar Association (or State Bars) require U.S.-based law firms above some modest-sized number of lawyers to report their fiscal year revenue along with a snapshot of the number of partners, associates, and support staff on the last day of the year? The justification for that disclosure would be that clients, law school graduates or lawyers considering a job change, among others, would have comprehensive and reliable data on at least two key attributes of firms: size and revenue.
Yes, there are definitional issues, such as what the term “partner” means in the multi-tiered law firms of today and what makes up “revenue”. Yes, there might be no way to confirm the accuracy of the self-reported numbers, but law firms that would have to comply have their books audited or reviewed by accountants, and the accountants could attest to the reasonable accuracy of the four numbers. Yes, I do not know what enforcement mechanisms might be available. And yes, firms may fear that the initial data request slips down the proverbial slope to more and more.
Such concerns would need to be debated; they can be resolved. If firms that have more than 30 lawyers fell under this mandate, then perhaps 1,200 to 1,500 law firms would each year turn in four numbers that they already know. No work would be required except going to an online site and filling in the numbers. The ABA or a third party could consolidate and publish that data, and the legal industry would benefit greatly.
|Someone in a law firm or law department can pick apart any presentation of data. All numbers, let alone the analyses and presentations of those numbers, are vulnerable to a range of questions about their completeness, ambiguity, consistency, and validity. “Like all statistical measurements, [government data on employment] can be both honest and imprecise; a best estimate given the available tools but nonetheless subject to ambiguity, misinterpretation and error,” points out The NY Times, Nov. 4, 2016 at B4. The data legal managers should request and absorb before they pull the trigger can also be attacked.
The old saying, “Better to light a candle than curse the darkness” reminds us that any well-intentioned data sheds more light than the total darkness of ignorance, supposition or ideology. Try to gather numbers that can illuminate some aspect of a decision and you will be better off, even if someone who disagrees with your decision criticizes the data. The criticisms might be correct, and you might have to go back and do a better job collecting, defining, parsing or visualizing the underlying data.
But, it is better to have “data’ed and lost than to never have data’ed at all” (sorry, Tennyson).
|To a data scientist, responsible data informs and guides. For legal managers, carefully curated data should change your mind if it shows you were unaware, mistaken or had an untenable belief. Unfortunately, no less than silken data often succumbs to wooly thinking.
The depressing reality is that you can wear yourself out gathering insightful numbers, but someone beholden to an ideology or enjoying privileges threatened by your findings will not only reject the insights but will not even acknowledge them. The NY Times, Nov. 4, 2016 at B4 emphasizes this shortcoming of humans in an article about partisan suspicion of government employment data. “Decades of psychological research have shown that people … tend to embrace information that confirms their existing beliefs and disregard data that contradicts them.”
When a general counsel is presented with data that shows a favored law firm’s effective billing rate is much higher than the firm’s peers or a managing partner is presented with data that a long-standing client is unprofitable, they can whisk out a sewing basket of tools to rend such “whole cloth” malarkey. We all find comfort in data that confirms what we believe and we disregard data that controverts our values, belief sets, or sense of self. We all believe we look good in what we put on.
Even so, data scientists in the data-tattered legal industry must persevere to support the thoughtful pursuit of enlightenment through numbers.
|Whenever a data scientist decides to merge two sets of data, there must be a common field (variable) for the software to merge on. The software needs to be able to instruct the computer: “Whenever the first data set has ‘Alaska’ in the State column, and the second data set has ‘Alaska’ in the Jurisdiction column, add on to the first data set any additional columns of information from the second data set.” The code has to tell the software that the State variable and the Jurisdiction variable are the common field for purposes of matching, and it has to use the right function for merging.
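As a minimal sketch of that instruction, here is a merge written in plain Python. The column names State and Jurisdiction come from the example above; the function name `left_merge`, the other column names, and every value in the two toy data sets are invented for illustration.

```python
# Two toy data sets with invented values. The key column is called
# "State" in the first and "Jurisdiction" in the second.
first = [
    {"State": "Alaska", "Lawyers": 10},
    {"State": "Ohio", "Lawyers": 20},
]
second = [
    {"Jurisdiction": "Alaska", "LawSchools": 1},
    {"Jurisdiction": "Ohio", "LawSchools": 2},
]


def left_merge(left_rows, right_rows, left_key, right_key):
    """Add the right-hand columns to each left-hand row whose key matches."""
    # Index the second data set by its key column for quick lookup.
    lookup = {row[right_key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        match = lookup.get(row[left_key], {})
        # Copy every column except the duplicate key column.
        extra = {k: v for k, v in match.items() if k != right_key}
        merged.append({**row, **extra})
    return merged


merged = left_merge(first, second, "State", "Jurisdiction")
for row in merged:
    print(row)
```

Data analysis libraries provide the same operation ready-made. In pandas, for instance, `DataFrame.merge` accepts `left_on` and `right_on` arguments for the case where the common field has a different name in each data set.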
With the Law School Data Set, when I found data on admission rates in one source and data on numbers of Supreme Court Clerks in another, the common field was the name of the law school. A human can match school names instantly even if they vary a little in the precise name used.
That sounds like it should also be simple for a computer program, but to a computer “NYU Law” is completely different than “New York University Law”; “Columbia Law School” is not “Columbia School of Law”. The multitudinous ways publications name law schools mean that the data scientist has to settle on one version of the school’s name, sometimes referred to as the “canonical version”, and then spend much time transforming the alternative names to the canonical name. It’s slogging work, subject to errors, and adds no real value. But only once it is done can a merge function in the software achieve what you hope.
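The canonical-name step can be sketched as a lookup table that maps every variant spelling to one agreed name. This is an illustration, not the code used for the Law School Data Set: the alias table, the canonical spellings chosen, the function names, and all the numbers are assumptions.

```python
# Hypothetical alias table: each variant (lower-cased) maps to one canonical name.
CANONICAL = {
    "nyu law": "New York University",
    "new york university law": "New York University",
    "columbia law school": "Columbia University",
    "columbia school of law": "Columbia University",
}


def canonical_name(raw):
    """Return the canonical school name, or None if the variant is unknown."""
    key = " ".join(raw.lower().split())  # normalize case and stray whitespace
    return CANONICAL.get(key)


def rekey(table):
    """Rebuild a {school name: value} table using canonical names as keys."""
    out = {}
    for name, value in table.items():
        canon = canonical_name(name)
        if canon is None:
            # Fail loudly so an unmapped variant is not silently dropped.
            raise KeyError(f"no canonical name for {name!r}")
        out[canon] = value
    return out


# Invented values from two imaginary sources that spell the names differently.
admissions = {"NYU Law": 0.30, "Columbia Law School": 0.20}
clerks = {"New York University Law": 5, "Columbia School of Law": 7}

adm, clk = rekey(admissions), rekey(clerks)
merged = {
    school: {"admit_rate": adm[school], "clerks": clk[school]}
    for school in adm
    if school in clk
}
print(merged)
```

Raising an error on an unknown variant is a deliberate choice here: each new spelling has to be added to the table by hand, which is the slogging work described above, but no school quietly drops out of the merge.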
|To have a hefty data set that would both interest lawyers and be available to share publicly has long been a desire of mine. It would let me show how to work with data, and readers could download the data and follow along. While it is easy to make up data for what programmers call “toy data sets”, they are abstract and uninteresting.
Even more importantly, made-up data lacks patterns and characteristics that can demonstrate machine learning capabilities in real life.
My benchmark data from law departments could not be shared, because it was all proprietary. My data collected during consulting projects for law departments and law firms also has to be kept strictly confidential. And some data that are in the public domain or have leaked into it, such as older AMLAW100 compilations on law firms, do not have a range of variables that can illustrate machine learning techniques, for example.
So, I created a data set on information about U.S. law schools. The first version started with the schools rated by U.S. News & World Report. Thereafter I successively added more data for the schools from about six other sources. I also added data about the population of the city each school was in and its state, and its state’s number of lawyers in private practice and some other variables about clerkships, etc.
The final step is a coming out party for this set of data about U.S. law schools!
|The legendary Prof. Edward Tufte gave a keynote presentation in September 2016 at Microsoft’s Machine Learning and Data Summit. Tufte’s ambitious subject was “The Future of Data Analysis”. You can listen to the 50-minute talk online. Early on he emphasized that you display data to assist reasoning (analytic thinking) and to enable smart comparisons.
Tufte frequently referred to data visualization as a method aimed to maximize “information throughput”, yet also to be interpretable by the reader. I took information throughput to be engineering jargon for “lots of data presented.”
Maximal information throughput, from the standpoint of legal managers, has almost no relevance. The data sets that could be analyzed by AI or machine learning techniques or visualized by Excel, Tableau, R and other software are simply too small to justify that “Big Data” orientation and terminology.
That distinction understood, legal managers should take away from Tufte’s model this recommendation: when you create a graph, strive to present as much of the underlying information as you can, as clearly as you can, so that the reader of the graph can come to her own interpretations.
|Machine learning models need to be validated, which entails running the model on new data to see how well the classification or prediction works. In the research explained in Part I, Part II, Part III, and Part IV, topics were identified and used to predict a European court’s decisions.
In the validation of their model, the researchers tested how accurate their model was based on being trained on a subset of the case opinions. “The models are trained and tested by applying a stratified 10-fold cross validation, which uses a held-out 10% of the data at each stage to measure predictive performance.”
In less technical words, they split the cases into ten parts and ran their model ten times, each time training it on 90 percent of the cases and then using the model to predict the ruling on the left-out 10 percent, so that every case was held out exactly once. “Stratified” means that each part kept roughly the same mix of violation and no-violation cases as the full set. They averaged the results of the multiple runs so that extremes would be less influential.
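To make the mechanics concrete, here is a small plain-Python sketch of stratified 10-fold cross validation. It is not the researchers’ code: the function names are invented, the 100 labels are made up, and a trivial “always predict the majority class” model stands in for the SVM.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each case index to one of k folds, keeping the class mix similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled cases of this class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def cross_validate(features, labels, train_fn, predict_fn, k=10):
    """Average accuracy over k runs; each run holds out one fold for testing."""
    scores = []
    for held_out in stratified_folds(labels, k):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, features[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Stand-in model (an assumption): always predict the most common training label.
def train_majority(X, y):
    return max(set(y), key=y.count)


def predict_majority(model, x):
    return model


labels = [1] * 60 + [-1] * 40      # invented: 60 violations, 40 non-violations
features = [[0.0]] * 100           # placeholder features, unused by this model
accuracy = cross_validate(features, labels, train_majority, predict_majority)
print(f"mean accuracy over 10 folds: {accuracy:.2f}")
```

Because each fold receives the same share of each class, no single test split is dominated by one outcome, which is the point of stratifying.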
That’s not all. “The linear SVM [the machine learning algorithm employed to classify court decisions into violation found or not found] has a regularisation parameter of the error term C, which is tuned using grid-search.” We will forgo a full explanation of this dense sentence, but it has to do with finding (“tuning”) the best controls or constraints on the SVM’s application (“parameters”) through a method that systematically tries every value in a predefined set of candidates and keeps the one that performs best (“grid-search”).
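Grid search itself is simple enough to sketch in a few lines. In the sketch below the candidate values for C and the scoring function are both invented; in the actual research the score for each C would be the cross-validated accuracy of the SVM trained with that value.

```python
import math


def grid_search(candidates, score_fn):
    """Score every candidate value and return the best one with all the scores."""
    results = {value: score_fn(value) for value in candidates}
    best = max(results, key=results.get)
    return best, results


# Hypothetical grid of candidate values for the parameter C.
C_GRID = [0.001, 0.01, 0.1, 1, 10, 100]


def toy_score(C):
    # Stand-in for "cross-validated accuracy with this C".
    # An invented curve that peaks at C = 1, for illustration only.
    return 0.8 - 0.05 * abs(math.log10(C))


best_C, scores = grid_search(C_GRID, toy_score)
print("best C:", best_C)
```

Libraries package this loop for convenience. scikit-learn’s `GridSearchCV`, for example, combines the grid with cross validation, although the quoted passage does not say which software the researchers used.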
The article continues: “The linear kernel of the SVM model can be used to examine which topics are most important for inferring whether an article of the Convention has been violated or not by looking at their weights w.” In this research, the weights calculated by the algorithm are a measure of how much a topic influences the Court’s decision. Tables in the article present the six topics for the most positive and negative SVM weights for the parts of the opinions.
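Reading off the most influential topics amounts to sorting the learned weights. The following sketch assumes the weights are held in a dictionary; the topic labels and weight values are invented and do not come from the article’s tables.

```python
# Invented topic labels and weights, purely for illustration.
weights = {
    "topic_a": 1.3,
    "topic_b": -0.9,
    "topic_c": 0.2,
    "topic_d": -1.7,
    "topic_e": 0.8,
    "topic_f": -0.1,
}


def top_topics(weights, n=2):
    """Return the n most positive and the n most negative topics by weight."""
    ranked = sorted(weights.items(), key=lambda item: item[1])
    most_negative = ranked[:n]          # lean toward "no violation"
    most_positive = ranked[-n:][::-1]   # lean toward "violation"
    return most_positive, most_negative


positive, negative = top_topics(weights)
print("toward violation:   ", positive)
print("toward no violation:", negative)
```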
Thus ends our exegesis of a wonderful piece of applied machine learning relevant to the legal industry.
We welcome any and all questions or comments, especially those that will make even clearer the power of this method of artificial intelligence research and its application in legal management.
|All data appears because of underlying value judgments by someone. A vendor who conducts a survey of law firms or law departments privileges certain numbers that it asks for over all the other numbers not asked about. Just the wording, number, or order of questions reveals personal biases toward what is important to know and what isn’t. (“Bias” is not a pejorative term but rather connotes the leanings or predilections or unexamined assumptions of someone.) As Frank Bruni wrote in the NY Times, Oct. 30, 2016 at SR3 regarding the proliferation of college rankings, “all of them make subjective value judgments about what’s most important in higher education.” Some look at selectiveness of colleges, others at student satisfaction, some rankings elevate diversity where others focus on earnings of graduates. The decision of what data to emphasize in any survey is far from neutral.
In the legal industry, the client-law firm relationship stands higher than all other facets of the industry, as evidenced by the number and breadth of surveys. The subjective judgments of surveyors signal strongly that how a law department deals with its law firms economically is its defining attribute, rather than quality of advice or professional growth on the buyer or seller side, or independence or many other conceivable attributes. It is easier to collect data on a topic that has been promoted to the top and is suffused with money, power, and prestige.
Don’t read this as my saying that which law firms a law department pays how much for what kinds of services is unimportant. It is indeed pragmatic and very important. But I do want to highlight how easy it is to overlook that privileging certain sets of data automatically demotes other data. Legal managers need to keep in mind the subjective value judgments made everywhere in the data value chain and that different value judgments would result in different data and possible managerial decisions.
|We have explained in Part I, Part II and Part III how researchers took the text of certain European court opinions, found how often words and combinations of words appeared in them, and coalesced those words that appeared relatively often together into named, broader topics. Next, they wanted to see if software could predict from the topics how the court would decide. They relied on a machine learning algorithm called Support Vector Machines (SVM).
“An SVM is a machine learning algorithm that has shown particularly good results in text classification, especially using small data sets. We employ a linear kernel since that allows us to identify important features that are indicative of each class by looking at the weight learned for each feature. We label all the violation cases as +1, while no violation is denoted by −1. Therefore, features assigned with positive weights are more indicative of violation, while features with negative weights are more indicative of no violation.”
Whew! With a linear kernel, the SVM works directly with the data features and treats each case as a point in a multi-dimensional space (a “hyperspace,” which can be thought of as having not just an x-axis and a y-axis, but also an a-axis, b-axis and so on out to as many axes as there are data features). Other kernels go further and project the data (transform it into a different relationship) into a more complex space, but a linear kernel keeps the original features, which is why their weights can be read directly. In that hyperspace, the SVM algorithm finds the key data points (called “support vectors”) that define the widest margin between violation cases and non-violation cases. The result is what is known as a “hyperplane” because it separates the classes in a hyperspace as well as possible (as a line can do in two dimensions and a plane in three dimensions).
The weights that the algorithm learns define the hyperplane and let it classify cases from their topics. They represent the hyperplane by giving the coordinates of a vector that is orthogonal to it (“orthogonal” can be imagined as perpendicular to the hyperplane at every point). The vector’s direction gives the predicted class, so if you take the dot product [more matrix algebra] of any point with the vector, you can tell on which side of the hyperplane the point lies: if the dot product is positive, it belongs to the positive class; if it is negative, it belongs to the negative class. You could say that the absolute size of a (weight) coefficient relative to the other ones gives an indication of how important that feature was for separating the classes.
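The dot-product test can be shown with a few lines of arithmetic. Everything in this sketch is hypothetical: the three weights, the two cases, and the constant offset `b`. The offset is not mentioned in the explanation above; a linear SVM normally adds one to the dot product before checking the sign.

```python
def dot(w, x):
    """Dot product: multiply matching coordinates and add up the results."""
    return sum(wi * xi for wi, xi in zip(w, x))


def classify(w, b, x):
    """Return +1 (violation) or -1 (no violation) from the sign of w.x + b."""
    return 1 if dot(w, x) + b > 0 else -1


w = [2.0, -1.0, 0.5]   # hypothetical learned weights, one per topic
b = -0.25              # hypothetical offset (an assumption, see above)

case_1 = [1.0, 0.5, 0.0]   # invented topic scores for one case
case_2 = [0.0, 2.0, 1.0]   # invented topic scores for another case

print(classify(w, b, case_1))   # 2.0 - 0.5 + 0.0 - 0.25 = 1.25, so +1
print(classify(w, b, case_2))   # 0.0 - 2.0 + 0.5 - 0.25 = -1.75, so -1

# Relative importance: each weight's absolute size as a share of the total.
total = sum(abs(wi) for wi in w)
importance = [abs(wi) / total for wi in w]
print(importance)
```

The first topic carries the largest share of the total weight, so in this toy example it does the most to push a case to one side of the hyperplane or the other.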
Machine learning has the potential to invade and disrupt the current market for lawyers in some of the most complex, high-end legal practices. In any practice where there are large numbers of court opinions, briefs, law review articles, white papers, laws and regulations, and other textual material, it appears that IBM’s Watson looms as a tool to absorb it, recognize patterns, and augment lawyers’ reasoning. Remember, however, that Watson is a glutton for vast amounts of digitized documents. Without that diet, the formidable Watson may wither like the Wicked Witch of the West when doused with water.
Augmentation has a partnering ring, a positive valence, but the dark side lurks. Associate-heavy research memos and scores of partners “coming up to speed” in an area of law will evaporate when software does the heavy lifting and prep work. The experienced judgment of intelligent lawyers will forever be in demand, but software augmentation will limit leverage and slash hours that would have been billed to clients for a law firm tackling a new area of law for that firm.
A chilling glimpse of this future appears in the Economist, Oct. 22, 2016 at 64. The “cognitive artificial intelligence platform [Watson] has begun categorizing the various [financial industry] regulations and matching them with the appropriate enforcement mechanisms.” Experts in the regulatory web that ensnares and confounds financial firms vet the conclusions Watson derives from the mass of material available to it; “A dozen rules are now being assimilated weekly.” The target is the estimated $270 billion or more spent each year on regulatory compliance – “of which $20 billion is spent simply on understanding the requirements.” Who knows how much flows to law firms or how much that flow will slow once Watson has matured?
To the extent Watson and look-alikes can make sense out of the tangle of financial regulations, lawyers will see less demand for their tools and experience. It is unlikely that the projection of legal work to more sophisticated levels, aided by software organization and analysis, will replace the loss of billable hours at the lower end.