Legal managers should be intrigued by the fledgling capabilities of software to predict decisions of courts. To assess the likelihood and power of such a capability in the real world, however, calls for a manager to understand the tools that data scientists might deploy in the prediction process. Fortunately, the ABA Journal cited and discussed a study published in October 2016 that offers us a clear example.
Four researchers, including three computer scientists and a lawyer, used machine learning software to examine 584 cases before the European Court of Human Rights. They found that the court’s judgments of the plaintiff’s rights having been violated or not were more highly correlated to facts than to legal arguments. Given only the facts, their model predicted the court’s decisions with an average accuracy of 79 percent.
The article’s software-laden explanation of their method and tools makes for very heavy reading, so what follows attempts humbly and respectfully and certainly not infallibly to translate their work into plainer English. The starting point is their abstract’s summary. They “formulate a binary classification task where the input of our classifiers is the textual content extracted from a case and the target output is the actual judgment as to whether there has been a violation of an article of the convention of human rights. Textual information is represented using contiguous word sequences, i.e., N-grams, and topics.” Additionally, “The text, N-grams and topics, trained Support Vector Machine (SVM) classifiers… [which] applied a linear kernel function that facilitates the interpretation of models in a straightforward manner.”
First, let’s get a handle on how the prediction model dealt with the text of the opinions (Natural Language Processing), then we will look at the software that classified the text to predict a violation or not (Support Vector Machine).