Maximum “information throughput” can still guide legal managers when graphs display data

The legendary Prof. Edward Tufte gave a keynote presentation in September 2016 at Microsoft’s Machine Learning and Data Summit. Tufte’s ambitious subject was “The Future of Data Analysis”. You can listen to the 50-minute talk online. Early on, he emphasized that you display data to assist reasoning (analytic thinking) and to enable smart comparisons.

Tufte frequently described data visualization as a method that should maximize “information throughput” while remaining interpretable by the reader. I took “information throughput” to be engineering jargon for “lots of data presented.”

From the standpoint of legal managers, however, maximal information throughput has almost no relevance. The data sets they might analyze with AI or machine-learning techniques, or visualize with Excel, Tableau, R, and other software, are simply too small to justify the “Big Data” orientation and terminology.

That distinction understood, legal managers should take away from Tufte’s talk one core recommendation: when you create a graph, strive to present as much of the underlying information as you can, as clearly as you can, so that readers of the graph can come to their own interpretations.
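To make that recommendation concrete, here is a minimal sketch in Python with matplotlib. The practice groups, cycle-time figures, and sample sizes are all invented for illustration. Instead of collapsing each group into a single summary bar, the plot shows every underlying matter, with the group mean overlaid, so readers can judge spread and outliers for themselves.

```python
# A minimal sketch of Tufte-style "show the data": plot every point,
# not just a summary statistic. All figures below are invented.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)

# Hypothetical days-to-close figures for three practice groups
groups = {
    "Litigation": rng.normal(180, 60, 40).clip(min=10),
    "M&A": rng.normal(120, 35, 25).clip(min=10),
    "Employment": rng.normal(90, 25, 30).clip(min=10),
}

fig, ax = plt.subplots(figsize=(7, 4))
for i, (name, days) in enumerate(groups.items()):
    # Jitter the x positions so overlapping matters stay visible
    x = i + rng.uniform(-0.12, 0.12, days.size)
    ax.scatter(x, days, alpha=0.5, s=20)                      # every matter
    ax.hlines(days.mean(), i - 0.2, i + 0.2, colors="black")  # group mean

ax.set_xticks(range(len(groups)), labels=list(groups))
ax.set_ylabel("Days to close matter")
ax.set_title("Cycle time by practice group (every matter shown)")
plt.tight_layout()
plt.show()
```

The design choice follows directly from the recommendation: a bar chart of averages would maximize neither information throughput nor the reader’s ability to reach her own interpretation.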

Law firms and departments are not dealing with “Big Data”

“Big Data” has no accepted formal definition, as its parameters are difficult to pinpoint, according to Victoria Lemieux et al., “Meeting Big Data challenges with visual analytics,” Records Mgt. J. 24(2), July 2014 at 122 [citations omitted]. An extremely large amount of data is a prerequisite, but “At what volume data become big remains an open question, however, with some suggesting that it comprises data at the scale of exabytes” or larger. An exabyte is a billion gigabytes. Others look at volume in terms of manageability by standard software: data “with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time.” The legal industry, aside perhaps from a rare and ginormous e-discovery mountain, faces modest volumes of data.
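To put that scale in perspective, here is a back-of-the-envelope sketch in Python. The 50-terabyte e-discovery corpus is an assumed, deliberately generous figure for illustration, not a statistic from the article.

```python
# Back-of-the-envelope scale check (decimal units); the 50 TB corpus
# size is an invented, deliberately generous figure for illustration.
GB = 10**9   # bytes in a gigabyte
TB = 10**12  # bytes in a terabyte
EB = 10**18  # bytes in an exabyte

print(EB // GB)  # 1000000000 -> an exabyte is a billion gigabytes

huge_ediscovery_corpus = 50 * TB  # hypothetical very large matter
print(EB // huge_ediscovery_corpus)  # 20000 -> matters of that size per exabyte
```

Even a matter of that assumed size would have to recur twenty thousand times before the industry approached a single exabyte.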

Other definitions of Big Data emphasize not just the sheer volume of data but also its velocity (the speed of data in and out) and variety (the range of data types and sources). Some writers add veracity (the biases, noise, and abnormality in data) as a fourth characteristic alongside the common parameters of volume, velocity, and variety. The legal industry’s data arrives at a relative snail’s pace, in traditional garb, and what pertains to management issues consists of text or numbers from relatively few sources.

The data challenges for the legal industry are still formidable, even if the loosely defined and ubiquitous term “Big Data” does not apply.