For a book I just published on data graphs for legal managers, available on LeanPub, I compiled as many extension packages and functions as I could locate (on CRAN and GitHub). Here is a link to a PDF of the results: https://www.dropbox.com/s/ku2gyvlvvq1cyfa/ggExtensions.pdf?dl=0
If you know of any that I have missed, I would much appreciate your leaving a comment. Thank you.
PS: If anyone would like to join me in preparing examples of the extensions or functions, somewhat like a CRAN Task View but with vignettes, that might be edifying.
The legendary Prof. Edward Tufte gave a keynote presentation in September 2016 at Microsoft’s Machine Learning and Data Summit. Tufte’s ambitious subject was “The Future of Data Analysis”. You can listen to the 50-minute talk online. Early on he emphasized that you display data to assist reasoning (analytic thinking) and to enable smart comparisons.
Tufte frequently referred to data visualization as a method that aims to maximize “information throughput” while remaining interpretable by the reader. I took information throughput to be engineering jargon for “lots of data presented.”
From the standpoint of legal managers, maximal information throughput has almost no relevance. The data sets that they could analyze with AI or machine learning techniques, or visualize with Excel, Tableau, R and other software, are simply too small to justify that “Big Data” orientation and terminology.
That distinction understood, legal managers should take away this recommendation from Tufte’s model: when you create a graph, strive to present as much of the underlying information as you can, as clearly as you can, so that the reader of the graph can come to her own interpretations.
When legal managers want to present data by State or by country, they can make good use of what is called a “choropleth”. Choropleths are maps that color their regions in proportion to the count or other statistic of the variable being displayed on the map, such as the number of pending lawsuits per State or amounts spent on outside counsel by country. Darker colors typically indicate more in a region and lighter shades of the color indicate fewer.
Below is an example of a choropleth that appears in Exterro’s 2016 Law Firm Benchmarking Report at page 8. It shows how many of the 112 survey participants come from each state.
California is the darkest with 21; the grey states had no participants. The table below the map, which is truncated in this screenshot, gives the actual numbers by State, so someone could carp that the choropleth sweetens the eye but adds no nutritional information. Still, it looks pretty good and it is an unusual example of an effective graphical tool.
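The shading rule behind a choropleth can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Exterro produced its map: apart from California's 21, the counts are invented, and the color endpoints and the `shade` helper are hypothetical choices. It uses a simple linear blend from a light to a dark blue, with grey reserved for regions that have no participants.

```python
# Hypothetical participant counts by state. Only California's 21 comes from
# the report discussed above; the other figures are invented for illustration.
counts = {"CA": 21, "NY": 12, "TX": 9, "IL": 5, "WY": 0}

GREY = "#cccccc"          # regions with a zero count
LIGHT = (222, 235, 247)   # RGB that the shade approaches as counts near zero
DARK = (8, 48, 107)       # RGB used for the largest count


def shade(count, max_count):
    """Return a hex color that darkens linearly as count approaches max_count."""
    if count <= 0 or max_count <= 0:
        return GREY
    t = count / max_count  # fraction of the maximum, 0 < t <= 1
    rgb = [round(lo + (hi - lo) * t) for lo, hi in zip(LIGHT, DARK)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


max_count = max(counts.values())
colors = {state: shade(n, max_count) for state, n in counts.items()}

# List the regions from most to fewest participants with their fill colors.
for state in sorted(counts, key=counts.get, reverse=True):
    print(f"{state}: {counts[state]:>2} participants -> {colors[state]}")
```

A mapping package would then fill each region's outline with its assigned color. Real choropleths often bin the counts into a handful of shades rather than blending continuously, because readers find a few distinct steps easier to tell apart.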
Legal managers who create data-analysis graphs should strive to make those graphs effective communicators. Let’s pause for a teaching moment. I wrote a post about the 2016 ILTA/InsideLegal Technology Purchasing Survey and its question about areas of practice where respondents foresaw AI software penetrating.
The plot in the upper right portion of page 13 that summarizes the answers to that question could be improved in several ways.
First, the bar colors are nothing but distracting eye-candy, since they do not convey any additional information. If a couple of bars were colored to indicate something, that would be a different matter.
Second, it was good to add the percentages at the end of the bars, rather than force readers to look down at the horizontal axis and estimate them; however, if the graph states each bar’s percentage, the horizontal axis figures are unnecessary. Even more, the vertical grey lines can be banished.
Third, most people care less about an alphabetical ordering of the bars than they do about how the applications compare on percentages. It would have been more informative to order the bars in the conventional longest-at-the-top to shortest-at-the-bottom style.
To give credit where due, it was good to put the application areas on the left rather than the bottom. Almost always there is more room on the left than in the narrower bands at the bottom.
A makeover using the same data cures these problems and displays a few other visualization improvements. The new plot removes the boundary lines around the plot, which gives a cleaner look. It also enlarges the font on the percentages relative to the font on the applications, since those figures are likely to be the ones that readers care most about and want most emphasized. Two final tweaks: the application names are on one line, and the axes have no “tick marks”, the tiny lines that mark the mid-point of an axis interval but that rarely add any value.
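Several of the makeover choices described above (longest bar at the top, categories on the left, each percentage stated at the end of its bar, no axis or tick marks) can be illustrated with a minimal text-only sketch in Python. The shares below are placeholders, not the ILTA/InsideLegal figures, and `render_bars` is a hypothetical helper written for this example rather than the code behind the actual makeover plot.

```python
# Hypothetical survey shares (percent of respondents). These are NOT the
# ILTA/InsideLegal figures; they are placeholders for illustration only.
shares = {
    "Contract review": 22,
    "E-discovery": 48,
    "Legal research": 35,
    "Due diligence": 17,
}


def render_bars(data, width=40):
    """Return text lines for a horizontal bar chart, longest bar first."""
    ordered = sorted(data.items(), key=lambda item: item[1], reverse=True)
    label_width = max(len(name) for name in data)
    scale = width / max(data.values())  # longest bar spans `width` characters
    lines = []
    for name, pct in ordered:
        bar = "#" * round(pct * scale)
        # Category on the left and the value at the end of the bar, so no
        # axis, gridlines, or tick marks are needed to read the chart.
        lines.append(f"{name:<{label_width}}  {bar} {pct}%")
    return lines


chart = render_bars(shares)
print("\n".join(chart))
```

Every bar uses the same character, which mirrors the point that color should appear only when it carries information. In a plotting library the same steps would be: sort the data, draw bars in one color, label each bar directly, then turn off the axis, gridlines, and plot border.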