Maps with data that are not choropleths

Like line plots, map plots have limited utility, although they can convey both data and location where location is relevant. The examples below place data on maps, but they are not choropleths, which are maps that shade geographic regions along a color gradient to convey a range of values. For instance, a choropleth of the United States might color each state according to its GDP, with a very light green for the states lowest on that measure and a dark green for the highest.

Foley Lardner Telemedicine 2017 [pg. 12] includes a simple map of the United States that applies only two colors to the states (indicating two-party consent or one-party consent). A list could have conveyed the same information, since it is not apparent that geographic location has any bearing on the consent laws. This snippet includes the lower border line of the plot (called a ruler) and part of an icon in the upper right-hand portion of the page.

Another map shows up in DLA Piper Debt 2015 [pg. 15]. This one is actually an area plot with proportional circles superimposed on select countries of Europe. The same data could have been represented as a bar chart, but the map is more interesting to the eye.

DLA Piper RE 2017 [pg. 14] provides data on its respondents’ geographic profiles by means of an exploded-out map of the United States. The two categories that are not domestic regions, “International” and “Other,” tread water in the lower left.

Unusual plots — Part I

Every now and then, amid the clonish hordes of bar and pie charts, an unusual plot breaks the boredom. Unusual plots may not be the optimal way to present data, but they certainly pique interest and suggest that creative ways of communicating data graphically deserve more attention. Herewith a few examples of unusual plots.

Morrison Foerster Consumer 2017 [pg. 5] added a stylized picture that evokes credit cards and identity theft to its column chart.

The tried-and-true plot styles appear repetitively because they are serviceable, familiar, and straightforward to create. Forcing the data into an unexpected plot style may not serve readers well. On the other hand, a bespoke plot style or twist on an old familiar style may be just what is needed. If law firms do not explore new ways to depict the data they collect, they miss an opportunity.

Morrison Foerster Privacy 2017 [pg. 5] hit upon an interesting visualization. The spectrum moves from low values on the left to high values on the right. Note that the firm labels the plot “Figure 2” and places that label at the bottom.

Norton Rose Lit 2016 resorted to the Rube Goldberg figure below to explain the meaning of one-tenth of a percent. While you can admire the ingenuity of the effort, you have to wonder whether it adds value for most readers of law-firm research survey reports. If they do not understand percentages, they will have a hard time extracting much from survey reports.

Berwin Leighton Arbappointees 2017 [pg. 7] created a gauge to display its findings. Everyone is familiar with fuel gauges, for example, so the general point is communicated to the reader, albeit not in the most pellucid format.

126 extensions to ggplot2 listed: do you know others?

For a book I just published on data graphs for legal managers, available on LeanPub, I compiled as many extension packages and functions as I could locate (on CRAN and GitHub). Here is a link to a PDF of the results.

If you know of any that I have missed, I would much appreciate hearing from you with a comment. Thank you.

P.S. If anyone would like to join me in preparing examples of the extensions or functions, somewhat like a CRAN Task View but with vignettes, that might be edifying.

Maximum “information throughput” can still guide legal managers when graphs display data

The legendary Prof. Edward Tufte gave a keynote presentation in September 2016 at Microsoft’s Machine Learning and Data Summit.  Tufte’s ambitious subject was “The Future of Data Analysis”.   You can listen to the 50-minute talk online.  Early on he emphasized that you display data to assist reasoning (analytic thinking) and to enable smart comparisons.

Tufte frequently referred to data visualization as a method that aims to maximize “information throughput” while remaining interpretable by the reader. I took information throughput to be engineering jargon for “lots of data presented.”

Maximal information throughput, from the standpoint of legal managers, has almost no relevance.  The data sets that could be analyzed by AI or machine learning techniques or visualized by Excel, Tableau, R and other software are simply too small to justify that “Big Data” orientation and terminology.

That distinction understood, legal managers should take this away from Tufte’s model and recommendation: when you create a graph, strive to present as much of the underlying information as you can, as clearly as you can, so that the reader of the graph can come to her own interpretations.

Create a choropleth to display data by State, country, region

When legal managers want to present data by State or by country, they can make good use of what is called a “choropleth”. Choropleths are maps that color their regions in proportion to the count or other statistic of the variable being displayed on the map, such as the number of pending lawsuits per State or amounts spent on outside counsel by country. Darker colors typically indicate more in a region and lighter shades of the color indicate fewer.
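To make the shading idea concrete, here is a minimal sketch in Python of how a count might be turned into a shade of green, from light for the lowest value to dark for the highest. The State counts, the two endpoint colors, and the function name shade_for are all assumptions made up for illustration; mapping software will have its own color scales.

```python
def shade_for(value, lowest, highest, light=(229, 245, 224), dark=(0, 109, 44)):
    """Linearly interpolate between a light and a dark RGB color.

    The default endpoint colors are arbitrary greens chosen for this sketch.
    """
    if highest == lowest:
        fraction = 1.0  # every region has the same value, so use the dark shade
    else:
        fraction = (value - lowest) / (highest - lowest)
    rgb = tuple(round(l + fraction * (d - l)) for l, d in zip(light, dark))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Hypothetical counts of pending lawsuits per State (illustrative only).
pending_suits = {"CA": 20, "TX": 12, "NY": 9, "OH": 3}

lowest = min(pending_suits.values())
highest = max(pending_suits.values())

# One hex color per State, darker where there are more pending suits.
state_colors = {
    state: shade_for(count, lowest, highest)
    for state, count in pending_suits.items()
}
```

The State with the most suits receives the dark endpoint color and the State with the fewest receives the light one; every other State lands proportionally in between.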

Below is an example of a choropleth that appears in Exterro’s 2016 Law Firm Benchmarking Report at page 8.  It shows how many of the 112 survey participants come from each state.


California is the darkest with 21; the grey states had no participants. The table below the map, which is truncated in this screenshot, gives the actual numbers by State, so someone could carp that the choropleth sweetens the eye but adds no nutritional information. Still, it looks pretty good and it is an unusual example of an effective graphical tool.

Law firms, their number of lawyers and mentions in court documents

Data analytics and visualization can sometimes tell you where your law firm stands relative to competitors and can therefore guide your positioning and selling efforts. An illustration of this benefit comes from Corp. Counsel, Oct. 2016 at 44, where a table shows counts of U.S. law firms that “turn up the most in court documents.”

The table lists the firms by number of “mentions”. Most often mentioned were Littler Mendelson and Ogletree Deakins at 100; least often mentioned were Fisher & Phillips, Gordon Rees, and Hunton & Williams at 24.

A more revelatory analysis matches the number of mentions of a firm to its number of lawyers.   After all, bigger firms are more likely to represent litigants than smaller firms, everything else held constant.
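One simple way to match mentions to size is to compute mentions per 100 lawyers and rank firms on that ratio. The sketch below does so in Python. The firm names and all of the numbers are hypothetical placeholders, not the figures from the Corp. Counsel table or any actual lawyer counts.

```python
# Hypothetical firms: names, mentions, and lawyer counts are placeholders.
firms = [
    {"name": "Firm A", "mentions": 100, "lawyers": 1000},
    {"name": "Firm B", "mentions": 60, "lawyers": 300},
    {"name": "Firm C", "mentions": 24, "lawyers": 800},
]

# Normalize mentions by firm size so large and small firms can be compared.
for firm in firms:
    firm["mentions_per_100_lawyers"] = 100 * firm["mentions"] / firm["lawyers"]

# Rank by the size-adjusted figure, highest first.
ranked = sorted(firms, key=lambda f: f["mentions_per_100_lawyers"], reverse=True)

for firm in ranked:
    print(f'{firm["name"]}: {firm["mentions_per_100_lawyers"]:.1f} mentions per 100 lawyers')
```

In this made-up data, the firm with the most raw mentions drops to second place once size is taken into account, which is the kind of shift the size-adjusted view is meant to reveal.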


The plot above shows data on law firm lawyers from a year or so ago, but the relative size differences among this group of 30 likely still hold. It shows each firm’s lawyer count on the left axis and its mentions on the bottom axis, with the firm name next to its point on the scatter plot.

One possible conclusion from this plot is that firms specializing in employment litigation rack up the most mentions.

A makeover of an ILTA graph, explaining the improvements in visualization

Legal managers who create data-analysis graphs should strive to make those graphs effective communicators.  Let’s pause for a teaching moment.  I wrote a post about the 2016 ILTA/InsideLegal Technology Purchasing Survey and its question about areas of practice where respondents foresaw AI software penetrating.   

The plot in the upper right portion of page 13 that summarizes the answers to that question could be improved in several ways.


First, the bar colors are nothing but distracting eye-candy, since the colors do not convey any additional information. If a couple of bars were colored to indicate something, that would be a different matter.

Second, it was good to add the percentages at the end of the bars, rather than force readers to look down at the horizontal axis and estimate them; however, if the graph states each bar’s percentage, the horizontal axis figures are unnecessary. What is more, the vertical grey lines can be banished.

Third, most people care less about an alphabetical ordering of the bars than they do about comparisons among the applications on percentages.  It would have been more informative to order the bars in the conventional longest-at-the-top to shortest-at-the-bottom style.

To its credit, the graph puts the application areas on the left rather than at the bottom. Almost always there is more room on the left than in the narrower bands at the bottom.


A makeover using the same data cures these problems and displays a few other visualization improvements. The new plot removes the boundary lines around the plot, which gives a cleaner look. It also enlarges the font on the percentages relative to the font on the applications, since those figures are likely to be the ones that readers care most about and want most emphasized. Two final tweaks: the application names are on one line, and the axes have no “tick marks”, the tiny lines that mark values along an axis but that rarely add any value.
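The main makeover steps (order the bars from longest to shortest, state each percentage at the end of its bar, and drop the axis and grid lines) can be sketched in a few lines of Python that print a text-only bar chart. The application-area names and percentages below are placeholders, not the ILTA/InsideLegal survey results, and render_bars is a hypothetical helper written for this illustration.

```python
# Hypothetical survey results: percent of respondents per application area.
# The area names and figures are placeholders, not the ILTA/InsideLegal data.
responses = {
    "Area A": 12,
    "Area B": 38,
    "Area C": 25,
    "Area D": 7,
}


def render_bars(data, width=40):
    """Return text bars sorted longest-first, each labeled with its percentage."""
    ordered = sorted(data.items(), key=lambda item: item[1], reverse=True)
    longest_label = max(len(label) for label in data)
    top = ordered[0][1]  # the largest value sets the full bar width
    lines = []
    for label, pct in ordered:
        bar = "#" * round(width * pct / top)
        # Label on the left, percentage at the end of the bar, no axis or grid.
        lines.append(f"{label.rjust(longest_label)}  {bar} {pct}%")
    return lines


chart = render_bars(responses)
print("\n".join(chart))
```

Because each bar carries its own percentage, the chart needs no horizontal axis, and the descending order lets the reader compare the applications at a glance.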