Frequently-used terms

Several terms crop up so frequently here that readers deserve definitions of them, along with the alternative phrasings or synonyms that might appear.

  • Survey. Any questionnaire that a law firm administers to collect information from participants, whether online, in hard copy, by electronic voting, or during an interview. Sometimes it is referred to as a “poll” or a “straw vote.”
  • Contact. Anyone who is invited to participate in a survey. We may sometimes refer to them as “invitees,” “clients” or “prospects.”
  • Participant. Anyone who starts a survey. A participant who submits answers to a survey is a “respondent.”
  • Respondent. A participant in a survey who submits the survey.
  • Company. The organization of a person who takes a survey. Most often a company will be an incorporated entity, but the term also applies broadly to partnerships, not-for-profit organizations, governmental entities and other forms of organization.
  • Report. The electronic file or hard copy publication that contains a survey’s findings and analysis. Most typically an electronic report is in PDF format. It could, however, be in a Word file, PowerPoint deck or other formats.
  • Text. Whatever is written or listed in the survey’s report.
  • Graphic. A plot or table that displays data. We also refer to them as “graphs.” If an element of a report does not convey data, then it is either text or a “design element.”
  • Design element. Anything in a report that is neither text nor a graphic, such as borders, images, pictures, lines, shapes, glyphs, or other elements.

Data may be vulnerable, but it helps decision-makers better than ignorance

Someone in a law firm or law department can pick apart any presentation of data.  All numbers, let alone the analyses and presentations of those numbers, are vulnerable to questions about their completeness, ambiguity, consistency, and validity.  “Like all statistical measurements, [government data on employment] can be both honest and imprecise; a best estimate given the available tools but nonetheless subject to ambiguity, misinterpretation and error,” points out The NY Times, Nov. 4, 2016 at B4.  The data legal managers should request and absorb before they pull the trigger can also be attacked.

The old saying, “Better to light a candle than curse the darkness,” reminds us that any well-intentioned data sheds more light than the total darkness of ignorance, supposition or ideology.  Try to gather numbers that can illuminate some aspect of a decision and you will be better off, even if someone who disagrees with your decision criticizes the data.  The criticisms might be correct, and you might have to go back and do a better job collecting, defining, parsing or visualizing the underlying data.

But it is better to have “data’ed and lost than never to have data’ed at all” (sorry, Tennyson).

Ideology and embedded beliefs may cover up management data, but soldier on!

To a data scientist, responsible data informs and guides.  For legal managers, carefully curated data should change your mind if it shows you were unaware, mistaken or held an untenable belief.  Unfortunately, even silken data often succumbs to woolly thinking.

This depressing reality is that you can wear yourself out gathering insightful numbers, but someone beholden to an ideology, or enjoying privileges threatened by a finding, will not only reject the insights but will not even acknowledge them.  The NY Times, Nov. 4, 2016 at B4 emphasizes this human shortcoming in an article about partisan suspicion of government employment data.  “Decades of psychological research have shown that people … tend to embrace information that confirms their existing beliefs and disregard data that contradicts them.”

When a general counsel is presented with data that shows a favored law firm’s effective billing rate is much higher than the firm’s peers or a managing partner is presented with data that a long-standing client is unprofitable, they can whisk out a sewing basket of tools to rend such “whole cloth” malarkey.  We all find comfort in data that confirms what we believe and we disregard data that controverts our values, belief sets, or sense of self.  We all believe we look good in what we put on.

Even so, data scientists in the data-tattered legal industry must persevere to support the thoughtful pursuit of enlightenment through numbers.

All data harbors choices and represents a probability

Once someone releases a number, such as a count of environmental cases in 2016 where settlements were more than $250,000, that number becomes reified. It takes on a life of its own as a given, taken for granted to be an accurate statement of a fact.  Few who later rely on that number bother to look under the hood (and quite possibly could only do so with difficulty) to understand the decisions and methods that went into its pronouncement.

All numbers have methodological issues: someone made judgment calls about how to handle different questions. To keep with the example, what if a settlement was for $200,000 plus a one-year agreement not to do something? Or what if another case settled for $500,000 payable in two installments, where the second installment was contingent on the other party doing something? Or what if a settlement was paid in a foreign currency and someone had to decide on the appropriate exchange rate?
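One way to handle judgment calls like these is to record them explicitly rather than bury them inside a single published figure. The Python sketch below is purely illustrative: the settlement records, the exchange-rate table, and the counting rule are all invented assumptions, not anyone's actual methodology.

```python
# Sketch: make data-definition choices explicit instead of burying them.
# All records, rates, and rules below are invented for illustration.
from dataclasses import dataclass

@dataclass
class Settlement:
    amount: float              # face amount in original currency
    currency: str
    contingent: bool = False   # any installment contingent on future acts?
    nonmonetary: bool = False  # includes non-monetary terms?

# Hypothetical rates as of a chosen date; the date itself is a choice.
ASSUMED_RATES = {"USD": 1.0, "EUR": 1.08}

def usd_value(s: Settlement) -> float:
    """Convert to USD at the assumed rate for the chosen date."""
    return s.amount * ASSUMED_RATES[s.currency]

cases = [
    Settlement(200_000, "USD", nonmonetary=True),   # cash plus a covenant
    Settlement(500_000, "USD", contingent=True),    # second installment contingent
    Settlement(300_000, "EUR"),                     # paid in a foreign currency
]

# One defensible (but debatable) rule: count only non-contingent
# settlements whose USD value exceeds $250,000.
count = sum(1 for s in cases if not s.contingent and usd_value(s) > 250_000)
print(count)  # only the EUR settlement qualifies under this rule
```

Anyone who later relies on the resulting count can see, and challenge, each embedded decision: the rate date, the contingency rule, and the threshold.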

All the numbers that might be used by a law firm or law department in its data science efforts harbor unexamined birth pangs like these.  At some point, a data scientist has to treat her numbers as if they are accurate, but should always stress-test them for reasonableness, look for outliers, and probe for hidden assumptions.
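As a minimal sketch of that stress-testing, the snippet below flags outliers in an invented list of settlement amounts using the common 1.5 × IQR rule of thumb; the figures and the threshold are assumptions for illustration, not a prescribed method.

```python
# Sketch of a reasonableness check on hypothetical settlement amounts.
# The dollar figures below are invented for illustration.
settlements = [180_000, 250_000, 310_000, 275_000, 2_900_000, 260_000]

def iqr_outliers(values):
    """Flag values falling outside 1.5 * IQR of the middle half of the data."""
    xs = sorted(values)
    n = len(xs)
    q1 = xs[n // 4]          # rough first quartile
    q3 = xs[(3 * n) // 4]    # rough third quartile
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lo or v > hi]

print(iqr_outliers(settlements))  # the $2.9M settlement stands out
```

A flagged value is not necessarily wrong; it is simply a number whose birth pangs deserve a closer look before the analysis proceeds.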

This is a foundational concept of data science: trust but verify the numbers.  Moreover, in the back of our minds we should treat all numbers as probabilistic. The actual number, we hope, is the stated one, but realistically it probably vibrates in the middle of a cloud of possibly-true numbers around the Platonic ideal number.
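One way to make that “cloud of possibly-true numbers” concrete is a bootstrap interval around a point estimate. The sketch below uses an invented sample of case-cycle times; the data and the percentile-interval choice are illustrative assumptions.

```python
# Sketch: treat a reported average as one point in a cloud of plausible values.
# Hypothetical sample of case-cycle times (days), invented for illustration.
import random

random.seed(42)  # fixed seed so the sketch is reproducible
sample = [110, 95, 130, 120, 105, 140, 98, 125, 115, 133]

def bootstrap_interval(data, n_boot=10_000, alpha=0.05):
    """Percentile bootstrap interval for the mean of `data`."""
    means = []
    for _ in range(n_boot):
        resample = [random.choice(data) for _ in data]  # sample with replacement
        means.append(sum(resample) / len(resample))
    means.sort()
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot)]
    return lo, hi

lo, hi = bootstrap_interval(sample)
point = sum(sample) / len(sample)
print(f"point estimate {point:.1f} days, 95% interval roughly ({lo:.1f}, {hi:.1f})")
```

Reporting the interval alongside the point estimate reminds readers that the stated number merely sits in the middle of a range of values the data could plausibly support.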