Order of selections in multiple choice: frequency of being chosen

When you create a multiple-choice question, the order of the selections can influence the results you obtain. For example, if the first selection is the one respondents are most likely to pick, some of them may infer that the selections are listed in declining order of priority. Which selections they pick (and, depending on the restrictions, how many they pick) will be influenced by that perceived ordering.

Many surveyors alphabetize the selections to counteract any such order bias. Another technique, which we will discuss later, randomizes the order in which the selections are presented; high-end survey sites can vary the ordering for every respondent with a built-in randomizer. An intermediate solution is for the surveyor to create a few versions of the survey, each with a different ordering of the selections.
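As a concrete sketch of per-respondent randomization (the dessert names below are invented for illustration), base R’s sample() can reshuffle the selections for each person:

```r
# Randomize the order of selections separately for each respondent.
# The dessert names are invented for illustration.
selections <- c("Cheesecake", "Brownies", "Gelato", "Baklava",
                "Tiramisu", "Eclairs", "Macarons")

set.seed(42)  # reproducible illustration
n_respondents <- 3
orders <- lapply(seq_len(n_respondents),
                 function(i) sample(selections))  # a fresh permutation each time

orders[[1]]  # the order shown to the first respondent
```

A survey platform would apply the same idea at display time; the point is simply that each respondent sees an independent permutation.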

To test one aspect of whether the order of selections influences respondents (the number of times each selection is picked), I analyzed a recent survey. The data consists of four multiple-choice questions that allowed respondents to check all of the selections that applied to them. Think of a whimsical question like “Which desserts do you like best? (check all that apply)” where there are seven different scrumptious delights and some or all of them could be checked.

The plot below shows totals for how many times respondents chose the first selection of a question (the bar above 1 on the x axis, with a height of 483), the second selection (the bar above 2, at 465), and so on.
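A plot of this kind takes only a few lines of ggplot2. The counts for the first two positions (483 and 465) come from the survey described above; the other five values below are placeholders for illustration, not the actual data:

```r
library(ggplot2)

# Totals of how often each selection position was chosen.
# Positions 1 and 2 (483 and 465) come from the survey discussed above;
# positions 3 through 7 are placeholder values, not the actual data.
tallies <- data.frame(
  position = factor(1:7),
  count    = c(483, 465, 440, 410, 425, 380, 350)
)

ggplot(tallies, aes(x = position, y = count)) +
  geom_col() +
  labs(x = "Position of selection within the question",
       y = "Times chosen by respondents")
```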

The plot presents results from only the first seven selections of each question, since seven was the minimum number of selections across the four questions 1. As far as I could tell, there was no evident logic to the arrangement of the selections; each one seems plausible and not ranked in any apparent way.

A different form of this empirical inquiry would use multiple-choice questions where only one selection was permitted. Note also that the selections cannot be some fixed fact, such as the position or age of the respondent; the selections need to invite independent, subjective judgments, as with dessert preferences.

Returning to the plot, with the exception of the fifth-position selection, the number of selections drops off steadily as the position of the selection increases. That is to say, for the most part, respondents checked selections fewer and fewer times as they moved down the seven selections. It’s as if respondents grew fatigued and didn’t pay as much attention to the later selections.


  1. This decision could throw off the results in that a couple of the questions had 9 or 10 selections.

Cure ambiguities in selections for multiple-choice questions

When creating the choices for a multiple-choice question, a careful developer will take time to be sure that the choices are as unambiguous as possible. Helping respondents know what each choice means may entail writing a definition of the term; note that your survey software needs to support such definitions, or you may have to supply them in the body text of the question. Additionally, a conscientious developer will ask several people to vet the choices for ambiguity before releasing the survey.

The survey conducted by Berwin Leighton Paisner in 2014 1 offers an instructive example of the importance of defining terms. Below you can see the graphical results of the question.

However, the report does not include the actual form of the question asked on the survey, so we do not know whether any of the choices were defined. If we assume that the question asked something like “What is your role?”, we might further assume that the position choices were simply those shown as the five labels along the x axis at the bottom of the plot. Is each of them clear?

If a respondent were the general counsel for North America of a global company that has a global chief legal officer, which selection is appropriate? If a lawyer admitted to practice works in the risk or compliance group, should she select that group or “In-house lawyer”? This example admittedly uses titles that are quite commonly included in research surveys, but the important lesson remains: with multiple-choice questions, try to wring out blurriness and varied interpretations of key terms.

A second observation about this particular finding concerns the relatively large number of “Other” responses. If BLP’s survey included a text box for respondents to provide a title not covered by the four given, it would have been better to review those write-ins and create another position or two to account for some or all of them. Without further insight into the positions of respondents who selected “Other,” the category is quite large relative to the remaining four and creates an analytic hole if the law firm wanted to analyze responses by position.


  1. Previous posts explain the set of research surveys by 15 law firms, of which BLP’s is one.

Multiple-choice questions need “Other” and MECE

On research surveys 1, many questions give respondents several choices of answers. Cleverly called “multiple-choice questions,” they are probably the most common format for questions. Given their popularity, we will devote several posts to exploring how best to create them.

Let’s pick the Hogan Lovells report released in 2014 on cross-border disputes to begin the broader discussion. On page 16 readers see the graphic shown below. We suspect the graphic presents data from a multiple-choice question in part because another Hogan survey states: “… the majority of answers [were] channelled through multiple-choice formats.”

The cross-border report does not tell the reader the question the firm asked on the survey instrument, but it may have been something like “Which of the following are [or have been] concerns of your Board in relation to cross-border disputes? (Check all that apply)”. The survey question listed the eight concerns shown in the graph, each with a checkbox or some way to select that concern. Note that the report does not explain whether the question asked respondents to check, for example, their Board’s two greatest concerns or three greatest concerns. It is most common for research surveys to invite respondents to choose as many of the selections as they want.

We commend the firm for selecting a group of very plausible Board-level concerns. However, the choices prompt two observations. First, it is good practice with a multiple-choice question to include “Other” as the final choice, along with a text box that lets respondents write in whatever they think was not adequately covered by the given choices. One benefit of free-form text answers is that if you run the survey a second time, reviewing them lets you make your choices more comprehensive or more appropriately worded. A downside of text answers, however, is that they require a thoughtful human to code them, a step that injects subjectivity.

The second observation brings to mind the acronym MECE (Mutually Exclusive and Collectively Exhaustive). A MECE question gives participants choices that cover all of the possibilities (Collectively Exhaustive) while the choices themselves do not overlap (Mutually Exclusive). Covering all possible choices is quite difficult (in part because you do not want a very, very long list of choices), but one advantage of having an “Other” choice is that if participants rarely choose it, you have probably done a good job of covering the waterfront. In this particular example, “ability to predict outcome” and “uncertainty” seem to overlap in the degree of the unknown, and the third most common choice, “exposure/liability,” blurs with the first, second, and fourth choices; all four have to do with loss.


  1. This post builds on my introduction to law-firm research surveys published on Nov. 29, 2017.

Balancing survey respondents across industries and geographies

When most law firms conduct a research survey, they primarily hope to get enough respondents so that the results are defensible and generalizable. That is to say, they want enough data from desirable respondents to be able to say that their findings make sense and can be extrapolated beyond their particular group of participants.

Ropes & Gray made a very different decision in its 2017 survey on risk management practices in companies. Working with a research group, the firm deliberately balanced the number of participants by five named industries plus “Other” and across four geographical regions. As can be seen in the table below, adapted from page 5 of the report, out of the 300 total respondents, 100 came from each of America and EMEA, while 70 came from Pacific Asia and 30 from Latin America. Moreover, each industry had exactly 50 participants.

                             America   EMEA   Pacific Asia   Latin America   Total
 Private Equity                   17     16             12               5      50
 Asset Management                 16     18             11               5      50
 Life Sciences & Healthcare       17     17             11               5      50
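The balance is easy to verify in R. This sketch recreates the three industry rows shown in the table above and confirms that each row meets its quota of 50 respondents:

```r
# Recreate the three industry rows shown in the table above and
# confirm that each row meets its quota of 50 respondents.
quota <- data.frame(
  row.names       = c("Private Equity", "Asset Management",
                      "Life Sciences & Healthcare"),
  America         = c(17, 16, 17),
  EMEA            = c(16, 18, 17),
  `Pacific Asia`  = c(12, 11, 11),
  `Latin America` = c(5, 5, 5),
  check.names     = FALSE
)
rowSums(quota)  # each industry sums to 50
```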

The very brief description of the survey’s methodology does not explain why the firm chose those industries, those geographies, or the balanced participant numbers within them. Nor does it delve into how FT Remark, the research firm that assisted Ropes & Gray, obtained the desired number of respondents.

One reason for the geographic distribution may have been that it proved difficult to obtain equal numbers of respondents for each pair of industry and geography. It may also be that the firm feels that this geographical weighting in some way more accurately represents companies and their risk management approaches around the globe. Many other questions arise regarding the decisions underlying this symmetric data set.

We will close by noting that if a law firm sets a goal of proportionally balancing the number of responses by one or more criteria (company revenue being another possible parameter), it significantly increases the effort to locate and persuade the requisite number of respondents.


Surveys by law firms for research purposes — an introduction to a series of posts

My goal for the next several months is to write about research surveys sponsored by law firms. By the term ‘research surveys,’ I mean questionnaire initiatives that collect, analyze and publish data and opinions that inform the organizations who respond as well as their peers. Research surveys, in other words, are not client-satisfaction surveys by law firms nor are they internal surveys by a firm’s management.

To date I have analyzed 19 research surveys by 14 different law firms: Allen & Overy, Berwin Leighton Paisner, Carlton Fields, Davies Ward Phillips, Goulston & Storrs, Haynes and Boone, Hogan Lovells, Littler Mendelson, Norton Rose Fulbright, Proskauer Rose, Ropes & Gray, Seyfarth Shaw, White & Case, and Winston & Strawn.

Over this series of posts my plan is to discuss many aspects of such surveys, including their motivations, their methodologies, their graphics, and their marketing.

I enthusiastically welcome comments on these posts. Further, if you know of any law-firm research surveys that I have overlooked, I would very much appreciate hearing from you about them.

126 extensions to ggplot2 listed: do you know others?

For a book I just published on data graphs for legal managers, available on LeanPub, I compiled as many extension packages and functions as I could locate (on CRAN and GitHub). Here is a link to a PDF of the results: https://www.dropbox.com/s/ku2gyvlvvq1cyfa/ggExtensions.pdf?dl=0

If you know of any that I have missed, I would much appreciate hearing from you with a comment. Thank you.

PS If anyone would like to join me in preparing examples of the extensions or functions, somewhat like a CRAN Task View but with vignettes, that might be edifying.

My New Book on Graphical Analysis of Data for Decisions

As readers may know, my interests have broadened from consulting to law departments to applying analytic tools to legal data.  I program with open-source R and am deep into machine learning algorithms as applied to law firm and law department operational data.

My latest book shows how law firms can use their data, when presented effectively in plots, to make better operational decisions.  Law departments, too, can learn from the techniques laid out in the book and they can encourage their firms to improve.

Some 75 types of charts are included, each one showing how different sets of law firm data (65 in all) might be presented.  The book also explains a wide variety of graphing choices and techniques.

The book is available for download in PDF format on LeanPub.

If you have any comments about the book or the topic, I would very much like to hear from you.  Likewise, if you would spread the word about the book in a newsletter, email, blog post, group, webinar, or tweet that would be much appreciated.

Descriptive statistics and the step beyond to predictive statistics

A fundamental distinction between two kinds of data analytics appears in a report published by KPMG, “Through the looking glass, How corporate leaders view the General Counsel of today and tomorrow” (Sept. 2016).  “Companies are making greater use of data analytics and are increasingly moving from descriptive analytics (where technology is used to compress large tranches of data into more user-friendly statistics) to predictive analytics and prescriptive models that extrapolate future trends and behavior.” (page 14)

Law firms and law departments can avail themselves of many kinds of software to summarize aspects of a data set.  Descriptive statistics, as some call them, include averages, medians, quantiles, and standard deviations.  These summary statistics (yet another term for the basic calculations) are themselves simplified models of the underlying data.  [Note that a “statistic” is properly a number calculated from underlying data.  So, we calculate the variance statistic of this year’s invoices, where the underlying “raw” data is the set of all the year’s invoices.]
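In R, each of those summary statistics is a one-line call.  The invoice amounts below are simulated, purely for illustration:

```r
# Descriptive statistics on a simulated set of invoice amounts.
set.seed(1)
invoices <- round(rlnorm(200, meanlog = 9, sdlog = 0.6))  # 200 made-up invoices

mean(invoices)                     # average
median(invoices)                   # median
quantile(invoices, c(0.25, 0.75))  # lower and upper quartiles
sd(invoices)                       # standard deviation
var(invoices)                      # the variance statistic mentioned above
```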

Predictive statistics go further than descriptive statistics.  Using programs like open-source R and its lm function, you can easily fit a regression model that predicts the number of billable hours likely to be recorded by associates based on, for example, their practice group, years with the law firm, gender, and previous year’s billings.  Predictive analytic models allow the user to derive numbers, not just describe them.
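Here is a minimal sketch of such a model.  Every number and variable name below is invented for illustration; lm itself ships with base R’s stats package:

```r
# Simulated associate data; every value here is invented for illustration.
set.seed(7)
n <- 120
assoc <- data.frame(
  practice_group = factor(sample(c("Litigation", "Corporate", "IP"),
                                 n, replace = TRUE)),
  years_at_firm  = sample(1:8, n, replace = TRUE),
  prior_billings = rnorm(n, mean = 1800, sd = 200)  # hours billed last year
)
# Fabricate an outcome so the model has a relationship to recover.
assoc$hours <- 1200 + 40 * assoc$years_at_firm +
  0.3 * assoc$prior_billings + rnorm(n, sd = 100)

# Fit the regression and predict billable hours.
fit <- lm(hours ~ practice_group + years_at_firm + prior_billings,
          data = assoc)
summary(fit)                # estimated coefficients and fit statistics
predict(fit, assoc[1:3, ])  # predicted hours for the first three associates
```

With real firm data, the fabricated outcome line disappears and the recorded hours take its place.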

The term “predictive analytics” compared to “machine learning”

The term “machine learning” may be most common, but the alternative “predictive analytics” has much going for it.  This is the term Eric Siegel has promoted extensively, including in his Predictive Analytics (John Wiley 2016).

Siegel places “machine learning” mostly in academia and research papers.  It is a computer science term that connotes statistics and matrix algebra.  His preferred term has more overtones of usefulness to business, as it stresses the value of algorithms that take in data and predict numbers, classifications, or most-similar values.

Data may be neutral, but interpretation of it is never

Legal managers need to appreciate the gap between numbers and interpretations of those numbers.  Stated differently, and contrary to the chestnut that “the numbers speak for themselves,” a babble of conclusions can be reached from any set of numeric data.

A charming anecdote from the NY Times, Nov. 4, 2016 at B4, captures the multiple voices of numbers due to the subjectivity of inference.  According to the Times, the Bureau of Labor Statistics has an unofficial motto for when they are asked about their employment data.  They don’t indulge in drawing conclusions as to whether the employment glass is half full or half empty:  they respond, “It’s an eight-ounce glass with four ounces of liquid.”

In a different legal context and drawing on the wisdom of TV, “Just the facts, ma’am” – leave the interpretation to us.

Data scientists can presumably measure the glass and the amount of liquid in it, but managers in law firms and departments must come to their own conclusions about fullness.