Advisable to use “Don’t know” or “NA” in multiple-choice questions

Well-crafted multiple-choice questions give respondents a way to say that they don’t know the answer or that no selection applies to their situation. The two non-answers differ: ignorance of the answer (or, possibly, refusal to give a known answer) is something the respondent could remedy, whereas respondents can do nothing to supplement an incomplete set of selections. Firms should not want the people they have invited to take a survey to have to pick the least bad answer because their preferred answer is missing. As we have written before, firms should add an “Other” choice with a text box for elaboration.

From Hogan Lovells Cross-Border 2014 [pg. 19] comes an example of how a multiple-choice question can accommodate respondents who don’t know the answer. It also shows how data from such a question might be reported in a polar graphic. Seven percent of the respondents did not know whether their company’s international contracts include arbitration procedures.

In the jargon of data analysts, a “Don’t know” is called item non-response: no answer is given to a particular survey item when at least one valid answer was given to some item by the same respondent, e.g., leaving an item on a questionnaire blank, or responding to some questions by saying, “I don’t know,” while providing a valid response to other questions.

Another survey, DLA Piper Compliance 2017 [pg. 15], used a “Does not apply” option. Almost one-third of the respondents checked it. It is conceivable that some respondents did not know the answer and resorted to denying its applicability to them as the best of the three choices, although far from optimal.

One more example, this time from Fulbright Jaworski Lit 2009 [pg. 61]. Here, one-fifth of those who took the survey indicated that they didn’t know the answer to the question reproduced on top of the plot.

It is easy to include variations of the non-substantive selections described above. In fact, extrapolating from these three instances, firms probably should do so since significant numbers of respondents might pick them — on average almost one out of five in the above surveys.

Multiple-choice questions dominate the formats of questions asked

Having examined more than 100 reports published by law firms based on the surveys they sponsored, I suspected that more than three out of four of the questions asked on the surveys fell into the category of multiple choice. Reluctant to confirm that sense by laboriously categorizing all the questions in all those surveys, I invited my trusty R software to select five of the surveys at random.
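
For the curious, the random draw itself takes a single line of R. A minimal sketch, where the vector of titles is a placeholder rather than the actual list of 100-plus reports:

```r
# Hypothetical illustration: draw five survey reports at random from a larger list.
# The titles below are placeholders, not the actual collection of reports examined.
reports <- c("Seyfarth Shaw Future 2017", "Morrison Foerster MA 2014",
             "Berwin Leighton Arbvenue 2014", "Foley Lardner Telemedicine 2014",
             "Foley Lardner Cars 2017", "DLA Piper Compliance 2017",
             "Hogan Lovells Cross-Border 2014", "Fulbright Jaworski Lit 2009")

set.seed(42)               # make the draw reproducible
sample(reports, size = 5)  # five reports, drawn without replacement
```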

Sure enough, all five exceeded my estimate of at least 75% of the questions being multiple choice: every single question that could be identified from the five reports fell into that format! Bear in mind that we can’t be certain about all the questions asked on the surveys, but we can glean most of them from the reports. Counting from the actual questionnaires would be necessary to confirm these figures.

Specifically, Seyfarth Shaw Future 2017 went eight for eight, Morrison Foerster MA 2014 was five out of five, and Berwin Leighton Arbvenue 2014 used multiple-choice questions for all of its at least 14 questions (it is difficult to figure out from the Berwin report exactly how many questions were on the survey). In Foley Lardner Telemedicine 2014, all twelve questions (including three demographic questions) were multiple choice; in Foley Lardner Cars 2017, all 16 questions (including two demographic questions) were multiple choice.

Of those 55 multiple choice questions, a few presented binary choices but most of them presented a list of 4-to-7 selections to pick from. Likert scales appeared rarely, as illustrated in the plot below from Foley Lardner Cars 2017 [pg. 5]. The scale ranges from “Strongly Agree” to “Strongly Disagree.”

Morrison Foerster MA 2014 [pg. 4] also used a Likert scale in a question.

Multiple-choice questions that ask for a ranking can yield deeper insights

If you want to capture more information than simple multiple-choice questions can provide, a ranking question might be best. For one of its questions, Berwin Leighton Risks (2014) [pg. 17] presented respondents with seven legal risks. The instructions told the respondents to rank the risks from 1 to 8 (where 1 was the most serious risk and 8 the least serious). [Right, 8 ranking choices for only 7 items!] Presumably no ties were allowed (which the survey software might have enforced).

The report’s plot extracted the distribution of rankings only for the two most serious, 1 or 2. It appears that the plot tells us, for example, that 48 respondents ranked “Legislation/regulation” as a 1 or 2 legal risk (most serious). Other plots displayed the distribution of 3 and 4 rankings and less serious rankings.

A ranking question, especially one with as many as seven elements to be compared to each other, burdens participants, because to answer it conscientiously they need to consider each element relative to all the others. As a surveyor, you can never completely rely on this degree of respondent carefulness.

But ranking questions can yield fruitful analytics. Rankings are far more insightful than “pick the most serious [or whatever criterion],” which tosses away nearly all comparative measures. Rankings are more precise than “pick all that are serious,” which surrenders most insights into relative seriousness. Yet the infrequency of ranking questions in the law-firm research survey world is striking. Findings would be much more robust if there were more ranking questions.

Some people believe that rankings are difficult to analyze and interpret. The Berwin Leighton visualization technique, which presents different views of the aggregate rankings, belies that belief. Many other techniques exist to analyze and picture ranking responses.
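
As a minimal sketch of that kind of analysis (using simulated rankings, not the Berwin Leighton data), a few lines of R turn a matrix of rankings into the summaries such reports plot, such as the average rank and the count of most-serious (1 or 2) rankings for each risk:

```r
# Hypothetical rankings: 10 respondents each rank 4 legal risks (1 = most serious).
set.seed(1)
ranks <- t(replicate(10, sample(1:4)))   # one row per respondent, one column per risk
colnames(ranks) <- c("Regulation", "Contract", "IP", "Employment")

colMeans(ranks)        # average rank per risk (lower = viewed as more serious)
colSums(ranks <= 2)    # how many respondents ranked each risk 1 or 2
```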

A ranking question gives a sense of whether a respondent likes one answer choice more than another, but it doesn’t tell how much more. A question that asks respondents to allocate 100 percent among their choices not only ranks the choices but differentiates between them much more precisely than simple ranking. Proportional distribution questions, however, appear in law firm surveys even less often than ranking questions. In fact, we could not find one among the hundreds of plots we have examined. Perhaps the reason is that these questions are even more complicated to explain to survey participants.
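
A small illustration with invented numbers: two respondents can produce the identical ranking yet very different allocations, information that a ranking question alone would discard.

```r
# Hypothetical 100-point allocations across three legal risks by two respondents.
alloc <- rbind(person_A = c(Regulation = 70, Contract = 20, IP = 10),
               person_B = c(Regulation = 40, Contract = 35, IP = 25))

t(apply(alloc, 1, rank))   # both respondents yield the same ranking of the risks...
colMeans(alloc)            # ...but the allocations show how much weight each risk carries
```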

Challenges choosing categories, e.g., for revenue demographics

When a law firm invites its contacts to take a survey, the people who accept probably form an irregular group in terms of the distribution of their corporate revenue. By the happenstance of self-selection and of how the invitation list was compiled, their revenue can range from negligible to many billions of dollars. When the firm’s report describes the revenue characteristics of the group, the firm must decide what ranges of revenue to use.

The firm might slice the revenue categories to put in them roughly equal numbers of participants. Doing this usually means that the largest category spans a wide range of revenue — “$3 billion and up” — whereas the smallest category tends to be narrow — “$0 to 100 million.” Such an imbalance of ranges results from the real-world distribution of companies by revenue: lots and lots of smaller companies and a scattering of huge ones (the distribution is long-tailed to the right). Stated differently, the corporate revenue pyramid displays a very, very broad base.

Alternatively, a law firm might choose to set the revenue ranges at specific break points, perhaps “$0-1 billion, $1-2 billion, $2-3 billion” and so on. The categories may make sense a priori, but binning revenue this way can result in very uneven numbers of participants in one or more of the categories, depending on what categories are chosen, how narrow they are, and the vagaries of who responds.
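
A hedged sketch in R of the two approaches, using simulated revenue figures (the distribution and breakpoints are illustrative only): quantile breaks produce roughly equal counts with lopsided ranges, while tidy fixed breaks produce lopsided counts.

```r
# Simulate a right-skewed revenue distribution (in $ millions), as corporate revenue tends to be.
set.seed(7)
revenue <- rlnorm(200, meanlog = 5, sdlog = 1.5)

# Approach 1: equal-count bins -- quartile breaks put roughly 25% of respondents in each category.
equal_count <- cut(revenue, breaks = quantile(revenue, probs = seq(0, 1, 0.25)),
                   include.lowest = TRUE)
table(equal_count)

# Approach 2: fixed, a priori breaks -- tidy ranges, but very uneven counts per category.
fixed_breaks <- cut(revenue, breaks = c(0, 1000, 2000, 3000, Inf),
                    labels = c("$0-1bn", "$1-2bn", "$2-3bn", "$3bn+"))
table(fixed_breaks)
```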

Davies Ward Barometer (2010) [pg. 10] explained the corporate revenue ranges of its respondents in words. These are unusual ranges. The distribution skews toward remarkably small companies. Note from the last bullet that almost one out of three survey respondents “are not sure of their organization’s annual revenue.” Perhaps they do not want to disclose that revenue, as they work for a privately-held company. Or perhaps the organization has no “revenue,” but has a budget allocation as a government agency.

With a third approach, a firm fits its revenue categories to its available data set so that plots look attractive. You can guess when a firm has selected its revenue categories to fit its data. Consider the plot below from DLA Piper’s compliance survey (2017) [pg. 26]. The largest companies in the first category reported less than $10 million in revenue; the next category included companies with up to ten times more revenue, but about the same percentage of respondents; the third revenue category again spanned companies with up to ten times more revenue, topping out at $1 billion, but with a percentage close to the preceding two. Then comes a small category spanning a narrow $400 million range, followed by the two on the right with half the percentages of the left three. It appears that someone tried various revenue categories to find a combination that looks reasonably good in a graphic.

The fullest way to describe the revenue of participants is a scatter plot. From such a plot, which shows every data point, readers can draw their own conclusions about the distribution of revenue.
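
A minimal sketch of such a plot in base R, again with simulated revenue; the log scale keeps the long right tail readable, and every point is one respondent:

```r
# Show every respondent's revenue as its own jittered point on a log scale.
set.seed(7)
revenue <- rlnorm(200, meanlog = 5, sdlog = 1.5)   # simulated revenue in $ millions

stripchart(revenue, method = "jitter", vertical = TRUE, pch = 16, log = "y",
           ylab = "Revenue ($ millions, log scale)")
```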

Multiple-choice questions put a premium on simplicity and clarity

The best surveys present questions that participants understand immediately. Short, clear, and built from familiar words: that’s the secret to reliable answers and to participants continuing on with the survey. Much more than fill-in-the-blank questions or give-your-answer questions, multiple-choice questions need to be simple and direct because participants have to absorb the question first and then slog through some number of selections.

Some questions demand quite a bit from the participant. How complex is the question shown at the top of the image below, taken from Pinsent Masons TMT 2016 [pg. 20]? The person tackling that question had to juggle the broadness of “considerations,” bring to mind comprehensive knowledge of the company’s dispute resolution policy (recalling the meaning of “DR”), and apply both in the context of arbitrations. Even though the question handles a complex topic quite succinctly, the cognitive load on the participant piles up.

For question designers, a cardinal sin is the “and” or the “or.” When a conjunction joins two ideas, the result is a “double-barreled question,” in the evocative term of the textbook Empirical Methods in Law by Robert M. Lawless, Jennifer K. Robbennolt, and Thomas S. Ulen (Wolters Kluwer, 2nd ed. 2016) at 67: it asks whether X and Y are both true. What if X is true but not Y, or Y but not X? How does a respondent answer half of a conjunction?

Feel the cognitive schism of a conjunction from the question asked in Gowling WLG Protectionism (2017) [pg. 13]. Some participants might believe that their sector is aware of the risks of protectionist policies but hasn’t prepared how to respond to them (i.e., the sector is on notice but not ready to act). What is the right answer for those participants?

Alternatively (or disjunctively), with a question that asks whether X or Y is true, the firm can’t disentangle X from Y when the analysis step arrives, since the two have been annealed together. X could be true and Y false, or the reverse.

We will close with one more example of both complexity and conjunction. The question below [pg. 14] confronted respondents with seven selections, several of which were complex and one of which included a conjunction [the fourth from the top, “Breakdown … and the rise …”]. As with the Gowling question, that selection might leave a participant in a bind if one part of it holds true but not the other.

Order of selections in multiple-choice questions

Since participants are expected to read all the selections of a multiple-choice question, the order in which you list them may seem of little moment. But the consequences of order can be momentous. Respondents might interpret the order as suggesting a priority or “correctness.” For example, if the choice that the firm thinks will be chosen most commonly stands first, that decision will influence the data in a self-fulfilling pattern. The firm thinks it is important (or, worse, would prefer to see that selection picked more often) and therefore puts it first, while respondents, supposing that privileged position to be meaningful, are nudged toward choosing it.

Or participants may simply tire of evaluating a long list of selections and deciding which one or more to choose. They may unknowingly favor earlier choices so that they can declare victory and move on to the next question.

Let’s look at a question from the King & Spalding survey on claims professionals (2016) [pg. 15], not in any way to criticize the question but to illustrate the possibility of the skews described above.

We don’t know enough about claims professionals or lines of insurance to detect whether this selection order nudges respondents, but clearly the selections are not in alphabetical order. When selections appear in alphabetical order, the assumption is that the firm tried to neutralize the order and thereby avoid guiding respondents.

Another option for a firm is to prepare multiple versions of the survey. Each version changes the order of selections of the key multiple-choice question or questions. The firm sends those variants randomly to the people invited to take the survey. So long as the text of the selections remains the same, the software that compiles results will not care about variations in selection order.

A more sophisticated technique to eliminate the risk of framing relies on the survey software to present the selections in random order to each survey taker. In other words, the order in which person A sees the selections differs randomly from the order in which person B sees them.
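
Most survey platforms offer per-respondent randomization as a built-in setting; conceptually it amounts to nothing more than an independent shuffle for each survey taker, as in this hypothetical sketch:

```r
# Hypothetical: present the same selections to each respondent in an independently shuffled order.
selections <- c("Property", "Casualty", "Professional liability", "Marine", "Other")

show_to_respondent <- function(choices) sample(choices)   # a fresh random permutation

set.seed(3)
show_to_respondent(selections)   # order seen by person A
show_to_respondent(selections)   # a (very likely) different order seen by person B
```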

Published reports infrequently restate the exact question asked and never the arrangement of selections. All the reader has to go by is the data as reported in the text, table or graphic. Because the summary of the data usually starts with the most common selection and discusses the remaining results in declining order, the original arrangement of selections is not available.

For example, here is one multiple-choice question from Davies Ward Barometer (2010) [pg. 58]. At the top, the snippet provides the text of the report which gives a clue to the question asked of respondents. Nothing gives a clue about the order of the selections on the survey itself.

As an aside, consider that this survey followed several prior surveys on the same topic. It is possible that the order of the selections reflects prior responses to a similar question. That would be a natural thing to do, but it would be a mistake for the reasons described above.

Techniques to reduce mistakes by respondents

What can a firm do to improve the likelihood that respondents answer multiple-choice questions correctly? The substance of their answers is known only to them, but some methodological trip-ups have solutions. To address the question, we revisit the failure points catalogued in the list of pitfalls below.

Reverse the scale. One step to identify a misreading is to ask a second question that confirms the first answer. So, if the first question asks for a “1” to indicate “wholly ineffective” on up to a “10” to indicate “highly effective,” a later question might present the choices and ask the respondent to pick the most effective one. If that choice did not get a high number (8, 9 or 10, probably) on the first question, you have spotted a potential scale reversal. If you decide to correct it, you can manually revise the ratings on the first question. A second precaution: using different terms for the poles might improve accuracy, although at a cost of some consistency and clarity. Thus, the scale might run from a “1” indicating “wholly ineffective” up to a “10” indicating “highly productive.” Respondents are more likely to notice the variation in wording and get the scale right.
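
A sketch of that cross-check in R, with invented column names and ratings: flag any respondent whose “most effective” pick did not receive a high rating on the earlier 1-to-10 question.

```r
# Hypothetical responses: 1-10 effectiveness ratings for two techniques, plus a later
# question asking which technique is the most effective.
answers <- data.frame(
  rate_fixed_fees = c(9, 2, 8),
  rate_rfps       = c(4, 7, 3),
  most_effective  = c("fixed_fees", "fixed_fees", "rfps")
)

# Possible scale reversal: the technique named most effective received a low rating (< 8).
rating_of_pick <- ifelse(answers$most_effective == "fixed_fees",
                         answers$rate_fixed_fees, answers$rate_rfps)
answers$possible_reversal <- rating_of_pick < 8
answers
```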

Misread the question. Sometimes, next to the answer choices you can repeat the key word. Seeing the key word, such as “most inexpensive”, a respondent will catch his or her own misreading. As with scale reversals, here too a second question might confirm or call out an error. Alternatively, a firm might include a text box and ask the respondent to “briefly explain your reasoning.” That text might serve as a proof of proper reading of the question.

Misread selections. In addition to the remedies already discussed, another step available to a firm is to write the selections briefly, clearly, and in positive terms. “Negotiate fixed fees”, therefore, improves on “Don’t enter into billing arrangements based on standard hourly rates.” Furthermore, don’t repeat phrases, which can make selections look similar to a participant who is moving fast. “Negotiate fixed fees” might cause a stumble if it is followed by “Negotiate fixed service.”

Misread instructions. The best solution relies on survey software that rejects everything except numbers. That function should screen out the undesirable additions. The downside is that participants can grow frustrated at error messages that do not clearly state the cause of the mistake: “Please enter numbers only, not anything else, such as letters or symbols like $.”

Fill in nonsense when answers are required. As mentioned, sophisticated software might detect anomalous selections, but that leads to dicey decisions about what to do. An easier solution is to keep the survey focused, restrict selections to likely choices (and thus fewer of them), and make them interesting. Sometimes surveys can put in a question or step that reminds participants to pay attention.

Give contradictory answers. Again, in hopes of trapping contradictions law firms can structure the question set to include confirmatory questions on key points. The drawback? A longer survey. Alternatively, some firms might email respondents and confirm that they meant to give answers that conflict with each other. Likewise, interviews after the survey comes back may smoke out corrections.
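
A sketch of such a post-hoc consistency check, using an invented tried-versus-rated pair of columns that mirrors the example in the pitfalls list below:

```r
# Hypothetical: did respondents rate a cost-management technique they never said they tried?
responses <- data.frame(
  tried_fixed_fees = c(TRUE, FALSE, TRUE),
  rated_fixed_fees = c(7, 5, NA)            # NA means the technique was left unrated
)

# Flag a contradiction: a rating without a "tried", or a "tried" without a rating.
responses$contradiction <- (!responses$tried_fixed_fees & !is.na(responses$rated_fixed_fees)) |
                           ( responses$tried_fixed_fees &  is.na(responses$rated_fixed_fees))
responses
```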

Become lazy. Keep the survey short, well-crafted, and as interesting as possible for the participant. Perhaps two-thirds of the way through, a firm could ‘bury’ an incentive button: “Click here to get a $15 gift certificate.” Or a progress bar displayed by the survey software can boost flagging attention (“I’m close, let’s do a good job to the end…”).

Too quickly resort to “Other”. Despite the aspiration to achieve MECE (mutually exclusive, collectively exhaustive) selections, keep them short, few, and clear. Pretesting the question might suggest another selection or two. Additionally, a text box might reduce the adverse effects of promiscuous reliance on “Other”.

Ten pitfalls of respondents on multiple-choice questions

Before plunging into the bog of blunders, let’s define respondent as someone who presses submit at the end of an online questionnaire. An alternative term would be participant. Potential respondents who stop before the end of the questionnaire are partial participants. Typically, survey software logs the responses of partial participants. Now, enter the bog, if ye dare!

We have listed below several things that can go wrong when people tackle multiple choice questions. The pictorial summarizes the points.

  1. Reverse the scale. With a question that asks for a numeric value, as in a table of actions to be evaluated on their effectiveness, a “1” checked might indicate “wholly ineffective” while a ten might indicate “highly effective.” Some people may confuse the scale of low to high and check a “1” when they mean “highly effective”.
  2. Misread the question. Hardly unique to multiple-choice questions, simple misunderstanding of the inquiry dogs all survey questions. If the question addresses “effective actions” and someone reads it as inquiring about “ineffective actions”, all is lost.
  3. Misread selections. This pitfall mirrors misreading questions, but applies to the multiple selections. Negative constructions especially bedevil people, as in “Doesn’t apply without exception.”
  4. Misread instructions. This mistake commonly appears when questions ask for a number. Careful survey designers can plead with respondents to enter “only numerals, not percent signs or the word ‘percent’.” The guidance can clearly state “do not write ranges such as ‘3-5’ or ‘4 to 6’” and “do not add ‘approx.’ or ‘~’.” All for naught. People sprinkle in dollar signs or write “2 thousand” or “3K”. Humans have no trouble understanding such entries, but computers give up. If an entry is not in the right format for a number, a computer will treat it as a text string, and computers can’t calculate with text strings. Fortunately, computers can be instructed to scrub the answers into a standard format (a short sketch of such scrubbing appears after this list). And sometimes the survey software can check the format of what’s entered and flash a warning message.
  5. Fill in nonsense when answers are required. Some participants can’t be bothered to waste their time on irrelevant questions, so they slap in the first selection (or some random selection). Unless the analyst takes time to think about the likelihood of a given answer in light of other answers or facts, this mistake eludes detection.
  6. Give contradictory answers. Sometimes a survey has two questions that address a similar topic. For example, the survey might ask respondents to check the cost management techniques they have tried while a later question asks them to rate those techniques on effectiveness. What if they rate a technique they didn’t say they had tried, or they fail to rate a technique that they had tried? This could be a form of contradiction.
  7. Become lazy. When there are too many questions, when the selections for a question go on and on, or when reasonable answers require digging, respondents can throw in the towel and make sloppy selections. Here the fault lies more with the survey designer than with the survey taker.
  8. Too quickly resort to “Other”. A form of laziness, if the selections are many or complex, some people just click on “Other” rather than take the time to interpret the morass. If they write a bit about what “Other” means, that text will reduce the adverse effects of the lack of discipline.
  9. Mis-click on drop-downs. If you find a “United Arab Emirates” in your corporate headquarters data and nearly everyone else is “United States”, you can suspect that one person made a mistake on the drop-down list.
  10. Pick too many or too few. If respondents pick too many selections, the software might give a warning. Otherwise, if “select no more than three” governs, the software might simply take the first three even if four or more were checked. The software should likewise be able to warn when too few are picked.
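
As mentioned in item 4, computers can be instructed to scrub free-form numeric answers into a standard format. A minimal sketch of such scrubbing, where the cleaning rules and sample entries are purely illustrative:

```r
# Hypothetical raw answers to a question that asked for a plain number.
raw <- c("2 thousand", "3K", "$1,500", "approx. 40", "3-5", "25%")

clean_number <- function(x) {
  x <- tolower(trimws(x))
  x <- sub("^approx\\.?\\s*", "", x)       # drop a leading "approx." qualifier
  x <- sub("%$", "", x)                    # drop a trailing percent sign
  x <- sub("\\s*(thousand|k)$", "e3", x)   # "2 thousand" / "3K" -> scientific notation
  x <- gsub("[$,~ ]", "", x)               # strip currency signs, commas, tildes, spaces
  suppressWarnings(as.numeric(x))          # anything still unparseable becomes NA
}

clean_number(raw)   # 2000 3000 1500 40 NA 25 -- the range "3-5" stays NA for human review
```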

Thoughts on restricted-choice questions

Survey questions where the instructions say “Check only one” or “Check the most important [or challenging or innovative or risky or whatever]” inevitably leave some respondents with difficult decisions. Other than demographic inquiries, where usually only one selection makes sense, it is often hard for a respondent to settle on a single selection as the best answer. Life is more complicated than single answers.

To accommodate, survey designers often permit respondents answering multiple-choice questions to “Select all that apply.” Being able to click on two or more selections from the list allows most respondents to capture more nuanced and comprehensive aspects of their circumstances than can any single “best” answer. That flexibility, however, comes at a cost for the survey analyst, who must resort to tallying how many respondents chose each of the selections. Also, it is impossible to know the relative importance of the items selected by any respondent or what it means if one person selected only two while another selected eight, for example.
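
That tallying is straightforward. A sketch with invented answers, where each respondent’s picks arrive as one comma-separated string:

```r
# Hypothetical "select all that apply" answers, one comma-separated string per respondent.
picks <- c("Cost, Speed", "Cost", "Speed, Confidentiality, Cost", "Confidentiality")

choices <- unlist(strsplit(picks, ",\\s*"))   # flatten into one vector of individual selections
sort(table(choices), decreasing = TRUE)       # Cost: 3, Confidentiality: 2, Speed: 2
```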

Sometimes a third style appears: the question instructs respondents to “Select the top three” [or two, four, or even five]. Such a restricted-selection question strikes a compromise between rigidity (one choice only from the list) and laxity (pick however many you want from the list). It places a modicum of discipline on respondents, since they should evaluate the relative importance of all the selections, which means the accumulated data highlights relative importance more insightfully.

K&L Gates sponsored a survey of general counsel which illustrates some aspects of this design decision. In one part the survey explored the views of respondents on legal risks arising from various technologies. Based on the header of the plot snippet below, the survey asked: “Which of the following technologies present the highest legal risks?” The plot presents the aggregated responses.

From the ambiguous parenthetical above the plot [“(Top 3 extremely risky)”] we can’t be sure of the actual style of the question. It could have asked respondents to check whether they thought each of a list of technologies was “Not risky”, “Somewhat risky,” “Quite risky”, or “Extremely risky”. This would have been a matrix question because it probably looked like a table with one row for each technology and then the legal riskiness choices as buttons on the right to click.  The firm then presented in the circles the three technologies that garnered the most “Extremely risky” checks.

Alternatively, the question might have been a restricted-selection question that told respondents to check up to three technologies from the list that they felt were “extremely risky”. Or the question might have required them to pick exactly three that were “extremely risky.” That would have been a poorly designed question because it presumes that every respondent believes at least three of the technologies pose very high levels of legal risk to their company.

In short, reverse engineering the question that led to the graphic above yields no conclusion about its style. But it does introduce some notions that underlie survey questions where respondents can pick more than one selection.

How common are multiple-choice questions in law-firm research surveys?

Are multiple-choice questions the most common format in law-firm research surveys? Absolutely yes would be the answer based on impressionistic leaf-throughs of some surveys. But this being a data analytics blog, we need genuine, hard-core empirical data.

Three recent surveys, picked somewhat randomly and each from a different law firm, provide data and start to suggest an answer.[1]

In its 32-page report on debt financing, DLA Piper shows 17 plots. Of them, 15 are multiple choice with an average of 4.8 selections per question. At least five of those questions offer ordinal selections, by the way, meaning the selections have a natural order of progression. Here is an example of ordinal selections, from “Strongly agree” to “Strongly disagree”.

The Hogan Lovells report regarding Brexit has five plots throughout its 16 pages, and three of them appear to be based on multiple-choice questions. The average number of selections is 3.3. Finally, the K&L Gates survey of general counsel has 15 plots in its 20 pages. A dozen of them summarize multiple-choice questions with an average of more than six selections per question.[2]

Combining the findings from this small sample, it turns out that 80% of the plots draw their data from multiple choice questions. The other plot types include some matrices (tables), a map, and some circle visualizations (data in circles). As to the number of selections, between four and five per question seems to be the average.
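
For transparency, the arithmetic behind that 80% figure, assembled from the plot counts reported above:

```r
# Plot counts from the three reports cited above: DLA Piper, Hogan Lovells, K&L Gates.
plots_total <- c(17, 5, 15)        # total plots in each report
plots_mc    <- c(15, 3, 12)        # plots based on multiple-choice questions

sum(plots_mc) / sum(plots_total)   # 30 / 37 = about 0.81, i.e., roughly 80% multiple choice
```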

Notes:

  1. DLA Piper, “European Acquisition Finance Debt Report 2015” (2015); Hogan Lovells, “Brexometer” (2017); and K&L Gates, “General Counsel in the Age of Disruption” (2017).
  2. We cannot determine precisely how many selections are in some of the questions because the report only shows the top three or the top five selections that were picked.