NAICS classification of industries would help surveys four ways

If only there were a standard way to describe survey participants by industry … There is! Law firms could identify, analyze, and report on their participants by the categories of the North American Industry Classification System (NAICS). This system has superseded the venerable SIC (Standard Industrial Classification) codes. The NAICS offers a range of two-digit sectors that map well to the current proliferation of industry and sector designations seen in law firm reports. Those sectors, together with the three- and four-digit subdivisions beneath them, easily suffice for law-firm research surveys.

If NAICS codes became the convention for law firm research surveys, at least four benefits would follow.

Mash-up data. For data analysts, “mash-up” describes the process of melding two sets of data. If firms used the NAICS, other data would become available for analysis. The U.S. government maintains longitudinal data sets (those collected over a period of time) by NAICS code, and these can supplement a survey with information about the number of businesses in an industry, further detail about those businesses, the number of their employees, and so forth. Everyone would benefit from the richer, more insightful analyses that various mash-ups make possible.
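In practice, a mash-up of this kind is a join on the industry code. The minimal sketch below assumes each respondent was tagged with a two-digit NAICS sector and that a government table keyed by the same codes (establishment counts, say) has already been downloaded and reduced to a lookup. The field names and every figure are hypothetical placeholders, not real statistics.

```python
# Hypothetical survey responses, each tagged with a two-digit NAICS sector code.
respondents = [
    {"id": 1, "naics2": "22", "answer": "yes"},  # 22 = Utilities
    {"id": 2, "naics2": "23", "answer": "no"},   # 23 = Construction
    {"id": 3, "naics2": "52", "answer": "yes"},  # 52 = Finance and Insurance
    {"id": 4, "naics2": "52", "answer": "no"},
]

# Hypothetical government table keyed by the same codes.
# The counts are placeholders, not real statistics.
reference = {
    "22": {"sector": "Utilities", "establishments": 100},
    "23": {"sector": "Construction", "establishments": 900},
    "52": {"sector": "Finance and Insurance", "establishments": 500},
}


def mash_up(rows, lookup, key="naics2"):
    """Left-join each survey row to the reference table on its NAICS code."""
    merged = []
    for row in rows:
        extra = lookup.get(row[key], {})  # no match: keep only the survey fields
        merged.append({**row, **extra})
    return merged


for record in mash_up(respondents, reference):
    print(record["id"], record.get("sector"), record.get("establishments"))
```

Respondents whose code has no match keep only their survey fields, so a missing sector name flags a coding problem worth checking.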

Consistency among surveys. If law firms adopted this standard classification system, readers of their reports and researchers could compare results by industry far more easily. In the current disorder, with each firm defining its industries idiosyncratically, comparisons and meta-analyses become much harder to carry out, if not impossible.

Improving the representativeness of the sample data. Because the NAICS data sets provide law firms with reliable counts of companies by industry, they could deploy techniques to make their convenience samples more representative of the actual distribution of U.S. businesses. One method of doing this, which we explain elsewhere, is called “raking.” As sample data is transformed to closely resemble population data, deeper statistical analyses become available.
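Raking (also called iterative proportional fitting) adjusts respondent weights one demographic at a time, cycling until the weighted sample matches every population margin at once. The sketch below is a minimal illustration, assuming population shares for industry and company size are already known (for example, derived from NAICS-based counts); the four respondents and the target shares are invented for the example, and `rake` is a hypothetical helper, not a library function.

```python
def rake(sample, targets, max_iter=100, tol=1e-9):
    """Return one weight per respondent so that weighted margins match targets.

    sample:  list of dicts, one per respondent, e.g. {"industry": ..., "size": ...}
    targets: {variable: {category: population share}}; each variable's shares sum to 1
    """
    weights = [1.0] * len(sample)
    for _ in range(max_iter):
        largest_change = 0.0
        for var, shares in targets.items():
            # Current weighted total for each category of this variable.
            totals = {}
            for w, row in zip(weights, sample):
                totals[row[var]] = totals.get(row[var], 0.0) + w
            grand = sum(totals.values())
            # Rescale weights so this variable's margin hits its target shares.
            for i, row in enumerate(sample):
                new_w = weights[i] * shares[row[var]] * grand / totals[row[var]]
                largest_change = max(largest_change, abs(new_w - weights[i]))
                weights[i] = new_w
        if largest_change < tol:  # weights have stopped moving
            break
    return weights


sample = [
    {"industry": "Finance", "size": "Large"},
    {"industry": "Finance", "size": "Large"},
    {"industry": "Finance", "size": "Small"},
    {"industry": "Manufacturing", "size": "Small"},
]
targets = {
    "industry": {"Finance": 0.5, "Manufacturing": 0.5},
    "size": {"Large": 0.3, "Small": 0.7},
}
print([round(w, 2) for w in rake(sample, targets)])  # [0.6, 0.6, 0.8, 2.0]
```

The lone manufacturing respondent ends up counting for two, because manufacturing is underrepresented relative to the assumed population. Every category in the sample must appear in the targets, and a category with no respondents at all cannot be raked up to its target.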

Impute missing values. “Imputation” is the term statisticians use for filling in missing values. If a law firm has data about its participants by NAICS code plus other information such as revenue, the firm could impute the number of employees of each company. We explain that methodology elsewhere, but it is available to a firm only so long as its industry coding conforms to the NAICS. A firm that collects revenue, industry code, and state can impute employee counts even more accurately. Fuller data sets enable better analyses.
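To make the idea concrete, here is a deliberately simple ratio imputation, swapped in for illustration and not necessarily the methodology explained elsewhere. It assumes the firm has a table of employees per $1 million of revenue by NAICS sector, optionally refined by state; the ratios, field names, and function name are all hypothetical.

```python
# Hypothetical employees per $1 million of revenue. In practice such ratios might be
# derived from government tables published by NAICS code; these are placeholders.
RATIO_BY_SECTOR = {"52": 2.0, "23": 4.0}
RATIO_BY_SECTOR_AND_STATE = {("52", "NY"): 1.5}  # finer ratio where one exists


def impute_employees(resp):
    """Return reported employees, or an estimate from revenue and NAICS ratios."""
    if resp.get("employees") is not None:
        return resp["employees"]  # a reported value always wins
    ratio = RATIO_BY_SECTOR_AND_STATE.get(
        (resp.get("naics2"), resp.get("state")),
        RATIO_BY_SECTOR.get(resp.get("naics2")),  # fall back to the sector ratio
    )
    revenue = resp.get("revenue_musd")  # annual revenue in $ millions
    if ratio is None or revenue is None:
        return None  # not enough information to impute
    return round(revenue * ratio)


print(impute_employees({"naics2": "52", "state": "NY", "revenue_musd": 200}))  # 300
print(impute_employees({"naics2": "52", "state": "TX", "revenue_musd": 200}))  # 400
```

The state-level lookup is tried first and the sector-level ratio serves as a fallback, which is one way the extra state variable could sharpen the estimate.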

Four reasons why demographic questions usually lead off a survey

By convention, the first few pieces of information asked of respondents on a questionnaire typically concern demographic facts (title, industry, location, revenue). The reasons for this typical order might be termed psychological, motivational, practical, and instrumental.

Psychologically, law firms want to know about the person who is providing them data. Is this person higher or lower in the corporate hierarchy? Does this person work in an industry that matters to the firm or matters to the survey results? They want to know that the person is credible, knowledgeable, and falls into categories that are appropriate for the survey. To satisfy that felt need, designers of questionnaires put demographic questions first.

When a questionnaire starts with questions that are easy to answer, such as the respondent’s position, the industry of their company, and its headquarters location, respondents are motivated to breeze through them and charge on. They sense that the survey is going to be doable and quick. Putting the demographic questions first, therefore, can boost participation rates and reduce attrition.

A practical reason to place the demographic questions at the start is that doing so allows the survey software to filter out or redirect certain respondents. If an early question concerns the respondent’s level, and their choice falls below the firm’s desired level of authority, the survey can either thank the respondent and close at that point or route them to a different path of questions. Vendors who conduct surveys often cull out inappropriate participants, but law firms rarely take this step. Rather, they usually want as much data as they can get from as many people as will take part.
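The screening decision described above reduces to a small rule. Survey platforms configure it through their own branching or skip-logic settings; the sketch below only shows the decision itself, and the position list, section names, and `policy` parameter are hypothetical.

```python
# Hypothetical positions the firm wants on its main question path.
TARGET_POSITIONS = {"General counsel", "Chief executive officer", "Chief financial officer"}


def next_section(position, policy="redirect"):
    """Choose where a respondent goes after the opening position question.

    policy="close":    thank below-target respondents and end the survey.
    policy="redirect": send them to an alternative set of questions.
    """
    if position in TARGET_POSITIONS:
        return "main_questions"
    if policy == "close":
        return "thank_you_and_close"
    return "alternative_path"


print(next_section("General counsel"))                 # main_questions
print(next_section("Paralegal", policy="close"))       # thank_you_and_close
```

A law firm that wants as much data as possible would simply keep the default of redirecting rather than closing.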

Fourth, if the demographic questions come at the start of the questionnaire, the survey software may capture valuable information even if the participant fails to complete or submit the survey. This could be thought of as an instrumental reason for kicking off a questionnaire with demographic questions. These days, the law firm particularly wants to know the participant’s email address and title. That information probably flows into a customer relationship management (CRM) database.

Challenges choosing categories, e.g., for revenue demographics

When a law firm invites its contacts to take a survey, those who accept probably form an irregular group in terms of the distribution of their corporate revenue. Depending on the happenstance of self-selection and how the invitation list was compiled, their revenue can range from negligible to many billions of dollars. When the firm’s report describes the revenue characteristics of the group, the firm must decide what revenue ranges to use.

The firm might slice the revenue categories so that each holds roughly equal numbers of participants. Doing this usually means that the top category spans a wide range of revenue (“$3 billion and up”), whereas the bottom category tends to be narrow (“$0 to 100 million”). Such an imbalance of ranges results from the real-world distribution of companies by revenue: lots and lots of smaller companies and a scattering of huge ones (the distribution has a long right tail). Stated differently, the corporate revenue pyramid has a very, very broad base.

Alternatively, a law firm might set the revenue ranges at fixed intervals, perhaps “$0-1 billion, $1-2 billion, $2-3 billion” and so on. The categories may make sense a priori, but binning revenue this way can leave very uneven numbers of participants in the categories, depending on which categories are chosen, how narrow they are, and the vagaries of who responds.
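The contrast between the first two approaches shows up quickly on a small, made-up list of revenues in $ millions. The sketch below computes equal-count cut points from the sorted data and, separately, counts respondents in fixed $1 billion bands; the helper names and the revenue figures are hypothetical.

```python
import bisect


def equal_count_edges(values, k):
    """Cut points that put roughly the same number of respondents in each of k bins."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def bin_counts(values, edges):
    """Count values per bin: bin 0 is below edges[0], the last bin is edges[-1] and up."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1  # a value equal to an edge goes up
    return counts


# Made-up, long-tailed revenues in $ millions (10 respondents).
revenue = [5, 20, 40, 60, 90, 150, 300, 700, 2500, 12000]

edges = equal_count_edges(revenue, 3)
print(edges, bin_counts(revenue, edges))  # [60, 300] [3, 3, 4]

fixed = [1000, 2000, 3000]  # $0-1B, $1-2B, $2-3B, $3B and up
print(bin_counts(revenue, fixed))  # [8, 0, 1, 1]
```

The equal-count cut gives a narrow bottom bin (under $60 million) and a top bin stretching from $300 million to $12 billion, while the fixed bands crowd eight respondents into one category and leave another empty.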

Davies Ward Barometer (2010) [pg. 10] described the corporate revenue ranges of its respondents in words. These are unusual ranges, and the distribution skews toward remarkably small companies. Note from the report’s last bullet that almost one out of three survey respondents “are not sure of their organization’s annual revenue.” Perhaps they do not want to disclose that revenue because they work for a privately held company. Or perhaps the organization, such as a government agency, has no “revenue” but instead a budget allocation.

With a third approach, a firm fits its revenue categories to its data set so that plots look attractive. You can often tell when a firm has done this. Consider the plot below from DLA Piper’s compliance survey (2017) [pg. 26]. The first category held companies with less than $10 million in revenue; the next category included firms with up to 10 times more revenue, but about the same percentage of respondents; the third category again spanned companies with up to 10 times more revenue, topping out at $1 billion, but with close to the preceding percentages. Then comes a small category with a narrow range of $400 million, followed by the two on the right with half the percentages of the left three. It appears that someone tried various revenue categories to find a combination that looks reasonably good in a graphic.

The fullest way to describe participants’ revenue is a scatter plot. From such a plot, which shows every data point, readers can draw their own conclusions about the distribution of revenue.

Priority of demographic attributes (the four most common)

Having studied more than 70 survey reports by law firms, I sensed that the firms give their demographic attributes a fairly consistent priority. First, and thus most importantly, firms focus on respondent position, then on the respondent company’s industry, business size, and location. That priority order for the four demographic characteristics makes sense.

The rank of the person completing the survey suggests their depth of knowledge of the topic. You want general counsel giving their views more than junior lawyers who have just joined the company. You seek C-suite executives, not assistant directors. The level or position also signals the ability of the firm to reach decision makers and persuade them that their time is well spent taking the survey. Implicitly, a high proportion of busy leaders says “This topic has significance.”

Industry (sector) comes next on the priority list because legal issues affect each industry differently. Also, readers of a survey report not only want to know that it speaks for companies in their industry but also would like to see how the results differ industry by industry.

“Business size” is my term for the third-most-common demographic. The typical measure relies on the company’s annual revenue. Most surveys proudly state that they include a good number of large companies, as those companies are more prestigious (and are probably the targets of the firm’s business development efforts). A less common measure of business size is the number of employees. For non-profits and government agencies revenue has less relevance (budget may be the better metric), but all organizations have employees. Still, headcount often gives less insight for profit-seeking organizations, as it can vary enormously across industries and indeed within them.

The fourth-most-common demographic concerns the geography of respondent organizations: their country, region, or continent. Quite a few surveys, however, draw participants from a single country and therefore ignore this demographic attribute. [We did spot one survey that broke out respondent data by states in Australia.]

We chose three surveys to spot test the relative importance they attach to their demographics.

  • The Dykema Gossett survey of merger and acquisition specialists (2017) gathered data on the position of its respondents, the sector in which their company operates, and their company’s revenue. The firm’s report did not attach numbers of participants or percentages to any of the demographic attributes, but it described them in that order.
  • The Carlton Fields survey of class actions (2014) likewise summarized its participants by position, sector and revenue, in that order, but disclosed nothing further.
  • Of the spot-tested reports, by far the best handling of demographics comes from the Baker McKenzie cloud survey (2017). That report precisely states breakdowns by geography, position, industry sector, and number of employees. Even better, the report includes plots that visualize these attribute details. Baker McKenzie described the position of individual respondents with seven choices of functions (IT, Sales, Legal, etc.) but did not provide revenue data. In other respects, however, it commendably shared the demographics of its survey population. The order of presentation was geography, position, and business size. Interestingly, for geography the report uses a map to convey where its “top respondents” came from.

If we had full data on the treatment of demographic attributes by all the surveys available to us, our inductive sense of these priorities would be confirmed or overturned. Perhaps in another post. Meanwhile, note two points. First, which demographics are important depends on the purpose of the research. Second, the report ought to take advantage of the demographic data; to create analytic value, somewhere the report should break out the findings by demographic segments.

Irregular disclosure of demographics from one survey to the next in a series

How consistently do law firms track and disclose demographic data? Not very consistently, we found, based on three pairs of surveys conducted by different firms: Foley Lardner on Telemedicine in 2014 and the follow-up in 2017, Seyfarth Shaw on real estate in 2016 and the follow-up the year after, and Proskauer Rose on employment in 2016 and 2017. Before studying those survey pairs, I had thought that firms would stick pretty closely to the way they treated demographics in their first survey, perhaps modifying and improving them a little bit for the follow-on survey. Not true, not at all!

The plot below attempts to summarize how the second survey of each pair compares to the first with respect to demographics. Each bar has a segment for each of the five demographic attributes in the legend. Each segment is scored 0 if the report does not include that attribute, 1 if the report’s disclosure is minimal, and so on up to 5 for a very good disclosure.

If a segment in the second column is higher than its counterpart in the first column, then the second survey improved on the first one. Perhaps it went from a 3 to a 4. Typically, that would mean the second report had a breakdown with more categories or more information on percentages. For example, in 2014 Foley Lardner (“Foley First” on the bottom axis) reported five levels, with percentages, for each of respondent size (employees or revenue), organization type, and position of the respondent. In the second report three years later, even with 50 more participants than in the first year, the firm combined two of those levels (and gave the percentage) but gave no other information. Thus, its first column’s segment for level (the second from the top, in light blue) starts as a five but drops in the firm’s second column (“Foley Second”) to a one. The first report did well on number of employees or revenue (red at the bottom), but that demographic information disappeared in the second survey report.


Taking another example, in its inaugural survey Proskauer Rose did not provide details about the locations of its respondents (as indicated by the absence of the light-yellow segment), but provided some information about that attribute in its subsequent survey report (the second segment from the top of the column labelled “Proskauer Second”).

Oddly, Foley & Lardner broke out three kinds of hospitals in its first year but combined them all in its second year. Two other categories of organization type matched, but two new ones appeared the second year.

The number at the bottom of each column tells how many participants that survey had. It is also odd, then, that although all three firms saw significant year-over-year increases in the number of respondents, they did not choose to elaborate on their demographic reporting.

In other words, given the irregular disclosure of data about respondents on these five important attributes, it is difficult to know how well the two sets of respondents resemble each other.

Demographic data tailored to the survey

We have written extensively about demographic data and how law firms report it from their research. Some kinds of demographic data figure prominently and consistently in reports, such as what we have termed the Big 4 demographics. But some surveys explore topics that justify other, one-off demographics. We show six of them below as displayed in reports issued by six different law firms.

Hogan Lovells, researching foreign direct investment [pg. 74], shows in the plot on the left the 15 roles the firm asked about. On the right, White & Case’s research into arbitration collected demographic data about respondents’ legal background [pg. 52].

Moving from the two plots above to the two below, Davies Ward, interested in Canadian lawyers, sought answers on the years respondents had been practicing law [pg. 9]. Norton Rose Transport [pg. 2] turned to a donut plot to display 11 roles within four industries.

Of the final two instances, the left plot comes from Winston Strawn’s research on risk [pg. 31]. Its questionnaire asked not just where the respondent’s company was based but also where the respondent individually was based.

And Foley Lardner studied telemedicine [pg. 10] in the first of a series, starting in 2014, and drilled down on types of healthcare organizations.

Law firm research surveys might ask for all kinds of background or profile data that illuminate their findings. It is easy to think of examples, such as the number of patents outstanding for research into intellectual property practices, or the age of respondents for research into demographics.

Demographic attributes, categories and number of participants

It seems likely that surveys with more participants would cover more demographic attributes and divide those attributes into more categories. Larger numbers of respondents would encourage more slicing and dicing. Or so I thought.

To test that hypothesis, I looked at 10 law-firm research surveys. My less-than-scientific method for picking them: start with the last firm alphabetically on my list and work backward through the surveys until I had found 10 with usable data. The bypassed surveys either did not disclose their number of participants, gave very sketchy demographic information, or both (a few were in a series by the same firm). With the eligible surveys in hand, I counted how many categories each report included for each of the four most common demographics: position of the respondent, revenue of the respondents’ companies, location of the companies, and industry (what I have called the “Big 4”).

The first of the two charts compares the total number of categories in those four demographic attributes to the number of participants in the survey. Each red circle stands for one survey’s number of participants (on the bottom axis) and total Big 4 categories (on the left axis). The wavy blue line is a non-linear trend line. Very non-linear, and not much of a pattern!

The second chart displays, on the left axis, the same total of categories for the four most common demographics plus the total number of categories in any other demographic attributes. It reflects the same 10 surveys, so the bottom axis remains the same, but the vertical axis spans a greater range because it includes counts from the other demographic attributes. The trend line here shows even less of a pattern than the squiggle of the first plot!
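A trend line answers the question visually; a single number can summarize it too. Spearman’s rank correlation measures whether surveys with more participants tend to have more categories without assuming a straight-line relationship. This is a technique swapped in for illustration, not the method behind the charts, and the example inputs are toy values, not the ten surveys’ actual counts.

```python
from statistics import mean


def average_ranks(xs):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Ordinary correlation coefficient; undefined if either list is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def spearman(xs, ys):
    """Rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy inputs: participants per survey and total demographic categories.
print(round(spearman([100, 200, 300], [12, 9, 15]), 2))  # 0.5
```

A value near 0 would match the “not much of a pattern” reading of the charts; with only 10 surveys, though, any such estimate is noisy.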

Sigh. At least with this set of surveys, we can’t support a hypothesis that more participants means more demographic attributes. Perhaps if we broadened this particular inquiry to cover more surveys we might eventually distinguish a clearer relationship, but for the moment, none is apparent.

Supplemental demographic data

On law-firm research questionnaires, four questions account for most of the demographic questions asked: the rank of the individual filling out the questionnaire, the revenue of that person’s organization, its industry, and (less frequently) the organization’s country. Those are the Big Four profile facts.

Nevertheless, law firms not infrequently ask their respondents for supplemental facts that go beyond the Big Four. We collected eight instances of those supplemental demographics, which the diagram briefly states in no particular order. The text that follows notes the extent to which each firm played back the findings in its analysis.

Type of organization: Davies Ward in 2010 breaks down its respondents by “publicly traded company,” “private company,” “government department or agency,” “not-for-profit sector or another type of organization,” and “wholly owned subsidiary of a public company” [pg. 8]. The report makes extensive and commendable use of this demographic data; it shows for each type of organization comparable, in-depth analyses. This usage represents the best way to collect and report on demographic data.

Listed on a stock exchange: Hogan Lovells in 2014 asked, “Is your company listed on at least one stock exchange?” [pg. 74]. All the report did with the yes and no checks was to state “81% of [respondent] companies are stock market listed” [pg. 6].

Market capitalization: Littler Mendelson in 2017 collected this from its respondents, using three categories: Large cap (greater than $4 billion), Mid cap ($1 billion to $4 billion), and Small cap (less than $1 billion) [pg. 24]. Nothing is heard about that data in the firm’s report.

Commercial geography: Eversheds in 2008 reported that respondents were from “domestic, international (at least two countries) and global (at least two continents) law firms” [pg. 2]. The report makes modest use of that demographic data. For example, “The least satisfied lawyers were at US and Canadian law firms, or working for domestic rather than international law firms.” [pg. 4]. Taking a different view, Fulbright Jaworski in 2009 asked respondents to select one of seven categories for the number of countries where their company has facilities [pg. 6]. The report makes no use of the data beyond a table and, below it, bullet points that emphasize three conclusions drawn from it.

Longevity: Norton Rose in 2014 reports the ages of the respondent companies in five ranges [pg. 5] and their number of employees in six ranges [pg. 5]. The research concerned employee stock option plans, so both demographic facts relate directly to the study.

Employees: Baker McKenzie in 2016 collected data on the number of employees of its respondents, offering six categories to select from [pg. 7]. But the report does not analyze any findings by this measure.

From this small sample, one can see that law firms consider and collect data about their respondents that go beyond the customary Big Four. It is disappointing, therefore, that they make so little use of that supplemental information in their analyses.

Disclosure of revenue profiles by reports published in 2017

As in a previous post, to study revenue profiles I zeroed in on a set of 14 survey-research reports published by law firms in 2017.[1] That post considered how fully and consistently the firms shared profile information on their participants’ industries. Here the focus turns to how that group of reports disclosed aggregated categories of their participants’ annual revenue.

To start with discouraging news, eight reports tell readers absolutely nothing about the revenue of their respondent population (one chose, oddly, to give three market cap categories). This tell-nothing decision by eight law firms is regrettable because readers of their reports are severely handicapped in judging how credible the report is and, more specifically, how well its findings apply to the reader’s organization. The omission of this profile (demographic) data also suggests that the firms did not analyze their findings by revenue categories.

Turning to the remaining six reports, four give only a single revenue indicator, such as “almost half the respondents reported revenue of greater than $1 billion.” In the plot below, those reports show only two bars: one for revenue at or above the stated amount and one for revenue below it. In other words, from the single indicator I created a binary categorization of revenue. The reports of Norton Fulbright and Hogan Lovells exemplify splendidly what survey reports should do. They broke their respondents into three and six revenue categories, respectively. As royalty, they deserve purple bars, compared to the yellow bars of the other firms. In the interests of full disclosure, I should note that I slightly modified some of the range data as given so that the plot has more uniformity.

Clearly, the revenue categories that the firms applied cover an extremely wide range, from less than $99 million to more than $20 billion. Moreover, each firm conjured up its own category boundaries, with almost no standardization across the reports (except what I imposed). Three reports used “more than $ billion”, but that was the only shared category. As with demographic reporting on industries, everyone in the legal profession would gain from more disclosure of revenue demographics and more consistent use of similar bands.


  1. The post explains how the reports were chosen and which firms were represented in the data set.

Seven reasons why surveys collect demographic data

When law firms sponsor or conduct research surveys, they collect demographic information from respondents for many purposes. Here are seven reasons listed roughly from the most important at the top to the least important at the bottom.

  1. Demographic data presented in a report gives the report’s findings credibility. When readers nod their heads and are impressed by the quality of the respondents, they are more likely to read on, respect the effort and thoughtfulness of the law firm, and accept the conclusions of the report.
  2. Crucially, demographic data shapes analysis and generates insights. Law firms break down results by demographic characteristics, such as industry, so they need to have collected that data carefully through the questionnaire. In programmers’ terminology, demographic attributes are categorical (or factor) variables, which enable all kinds of aggregation and analysis.
  3. Another reason why demographic questions are ubiquitous is that the data conveys how well the data set represents the entire population. Is there a reasonable distribution of companies by size, industry and country? If the firm collected 40 utilities and only 10 manufacturing companies, that suggests the industry mix is unbalanced compared to the distribution of companies by industry in the United States.
  4. Readers like demographic data so that they can judge whether the findings have relevance for their particular situation. General counsel want to know what other general counsel think; UK companies like to learn about UK data.
  5. Demographic questions are a snap for respondents to answer. Typically asked at the start, these easy questions create a positive attitude; respondents become to some degree invested in the survey, which makes them less likely to drop out later. Meanwhile, if the survey software saves partial answers, then even when an invitee drops out or fails to complete questions, the firm has collected the names and email addresses of invitees who were at least initially interested.
  6. With demographic data, the law firm can consider weighting some of the answers so that the overall results are more representative of the population. Or, the firm might decide to push on and entice more respondents because there are some shortfalls. Perhaps there are too few European respondents and more efforts need to be made to attract others.
  7. Finally, demographic questions appear in every research survey because … they appear everywhere! This means that templates are readily at hand, including tested phrasing and methods to present the data graphically.
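Reasons 2, 3 and 6 come together in a short calculation: break a finding out by a demographic, then weight the answers so the sample’s industry mix matches the population’s. The sketch below uses simple post-stratification on one variable (a single-variable cousin of raking), applied to a made-up sample of 40 utilities and 10 manufacturers. The population shares, answers, and function names are hypothetical.

```python
from collections import Counter


def poststrat_weights(sample, var, population_shares):
    """Weight = population share / sample share of the respondent's category."""
    counts = Counter(row[var] for row in sample)
    n = len(sample)
    return [population_shares[row[var]] / (counts[row[var]] / n) for row in sample]


def share_yes(sample, weights, group_var=None):
    """Weighted share of 'yes' answers, overall or broken out by a demographic."""
    totals, yes = {}, {}
    for w, row in zip(weights, sample):
        key = row[group_var] if group_var else "All"
        totals[key] = totals.get(key, 0.0) + w
        if row["answer"] == "yes":
            yes[key] = yes.get(key, 0.0) + w
    return {k: yes.get(k, 0.0) / totals[k] for k in totals}


# Made-up sample: 40 utilities and 10 manufacturers, with invented answers.
sample = (
    [{"industry": "Utilities", "answer": "yes"}] * 20
    + [{"industry": "Utilities", "answer": "no"}] * 20
    + [{"industry": "Manufacturing", "answer": "yes"}] * 8
    + [{"industry": "Manufacturing", "answer": "no"}] * 2
)
# Hypothetical population shares; real ones might come from NAICS-based counts.
population = {"Utilities": 0.2, "Manufacturing": 0.8}

unweighted = [1.0] * len(sample)
weights = poststrat_weights(sample, "industry", population)
print(share_yes(sample, unweighted))           # {'All': 0.56}
print(share_yes(sample, weights))              # {'All': 0.74}
print(share_yes(sample, weights, "industry"))  # {'Utilities': 0.5, 'Manufacturing': 0.8}
```

The breakdown by industry is unchanged by weighting, because every respondent within an industry carries the same weight; what moves is the overall figure, which shifts toward the answers of the underrepresented manufacturers.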