If only there were a standard way to describe survey participants by industry … There is! Law firms could identify, analyze, and report on their participants by the North American Industry Classification System (NAICS) categories. This system has superseded the venerable SIC (Standard Industrial Classification) categories. The NAICS offers a range of two-digit classifications that map well to the extant proliferation of industry/sector designations seen in law firm reports. Those classifications, together with the three- and four-digit elaborations on them, easily suffice for law-firm research surveys.
If NAICS codes became the convention for law firm research surveys, at least four benefits would follow.
Mash-up data. For data analysts, “mash-up” describes the process of melding two sets of data. If firms used the NAICS, other data keyed to the same codes would become available for analysis. Longitudinal data sets (those maintained over a period of time) that the U.S. government has collected by NAICS code can supply the number of businesses in an industry, more detail about those businesses, the number of employees they have, and so forth. Everyone would benefit from the richer, more insightful analyses that such mash-ups make possible.
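A mash-up of this kind amounts to a join on the industry code. The sketch below is only an illustration under stated assumptions: the respondents, the `sector_counts` figures, and the function names are all made up for the example, and real counts would come from a government data set keyed by NAICS code. It rolls each participant's code up to its two-digit sector (a few sectors, such as manufacturing, span a range of two-digit codes) and attaches the population count for that sector.

```python
from collections import Counter

# Sectors that NAICS publishes as a range of two-digit codes.
RANGED_SECTORS = {
    "31": "31-33", "32": "31-33", "33": "31-33",  # Manufacturing
    "44": "44-45", "45": "44-45",                 # Retail Trade
    "48": "48-49", "49": "48-49",                 # Transportation and Warehousing
}

# Hypothetical survey participants, each tagged with a NAICS code of any length.
respondents = [
    {"company": "A", "naics": "5221"},
    {"company": "B", "naics": "52"},
    {"company": "C", "naics": "3254"},
    {"company": "D", "naics": "541"},
]

# Placeholder establishment counts by sector (NOT real government figures).
sector_counts = {"52": 1000, "54": 4000, "31-33": 2000}


def sector_key(naics_code):
    """Roll a NAICS code of any length up to its two-digit sector key."""
    prefix = naics_code[:2]
    return RANGED_SECTORS.get(prefix, prefix)


def mash_up(respondents, sector_counts):
    """Join survey respondent counts to external counts by NAICS sector."""
    per_sector = Counter(sector_key(r["naics"]) for r in respondents)
    return {
        sector: {
            "respondents": n,
            # None signals that the external data has no row for this sector.
            "establishments": sector_counts.get(sector),
        }
        for sector, n in per_sector.items()
    }


merged = mash_up(respondents, sector_counts)
```

Because both sides share the same key, no hand-built crosswalk between a firm's own industry labels and the external data is needed.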
Consistency among surveys. If law firms adopted this standard classification system, researchers and other readers of their reports would be far better able to compare results by industry. So long as each firm defines its industries idiosyncratically, as in the current disorder, comparisons and meta-analyses are much harder to carry out, if not impossible.
Improving the representativeness of the sample data. Because the NAICS data sets give reliable counts of companies by industry, law firms could use those counts to make their convenience samples more representative of the actual distribution of U.S. businesses. One method of doing this, which we explain elsewhere, is called “raking.” Once the sample data has been weighted to closely resemble the population data, deeper statistical analyses become available.
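To make the idea of raking concrete, here is a minimal sketch of one common form of it, iterative proportional fitting, and not necessarily the procedure explained elsewhere. It assumes two known population margins: industry (NAICS sector) and a second variable, company size, which is added here purely for illustration. The sample, the target shares, and the function names are all invented. The loop repeatedly rescales respondent weights so that each weighted margin matches its target, and stops when a full pass changes nothing.

```python
def rake(rows, targets, max_iter=1000, tol=1e-9):
    """Return one weight per row so weighted margins match the target shares.

    rows: list of dicts, one per respondent.
    targets: {variable: {category: population share}}; shares sum to 1 and
             every category is assumed to appear at least once in the sample.
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for variable, shares in targets.items():
            total = sum(weights)
            # Current weighted total for each category of this variable.
            current = {}
            for row, w in zip(rows, weights):
                current[row[variable]] = current.get(row[variable], 0.0) + w
            # Scale each row so its category hits the target share.
            for i, row in enumerate(rows):
                factor = shares[row[variable]] * total / current[row[variable]]
                weights[i] *= factor
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
        if largest_adjustment < tol:
            break
    return weights


def weighted_share(rows, weights, variable, category):
    """Weighted share of respondents falling in one category."""
    matching = sum(w for row, w in zip(rows, weights) if row[variable] == category)
    return matching / sum(weights)


# Hypothetical convenience sample: finance (52) is over-represented.
sample = (
    [{"industry": "52", "size": "small"}] * 1
    + [{"industry": "52", "size": "large"}] * 3
    + [{"industry": "54", "size": "small"}] * 2
    + [{"industry": "54", "size": "large"}] * 2
)

# Made-up population shares standing in for real NAICS-based counts.
targets = {
    "industry": {"52": 0.3, "54": 0.7},
    "size": {"small": 0.6, "large": 0.4},
}

weights = rake(sample, targets)
```

With a single margin, the same loop reduces to simple post-stratification, so a firm that knows only the industry counts can still use it.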
Impute missing values. “Imputation” is the term statisticians use for filling in missing values. If a law firm has its participants' NAICS codes plus other information such as revenue, the firm could impute the number of employees for a participant that did not report it. That methodology for supplementing data is explained elsewhere, but it is available to a firm only if its industry coding conforms to the NAICS. A firm that collects revenue, industry code, and state can impute a number for employees even more accurately. Fuller data sets enable better analyses.
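As a rough illustration of imputing employees from industry and revenue, the sketch below uses a deliberately simple stand-in, ratio imputation within an industry, rather than the fuller methodology referred to above. For each NAICS code it takes the median employees-per-revenue ratio among participants that reported both figures and applies it to participants missing a headcount. The records, the revenue units (millions of dollars), and the function name are assumptions made for the example.

```python
from statistics import median


def impute_employees(records):
    """Fill missing 'employees' using the median employees/revenue ratio
    of complete records that share the same NAICS code."""
    ratios = {}
    for r in records:
        if r["employees"] is not None:
            ratios.setdefault(r["naics"], []).append(r["employees"] / r["revenue"])
    typical_ratio = {code: median(values) for code, values in ratios.items()}

    completed = []
    for r in records:
        filled = dict(r)  # leave the caller's records untouched
        can_impute = filled["employees"] is None and filled["naics"] in typical_ratio
        if can_impute:
            filled["employees"] = round(typical_ratio[filled["naics"]] * filled["revenue"])
        # Flag imputed values so later analyses can treat them separately.
        filled["imputed"] = can_impute
        completed.append(filled)
    return completed


# Hypothetical participants; revenue is in millions of dollars.
records = [
    {"naics": "52", "revenue": 100, "employees": 200},
    {"naics": "52", "revenue": 50, "employees": 150},
    {"naics": "52", "revenue": 200, "employees": 800},
    {"naics": "52", "revenue": 80, "employees": None},
    {"naics": "54", "revenue": 10, "employees": 40},
    {"naics": "54", "revenue": 30, "employees": None},
    {"naics": "62", "revenue": 20, "employees": None},  # no complete peer
]

completed = impute_employees(records)
```

A participant whose industry has no complete peers keeps its missing value rather than receiving a guess. Adding state as a second grouping key would follow the same pattern.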