At least a dozen kinds of software used in a survey project

The list of software that a law firm might use to carry out a survey project is lengthy. The firm itself need not own every application listed below, but from start to finish someone on the project may need to deploy each of them.

  1. Database software for customer relationship management (CRM), or other software that supplies the email addresses of invitees
  2. Bulk email software (e.g., Constant Contact) so that the firm can efficiently invite its clients and others to take part in the survey
  3. Word processing software to draft questions, invitations and the text of the report
  4. Survey software (e.g., NoviSurvey, SurveyMonkey, SmartSurvey, Qualtrics, SurveyGizmo) to create an online questionnaire and capture the participants' responses
  5. Spreadsheet software (e.g., Excel) so that the firm can export the responses from the survey software into a worksheet and manipulate the data
  6. Statistical software (e.g., R, Stata, Excel, Tableau) so that an analyst can calculate various statistics
  7. Data visualization software (e.g., R, Excel or PowerPoint) so that an analyst can prepare plots and graphs
  8. Desktop publishing software (e.g., LaTeX, markdown languages, Adobe InDesign) so that the firm can integrate text, plots, tables and other elements into the report
  9. Presentation software (e.g., PowerPoint) or specialized software to prepare infographics
  10. Graphic design software (e.g., GIMP, Photoshop) so that the firm can work with images and photos and otherwise design the report as it wishes
  11. PDF software (e.g., Foxit, Adobe Acrobat, PScript5, ScanSoft, QuarkXPress) so the firm can save its report in a portable document format [see the plot below for more details]
  12. Many other kinds of software also play a part, such as email, instant messaging, social media, website packages, videoconferencing, calendaring, backup software and more.
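
Items 5 and 6 describe exporting the responses and then calculating statistics on them. Below is a minimal Python sketch of that step, assuming the survey tool can export a CSV file with one row per participant and one column per question; the column names and sample rows are hypothetical. It reports each answer's share on two bases, those who answered the question and all participants, because the two can differ whenever people skip a question.

```python
import csv
import io
from collections import Counter

# Hypothetical export: one row per participant, one column per question.
# A blank cell means the participant skipped that question.
SAMPLE_EXPORT = """respondent_id,firm_size,uses_ai
1,Large,Evaluating
2,Small,No
3,Midsize,
4,Large,Utilizing
5,Small,No
"""

def tally_question(csv_text, column):
    """Count answers to one question and return percentages on two bases:
    respondents who answered that question, and all participants."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    answers = [row[column].strip() for row in rows if row[column].strip()]
    counts = Counter(answers)
    answered, participants = len(answers), len(rows)
    summary = {}
    for answer, n in counts.items():
        summary[answer] = {
            "count": n,
            "pct_of_answered": round(100 * n / answered, 1),
            "pct_of_participants": round(100 * n / participants, 1),
        }
    return answered, participants, summary

answered, participants, summary = tally_question(SAMPLE_EXPORT, "uses_ai")
print(answered, participants)
for answer, stats in sorted(summary.items()):
    print(answer, stats)
```

In a spreadsheet the same result needs a COUNTIF per answer plus two different denominators; a script makes the choice of denominator explicit.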

The plot below draws on 153 survey reports in PDF format, 141 of which include metadata about the software used to create them. The firms used nine different software packages, although over the years they used multiple versions of the same package. Adobe InDesign, counting all of its versions, dominated: more than 100 reports were created with it.
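
A tally like the one behind this plot can be sketched in Python. The sketch assumes the creator strings have already been pulled from each PDF. A library such as pypdf, for example, exposes them through `PdfReader(path).metadata`, whose `creator` and `producer` attributes hold those fields. The sample strings and the pattern table below are hypothetical; a real run over the nine packages would need a pattern for each one. The key step is collapsing many version strings into one package name before counting.

```python
import re
from collections import Counter

# Hypothetical Creator strings of the kind found in PDF metadata.
# Real values vary by tool and version; these are illustrative only.
SAMPLE_CREATORS = [
    "Adobe InDesign CS6 (Windows)",
    "Adobe InDesign CC 2015 (Macintosh)",
    "Adobe InDesign 16.0 (Windows)",
    "Microsoft Word 2010",
    "Microsoft Word 2016",
    "PScript5.dll Version 5.2.2",
    None,  # a report with no creator metadata
]

# Ordered (pattern, package) pairs: the first match wins.
PACKAGE_PATTERNS = [
    (re.compile(r"InDesign", re.IGNORECASE), "Adobe InDesign"),
    (re.compile(r"\bWord\b", re.IGNORECASE), "Microsoft Word"),
    (re.compile(r"PScript5", re.IGNORECASE), "PScript5"),
]

def package_name(creator):
    """Map a raw creator string to a package name, ignoring version details."""
    if not creator:
        return None
    for pattern, package in PACKAGE_PATTERNS:
        if pattern.search(creator):
            return package
    return "Other"

def tally_packages(creators):
    """Count reports per package, skipping reports without metadata."""
    names = (package_name(c) for c in creators)
    return Counter(name for name in names if name is not None)

counts = tally_packages(SAMPLE_CREATORS)
for package, n in counts.most_common():
    print(package, n)
```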

Reproducible research as a desideratum for legal data analysis

What is termed ‘reproducible research’ urges all data scientists to keep careful track of the source of their data and of every transformation applied to it. Each step from the original numbers (the headwaters) through every addition, subtraction, calculation or revision should be recorded so that another person could reproduce the final data set (the mouth of the river). That person should also be able to evaluate whether the complete stream of alterations and manipulations was appropriate.
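
One lightweight way to record each step from headwaters to mouth is to route every transformation through a small logger. The Python sketch below is one possible design, not a standard tool: the `StepLog` class, the sample firms and the file name are all hypothetical. Each step stores a plain-language description and the row counts before and after, so a reader can see where records were dropped.

```python
import json
from datetime import datetime, timezone

class StepLog:
    """Hypothetical helper: record each transformation so another person can audit or replay it."""

    def __init__(self, source):
        self.source = source
        self.steps = []

    def apply(self, description, func, data):
        """Apply func to data and record what was done and how the row count changed."""
        result = func(data)
        self.steps.append({
            "step": len(self.steps) + 1,
            "description": description,
            "rows_in": len(data),
            "rows_out": len(result),
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        return result

    def to_json(self):
        """Serialize the source and every recorded step for the project file."""
        return json.dumps({"source": self.source, "steps": self.steps}, indent=2)

# Hypothetical raw data: one row per firm, lawyer counts stored as text.
raw = [{"firm": "A", "lawyers": "250"},
       {"firm": "B", "lawyers": "n/a"},
       {"firm": "C", "lawyers": "40"}]

log = StepLog(source="survey_export.csv (hypothetical file name)")
clean = log.apply("Drop rows whose lawyer count is not a number",
                  lambda rows: [r for r in rows if r["lawyers"].isdigit()], raw)
large = log.apply("Keep firms with more than 200 lawyers",
                  lambda rows: [r for r in rows if int(r["lawyers"]) > 200], clean)
print(log.to_json())
```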

As to the provenance of data, the URL of a web site and the date on which information was scraped from it would be crucial. For data obtained from print, the publication, date and page would be key. How the original data was collected, such as by an export from an email system or a survey, needs to be spelled out. And so on.
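
Provenance details like these can travel with the data as a small structured record. The Python sketch below assumes one particular set of fields, chosen only to mirror the paragraph (URL and date for scraped data, publication and page for print, and a collection method). It is not an established schema, and the sample sources and dates are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Provenance:
    """Where a data set came from; the fields are an illustrative choice, not a standard."""
    description: str
    collected_on: date
    method: str                        # e.g. "web scrape", "print", "survey export"
    url: Optional[str] = None          # for data scraped from a web site
    publication: Optional[str] = None  # for data taken from print
    page: Optional[str] = None
    notes: list = field(default_factory=list)

    def citation(self):
        """Return a one-line, human-readable statement of the source."""
        if self.url:
            where = self.url
        elif self.publication:
            where = f"{self.publication}, p. {self.page}" if self.page else self.publication
        else:
            where = "source not recorded"
        return f"{self.description}: {self.method}, {where}, {self.collected_on.isoformat()}"

# Hypothetical examples of a scraped source and a print source.
scraped = Provenance("Firm headcounts", date(2017, 3, 1), "web scrape",
                     url="https://example.com/firms")
printed = Provenance("Survey response counts", date(2016, 9, 15), "print",
                     publication="Hypothetical Survey Report", page="12")
print(scraped.citation())
print(printed.citation())
```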

In the first instance, programmers approach reproducibility through the code itself, which tells another programmer who knows the same language what is going on, such as turning a character variable into a numeric variable, multiplying a group of numbers by some factor, or choosing a subset of the data. But code alone is often cryptic: the logic may not be clear, or the reasons for certain choices may be murky and difficult to reconstruct.

Liberal commenting by the programmer can fill the gap and create a roadmap for others. All programming languages have a simple way to tell the computer, “Ignore this line; it is a note to myself and others.” Good programmers explain in comments what the following lines of code do, why the script does that, and any issues or decisions in play. It is an excellent practice to write comments thorough enough that a non-programmer could follow the origin, transformations, and outputs of a data workflow. Such comments, by the way, greatly help the programmer later when she returns to the now-forgotten analysis and has to reconstruct it.
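
The three operations mentioned earlier (converting characters to numbers, multiplying by a factor, and taking a subset) can be written in Python with the kind of comments recommended here: each step says what it does, why, and which decision is in play. The billing-rate data, the 4% increase and the $400 cut-off are all hypothetical.

```python
# Hypothetical input: billing rates exported as text, e.g. from a spreadsheet,
# where a currency symbol and thousands separator turned the numbers into strings.
raw_rates = {"Partner A": "$1,050", "Associate B": "$475", "Paralegal C": "$210"}

# Step 1: turn the character values into numbers.
# Why: arithmetic on "$1,050" fails, so strip "$" and "," before converting.
rates = {name: float(text.replace("$", "").replace(",", ""))
         for name, text in raw_rates.items()}

# Step 2: multiply every rate by the same factor.
# Why: the (hypothetical) analysis models a 4% rate increase for next year.
# Decision: round to whole dollars, since hourly rates are rarely quoted in cents.
INCREASE = 1.04
projected = {name: round(rate * INCREASE) for name, rate in rates.items()}

# Step 3: choose a subset of the data.
# Why: the report discusses only timekeepers whose projected rate exceeds $400.
# Issue: the $400 cut-off is a judgment call; changing it changes the subset.
THRESHOLD = 400
high_rates = {name: rate for name, rate in projected.items() if rate > THRESHOLD}

print(high_rates)
```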

Beyond spelling out the source of the data, the programming calls themselves, and ample comments, what is known as ‘literate programming’ interweaves code with explanatory prose and gives guidelines for how the code should be divided up and indented and how the supplementary annotations should be added.

In the legal industry, data analysts should strive for reproducible research: transparency at every step of their work.

Modest involvement with “AI software” according to an ILTA survey

Signs are everywhere that the U.S. legal industry has started to recognize the potential of computer-assisted decision-making. For example, the 2016 ILTA/InsideLegal Technology Purchasing Survey asked: “Is your firm currently evaluating (or already utilizing) artificial intelligence technologies, systems or related strategies?” The web-based survey was distributed to 1,231 ILTA member law firms, of which 14% (172 firms) responded.

Only 13% of the respondents answered the AI question favorably: 2% were already utilizing such technologies and 11% were “currently evaluating” them. Write-in answers named IBM Watson, Kira Systems, RAVN, Lex Machina and ROSS. Not surprisingly, “half of the respondents that are currently evaluating AI come from Large Firms”, defined as firms with more than 200 lawyers. [Large Firms made up 19% of all respondents.]

What makes it impossible to assess the actual level of interest in AI software is that “Response percentages are based on total responses per question, not overall survey participation” [emphasis added]. Therefore, we cannot say that 13% of 172 firms responded favorably, because the survey report does not state how many firms answered that particular question.
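
How wide is the uncertainty? The Python sketch below makes two assumptions: that the published 13% was rounded half up from a whole number of firms (the report does not say how it rounded), and that no more than the 172 participating firms answered the question. Under those assumptions it lists, for every possible number of firms n that answered, the favorable counts k consistent with 13%. If all 172 firms answered, the figure corresponds to 22 or 23 firms. Because n is unknown, the count could be as low as a single firm.

```python
def feasible_counts(reported_pct, max_respondents, min_respondents=1):
    """For each possible number of firms n that answered a question, list the
    counts k whose share k/n rounds (half up) to reported_pct percent."""
    result = {}
    for n in range(min_respondents, max_respondents + 1):
        # k/n rounds to reported_pct when
        #   (reported_pct - 0.5) / 100 <= k/n < (reported_pct + 0.5) / 100.
        # Multiplying through by 200 * n keeps the test in integer arithmetic.
        ks = [k for k in range(n + 1)
              if (2 * reported_pct - 1) * n <= 200 * k < (2 * reported_pct + 1) * n]
        if ks:
            result[n] = ks
    return result

options = feasible_counts(13, 172)
all_counts = sorted({k for ks in options.values() for k in ks})
print("If all 172 firms answered:", options[172])
print("Range over all possible n:", all_counts[0], "to", all_counts[-1])
```

The 13% is itself the sum of two separately rounded figures (2% and 11%), so the true range may be somewhat wider than this sketch suggests.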