Center For The Study of Financial Regulation

Winter 2011 - Issue No. 4

USING TEXTUAL ANALYSIS TO UNCOVER FRAUD

by Tim Loughran, the C.R. Smith Professor of Finance at the Mendoza College of Business of the University of Notre Dame.

For investors to make sound investment decisions, access to timely and detailed information on publicly traded firms is critical. Yet if firm managers produce documents for shareholders that contain an excessive number of words, will investors be able to sort through the information and uncover questionable or aggressive accounting practices? Or will a hopelessly long annual report lead investors not to bother with the document at all? And without investors or analysts poring over company documents, will managers feel more entrenched in their positions of power?

Sadly, the last few years show a disturbing trend in the number of words contained in 10-Ks. In 1994, one of the first years of electronic filing on the SEC's EDGAR (Electronic Data Gathering, Analysis, and Retrieval) system, the median number of words in a 10-K was 27,062. By 2001, the median had risen only slightly, to 27,729. Due in part to increased government regulation and oversight through the Sarbanes-Oxley Act of 2002, the median number of words rose to 34,693 in 2003, 36,004 in 2005, and 46,389 in 2009. Thus, from 2001 to 2009, the median U.S. firm added nearly 19,000 words to its 10-K.
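The growth in filing length can be checked with simple arithmetic. The following is a minimal sketch that uses only the medians quoted above; the helper name `growth` is illustrative and not taken from the underlying paper.

```python
# Median 10-K word counts by filing year, as quoted in the text.
median_words = {
    1994: 27_062,
    2001: 27_729,
    2003: 34_693,
    2005: 36_004,
    2009: 46_389,
}


def growth(start_year, end_year):
    """Return (absolute change, percent change) in the median word count."""
    start = median_words[start_year]
    end = median_words[end_year]
    return end - start, 100.0 * (end - start) / start


added, pct = growth(2001, 2009)
print(f"2001-2009: +{added:,} words ({pct:.1f}%)")

flat_added, flat_pct = growth(1994, 2001)
print(f"1994-2001: +{flat_added:,} words ({flat_pct:.1f}%)")
```

Comparing the two periods makes the point of the paragraph concrete: almost all of the increase comes after 2001.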

Although the number of words contained in 10-Ks has risen sharply in recent years, the SEC website allows for detailed text searches by investors [1]. In a forthcoming academic paper, Notre Dame Finance Professor Bill McDonald and I used textual analysis to parse more than 50,000 10-Ks of publicly traded firms filed since 1994 for phrases that may imply the firm is using aggressive accounting practices that could lead to subsequent fraud charges [2]. Our paper was motivated by a list of phrases contained in Vito Racanelli's August 31, 2009 Barron's article ("Watch Their Language").

Our purpose in examining more than 12.5 billion words in thousands of documents is to see whether certain phrases can help predict the probability that a firm will subsequently be accused of fraud. Racanelli's (2009) article listed 13 suspicious phrases, each suggestive of questionable accounting or professional behavior, that investors should watch for. Is there a statistical link between word selection by a company's management and being sued for fraud by shareholders?

First, we obtained a list of 585 firms listed on the NYSE, Amex, or Nasdaq that had a 10b-5 lawsuit filed against company management. In these lawsuits, shareholders accused managers of material omissions that led to an inflated stock price for the firm. Next, we examined the relation between the use of the 13 suspicious phrases from the Barron's article and the chances of being sued by shareholders in a 10b-5 lawsuit in the year following the 10-K filing date.

Examples of suspicious phrases are "unbilled receivables," "consulting relationship," and "bill and hold." None of these three phrases appears in more than 3 percent of the 10-Ks filed on EDGAR. For example, "bill and hold" appears in only 0.38 percent of the 51,115 10-Ks filed during the 1994-2008 period. "Unbilled receivables" generally means that the firm has already provided the goods or services of the sale but has not yet billed the customer.

The phrase "consulting relationship" may indicate poor corporate governance by management since the wording often refers to directors who work for the firm in side relationships.

Directors who perform consulting services for the firm on whose board they sit are generally not viewed by shareholders as completely independent. The term "bill and hold" is often considered suspicious by investors because it allows the seller to book the sale before actually shipping the product to the buyer. This potentially allows the seller to inflate or overstate current-quarter revenue.
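To illustrate how phrase frequencies of this kind could be tabulated, here is a minimal sketch in Python. It is not the code used in the paper. The matching rule (case-insensitive, whole words, tolerant of hyphens such as "bill-and-hold") is an assumption, the function names are hypothetical, and the four "filings" are invented sentences, not real 10-K text.

```python
import re

# Three of the suspicious phrases named in the text.
RED_FLAGS = ["unbilled receivables", "consulting relationship", "bill and hold"]


def phrase_pattern(phrase):
    """Compile a case-insensitive, whole-word pattern for a phrase.

    Assumption: words may be separated by whitespace or hyphens.
    """
    words = [re.escape(word) for word in phrase.split()]
    return re.compile(r"\b" + r"[\s-]+".join(words) + r"\b", re.IGNORECASE)


PATTERNS = {phrase: phrase_pattern(phrase) for phrase in RED_FLAGS}


def count_phrases(text):
    """Count occurrences of each red-flag phrase in one filing."""
    return {phrase: len(pat.findall(text)) for phrase, pat in PATTERNS.items()}


def share_of_filings(filings):
    """Percent of filings in which each phrase appears at least once."""
    if not filings:
        return {phrase: 0.0 for phrase in RED_FLAGS}
    hits = {phrase: 0 for phrase in RED_FLAGS}
    for text in filings:
        for phrase, count in count_phrases(text).items():
            if count > 0:
                hits[phrase] += 1
    return {phrase: 100.0 * n / len(filings) for phrase, n in hits.items()}


# Invented toy documents, for illustration only.
toy_filings = [
    "Revenue includes unbilled receivables. Unbilled receivables rose in the fourth quarter.",
    "Certain products were sold under bill-and-hold arrangements.",
    "The company has no unusual arrangements to report.",
    "One director maintains a consulting relationship with the company.",
]

shares = share_of_filings(toy_filings)
for phrase, pct in shares.items():
    print(f"{phrase}: {pct:.1f}% of filings")
```

The per-filing counts from `count_phrases` are the kind of quantity that could then serve as an explanatory variable in a regression.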

In our logit regressions, the dependent variable is equal to one if the company is sued in the year following the 10-K filing date or if the 10-K's filing date falls within the period in which the alleged material omissions occurred. Bill and I also include a number of firm-level control variables that could be linked to subsequent fraud charges (such as the firm's market value, book-to-market ratio, prior-year stock turnover, prior stock market performance, and level of institutional ownership). We find that, all else being equal, firms with larger market values, lower book-to-market ratios (i.e., growth companies), and poor prior-year stock market performance are more likely to be involved in a subsequent 10b-5 lawsuit.

From the regression results, we find that the more often suspicious phrases such as "unbilled receivables," "consulting relationship," and "bill and hold" appear in a 10-K, the more likely it is that the firm will be sued the following year. This relation is statistically significant at conventional levels, even after controlling for a number of firm-level variables.
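For readers unfamiliar with logit regressions, the sketch below shows the mechanics on simulated data. It is an illustration only, not the paper's specification or results: it uses a single explanatory variable (a phrase count) and omits the firm-level controls, and the "true" coefficients of -3.0 and 0.6 are arbitrary assumptions chosen to generate the toy data. The function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(xs, ys, iterations=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson.

    Returns the estimated (b0, b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += y - p
            g1 += (y - p) * x
            # Entries of the 2x2 information matrix.
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Simulated data (assumption): x is the number of red-flag phrases in a
# filing, y equals 1 if the firm is sued in the following year.
random.seed(0)
TRUE_B0, TRUE_B1 = -3.0, 0.6  # arbitrary values for the simulation
xs = [random.randint(0, 5) for _ in range(2000)]
ys = [1 if random.random() < sigmoid(TRUE_B0 + TRUE_B1 * x) else 0 for x in xs]

b0, b1 = fit_logit(xs, ys)
print(f"estimated intercept: {b0:.2f}, estimated slope: {b1:.2f}")
print(f"implied P(sued) at 0 phrases: {sigmoid(b0):.3f}")
print(f"implied P(sued) at 5 phrases: {sigmoid(b0 + 5 * b1):.3f}")
```

A positive estimated slope means that the modeled probability of a lawsuit rises with the phrase count, which is the direction of the relation described above.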

Thus, word selection by firm managers has real meaning. Key phrases can potentially suggest aggressive or questionable accounting practices that warrant closer attention by investors. What is the implication for regulators in linking the 13 suspicious phrases with firms subsequently accused of fraud?

Our analysis highlights the fact that even though U.S. 10-Ks are getting substantially longer, computer programs that can parse billions of words are readily available. Regulators and analysts have the ability to scan documents quickly for suspicious phrases that potentially warrant more attention. Obviously, just because company XYZ uses the phrase "unbilled receivables" in its 10-K does not prove that the firm is currently engaging in fraud. However, managers might be tipping their hand about the use of aggressive or questionable accounting practices by their choice of words. The selection of unusual or "red-flag" words in a 10-K may assist regulators in identifying potential issues before it is too late for investors. More generally, textual analysis allows regulators and investors to carefully monitor the semantic content and tone of important, yet overwhelmingly long and dense, SEC filings.
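Because a flagged phrase is a prompt for closer reading rather than proof of wrongdoing, a scanning tool is most useful when it shows each match in context. The following is a minimal sketch of that idea; the function name `flag_contexts`, the 40-character window, and the sample sentence are all assumptions made for illustration.

```python
import re


def flag_contexts(text, phrases, window=40):
    """Return (phrase, snippet) pairs showing each match with nearby text."""
    results = []
    for phrase in phrases:
        for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            # Collapse line breaks and runs of spaces inside the snippet.
            snippet = " ".join(text[start:end].split())
            results.append((phrase, snippet))
    return results


# Invented sample text, not taken from any real filing.
sample = (
    "As of year end, unbilled receivables increased relative to the prior year. "
    "Management believes these amounts are collectible."
)

hits = flag_contexts(sample, ["unbilled receivables", "bill and hold"])
for phrase, snippet in hits:
    print(f"[{phrase}] ...{snippet}...")
```

An analyst could read the returned snippets to decide which filings deserve a full review.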

References

1. For text searches, please go to the SEC's website at http://searchwww.sec.gov/EDGARFSClient/jsp/EDGAR_MainAccess.jsp. Turning off the stemming option improves targeted phrase searches on the "Advanced Search Page."

2. See Loughran, Tim and Bill McDonald, 2010, "Barron's Red Flags: Do They Actually Work?," forthcoming in the Journal of Behavioral Finance.
