Thursday, October 26, 2023

Bloated Disclosures: Can ChatGPT Help Investors Process Financial Information to Reduce Noise?

[2306.10224] Bloated Disclosures: Can ChatGPT Help Investors Process Financial Information?

Generative AI tools such as ChatGPT can fundamentally change the way investors process information. We probe the economic usefulness of these tools in summarizing complex corporate disclosures using the stock market as a laboratory. The unconstrained summaries are remarkably shorter compared to the originals, whereas their information content is amplified.

When a document has a positive (negative) sentiment, its summary becomes more positive (negative). Importantly, the summaries are more effective at explaining stock market reactions to the disclosed information. Motivated by these findings, we propose a measure of information "bloat."

We show that bloated disclosure is associated with adverse capital market consequences, such as lower price efficiency and higher information asymmetry. Finally, we show that the model is effective at constructing targeted summaries that identify firms' (non-)financial performance. Collectively, our results indicate that generative AI adds considerable value for investors with information processing constraints.
Subjects: General Economics (econ.GN); Artificial Intelligence (cs.AI); General Finance (q-fin.GN)
Cite as: arXiv:2306.10224 [econ.GN]
  (or arXiv:2306.10224v2 [econ.GN] for this version)
  https://doi.org/10.48550/arXiv.2306.10224

Submission history

From: Alex Kim [view email]
[v1] Sat, 17 Jun 2023 01:22:08 UTC (1,951 KB)
[v2] Fri, 20 Oct 2023 21:40:00 UTC (1,414 KB)

See also the working paper version.

Summary

Based on the paper summary, here are the key points:

  • The study explores the usefulness of generative AI like ChatGPT in summarizing complex corporate disclosures like MD&As and earnings conference calls.
  • The AI-generated summaries are much shorter than the original documents, about 25-30% of the length on average. This suggests large potential gains in information processing efficiency.
  • The sentiment of the summaries appears more pronounced compared to the originals. Positive (negative) documents become more positive (negative) after summarization. This indicates the AI model amplifies the true underlying sentiment.
  • The sentiment of the summaries has greater explanatory power for contemporaneous stock returns compared to the raw document's sentiment. This shows the summaries better capture the key information.
  • Motivated by this, the authors propose a measure of "disclosure bloat" defined as the relative reduction in document length after summarization. It aims to capture redundant or irrelevant details in disclosures.
  • The bloat measure varies meaningfully over time and across firms. It is higher when performance is poor, indicating possible obfuscation incentives.
  • Firms with bloated disclosures exhibit lower price efficiency and higher information asymmetry, pointing to adverse capital market consequences.
  • Finally, the AI model can produce useful targeted summaries, for example, on financial performance or ESG activities. This enables standardized insights.

In summary, the study provides evidence on the economic usefulness of generative AI in analyzing financial disclosures and dealing with information overload. The introduced bloat measure also offers a new way to evaluate reporting quality.

Data Utilization

The paper does not provide specific information about which companies' data were used in the analysis. However, some details can be inferred:

  • The initial sample consists of MD&A sections and conference call transcripts for all US non-financial public firms from 2009-2020.
  • Financial firms (SIC codes starting with 6) were excluded.
  • After filtering, the initial samples were 8,699 MD&As and 40,362 conference calls.
  • Due to computational constraints, approximately 20% random samples were drawn from these populations.
  • The final MD&A sample had 1,790 reports from 339 unique firms.
  • The conference call sample had 8,537 transcripts from 360 unique firms.
  • Summary statistics indicate these are large firms with mean market capitalization around $2.1 billion.
  • The samples seem representative of the overall population of US public firms.

So in summary, while the specific company names are not provided, the data spans thousands of MD&A reports and conference call transcripts from a randomized sample of primarily large US public non-financial firms from 2009-2020.
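The roughly 20% random draw described above can be sketched in a few lines of Python. This is illustrative only: the seed, the helper name, and the exact sampling procedure are mine, not from the paper.

```python
import random

# Illustrative sketch of the ~20% random draw the authors describe; the
# seed and function name are assumptions, not from the paper.
def draw_sample(documents, fraction=0.2, seed=42):
    rng = random.Random(seed)
    k = round(len(documents) * fraction)
    return rng.sample(documents, k)

mda_ids = list(range(8699))      # stand-in for the 8,699 MD&As before sampling
subsample = draw_sample(mda_ids)
print(len(subsample))            # → 1740, roughly 20% of 8,699
```

Note that a mechanical 20% draw of 8,699 gives about 1,740 documents, slightly below the 1,790 reports in the final sample, consistent with the paper's "approximately 20%" wording.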

Quarterly Earnings Call

GPT-3.5 is used in an unsupervised way to generate summaries of corporate disclosures, which are then analyzed to assess the usefulness of this AI technique. The prompts provided to GPT-3.5 were kept simple, and no further training or fine-tuning was done.

The study analyzes earnings conference call transcripts in a similar way to MD&A sections:

  • The authors obtain transcripts of quarterly earnings conference calls from S&P Capital IQ.
  • These calls are voluntary events where companies discuss financial results and answer analyst questions.
  • The full transcript of the earnings call is provided as input to the GPT-3.5 model.
  • This includes the prepared remarks as well as the Q&A session.
  • Operator instructions and analyst questions are also retained as they contain relevant information.
  • GPT-3.5 summarizes the full transcript into a condensed version capturing key information.
  • The length, sentiment, and information content of summaries are compared to the original transcripts.
  • The authors also construct a disclosure bloat measure for conference calls using the reduction in length after summarization.
  • They examine time trends in bloat and its determinants for conference call transcripts.
  • The informativeness of summary sentiment in explaining stock returns is also assessed.

So in summary, the full textual content of earnings call transcripts is summarized by the AI model in the same unsupervised way as done for MD&A sections. The properties of conference call summaries and disclosure bloat measures are analyzed in a similar fashion.

MD&A

From the paper: “MD&As are included as ‘Item 7’ of the 10-K filings. We download all 10-K filings (including 10-K/A and 10-KSB) from EDGAR and use regular expressions to extract Item 7 (Kim and Nikolaev, 2022).”

The MD&A (Management's Discussion and Analysis) section is an important part of a company's annual report and 10-K filing. Here are some key points about what it is:

  • MD&A is an annual narrative report in which management discusses the company's financial performance.
  • It provides management's perspective on the business results and the factors underlying them.
  • The goal is to help investors see the company's performance through the eyes of management.
  • MD&A complements and provides context for the financial statements and footnotes.
  • It analyzes reasons for material changes in financial statement items across periods.
  • Key elements include discussion of operations, liquidity, capital resources, outlook, trends, risks, etc.
  • SEC regulations mandate inclusion of MD&A as "Item 7" in annual 10-K filings.
  • MD&A aims to provide information relevant for understanding historical results and assessing future prospects.
  • The content is largely unstructured narrative text as opposed to quantitative disclosures.
  • Provides management insight into drivers of performance and strategy in their own words.

In summary, the MD&A section offers management's qualitative assessment and analysis of the company's financial statements and business results for a given fiscal year. It adds color and context to the numbers reported in the financials.
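The paper says Item 7 is pulled from 10-K filings with regular expressions (citing Kim and Nikolaev, 2022), but does not reproduce the pattern. A hypothetical sketch of that extraction step might look like the following; real filings need a much more robust pattern (tables of contents, restated headings, inconsistent punctuation, and so on).

```python
import re

# Hypothetical pattern: captures text between the "Item 7" MD&A heading
# and the "Item 7A" or "Item 8" heading that follows it. The paper's
# actual regex is not given.
MDA_PATTERN = re.compile(
    r"item\s*7\.?\s*management'?s\s+discussion"   # Item 7 heading
    r"(.*?)"                                      # MD&A body (lazy)
    r"(?:item\s*7a\b|item\s*8\b)",                # next item heading
    re.IGNORECASE | re.DOTALL,
)

def extract_mda(filing_text):
    """Return the Item 7 body, or None if no MD&A section is found."""
    match = MDA_PATTERN.search(filing_text)
    return match.group(1).strip() if match else None

sample = ("Item 7. Management's Discussion and Analysis. "
          "Revenue grew on higher volumes. "
          "Item 7A. Quantitative and Qualitative Disclosures.")
print(extract_mda(sample))
```

The lazy quantifier stops at the first following item heading, so Item 7A or Item 8 content is not swallowed into the extracted body.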

GPT-3.5 Turbo API

Based on the information provided in the summary, the key AI tool used for analysis in this study is GPT-3.5 Turbo. Specifically:

  • The authors use the GPT-3.5 Turbo API, accessed through OpenAI's ChatCompletion endpoint, to generate summaries.
  • GPT-3.5 Turbo is an auto-regressive language model based on the Transformer architecture.
  • It is pre-trained on a vast corpus of text data and then fine-tuned (via instruction tuning with human feedback) to follow prompts such as summarization requests.
  • The model relies on attention mechanisms to focus on the most relevant information when summarizing text.
  • For each MD&A and earnings call transcript document, the authors provide it as input text to the GPT-3.5 API.
  • They use a simple prompt instructing GPT-3.5 to summarize the input text without restricting length.
  • GPT-3.5 then outputs a condensed summary capturing the key information from the original document.
  • The authors analyze properties of the summaries like length, sentiment, and information content.
  • They also construct a "disclosure bloat" measure based on the relative reduction in document length after summarization.
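A minimal sketch of how such a request might be assembled is below. The paper says only that the prompt was simple and did not restrict summary length; the exact prompt wording here is my assumption. With the legacy `openai` (<1.0) Python SDK, this payload would be sent via `openai.ChatCompletion.create(**request)`.

```python
# Hypothetical request builder: the prompt text is assumed, not the
# authors' actual prompt.
def build_summary_request(document_text):
    return {
        "model": "gpt-3.5-turbo",
        "messages": [
            {
                "role": "user",
                "content": "Summarize the following document.\n\n" + document_text,
            }
        ],
        "temperature": 0,  # favor reproducible summaries
    }

request = build_summary_request("Item 7. Management's Discussion ...")
print(request["model"])   # → gpt-3.5-turbo
```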

Alternative AI

While I don't have full context, there are a few potential reasons why GPT-3.5 may have been preferred over Claude for this specific research study:

  • The study may have been designed before Claude became publicly available in March 2023; ChatGPT and GPT-3.5 also predate Claude's public release by several months.
  • GPT-3.5 was state-of-the-art in language processing at the time, trained at very large scale (though this is no longer the case).
  • The authors may have had easier access to GPT-3.5 through OpenAI's widely available API.
  • Claude is focused more on harmless, helpful, and honest dialogue, whereas GPT-3.5 has broader NLP capabilities. My personal experience is that Claude sticks to the facts, while GPT-3.5 tends to make things up.
  • The study required analyzing large volumes of financial text. GPT-3.5 may have been better suited.
  • The authors may have wanted to benchmark against other NLP research using GPT-3.5.

Claude has advanced natural-language capabilities as well. It is quite possible that if this study were conducted today, Claude would also be a suitable choice for generative summarization of financial documents.

Statistical Results

Here are some key details about the disclosure bloat index introduced in the paper:

  • Bloat is defined as: (Length of original document - Length of summary) / Length of original document
  • It captures the relative reduction in document length after summarization by GPT-3.5
  • Bloat can theoretically range from 0 (no reduction) to 1 (full reduction).
  • In the MD&A sample, the average Bloat is 0.754.
  • For the conference call sample, average Bloat is 0.685.
  • The standard deviation of Bloat is around 0.08 for MD&As and 0.13 for conference calls.
  • The 25th to 75th percentile range of Bloat is approximately 0.69 to 0.82 for MD&As.
  • For conference calls, the interquartile range is 0.58 to 0.79.
  • So higher values of Bloat indicate larger relative reduction and more potential "informational bloat" in the original document.
  • The interpretation is that higher Bloat suggests more redundant or irrelevant content in the original text.
  • Bloat varies meaningfully across time periods and firms, indicating heterogeneity in disclosure quality.
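The definition above can be sketched in a few lines of Python, assuming document length is measured in word counts:

```python
def bloat(original_len, summary_len):
    """Bloat = (original length - summary length) / original length,
    following the definition above; lengths assumed to be word counts."""
    if original_len <= 0:
        raise ValueError("original document must be non-empty")
    return (original_len - summary_len) / original_len

# Using the average MD&A word counts reported below (17,901 original vs
# 3,779 summary). The sample mean Bloat of 0.754 differs because it
# averages per-document ratios rather than taking the ratio of averages.
print(round(bloat(17901, 3779), 3))   # → 0.789
```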

The study produced several key artifacts and statistics related to the length, sentiment, and information content of the AI-generated summaries:

Length:

  • MD&A summaries were about 25% the length of originals on average (3,779 vs 17,901 words).
  • Conference call summaries were around 30% of the original length (2,300 vs 7,501 words).

Sentiment:

  • MD&A summary sentiment had a higher standard deviation than originals (0.316 vs 0.202).
  • Positive MD&As became more positive and negative ones more negative after summarization.
  • Similar amplification of sentiment was observed for conference calls.

Information Content:

  • MD&A summary sentiment better explained stock returns around filing dates than original sentiment.
  • For conference calls, summary sentiment had 50% higher explanatory power for returns.
  • Summary-based sentiment exhibited larger effects on returns.
  • Adjusted R-squared doubled when using summary vs original sentiment.

In summary, key statistics indicate summaries substantially reduced length while amplifying sentiment and improving information content as judged by stock market reactions.
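For reference, adjusted R² corrects raw R² for the number of regressors, so in large samples a doubling of raw explanatory power carries over almost one-for-one. A quick sketch, with illustrative inputs that are not the paper's actual estimates:

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1),
    for n observations and k regressors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Illustrative numbers only: doubling raw R² in a large sample roughly
# doubles the adjusted R² as well.
print(round(adjusted_r_squared(0.05, 1790, 1), 4))   # → 0.0495
print(round(adjusted_r_squared(0.10, 1790, 1), 4))   # → 0.0995
```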

Authors

The paper summary does not provide details about the authors' previous publications. However, based on the arXiv record and the affiliation listed, some information is available:

Authors:

  • The submission is from Alex Kim (per the arXiv record), and the listed affiliation places the authors at the University of Chicago's Booth School of Business.
